Olea europaea subsp. europaea varieties
*Olea europaea* var. *sylvestris* (wild olive, oleaster, acebuche) is a small evergreen tree native to the Mediterranean basin and is considered an ancestor of cultivated olive trees. DNA was extracted from leaves collected from trees in the Orhangazi region of Bursa, Turkey. The genome was shotgun-sequenced to 220x coverage and the reads were assembled with SOAPdenovo, yielding a draft genome assembly of 1.48 Gbp, in agreement with genome size estimates from flow cytometry and k-mer analysis. Using genetic maps with 1,307 markers, 50% of sequences longer than 1 kbp (∼572 Mbp) could be anchored to 23 linkage groups.
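The anchoring figures can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the numbers quoted in the text; the variable names are illustrative, and it assumes the 50% figure is measured by sequence length rather than by sequence count.

```python
# Figures quoted in the text (all lengths in Mbp).
assembly_mbp = 1480.0        # draft assembly size, 1.48 Gbp
anchored_mbp = 572.0         # sequence anchored to linkage groups
anchored_share = 0.50        # share of sequences longer than 1 kbp (assumed to be by length)

# If 572 Mbp is 50% of the sequences longer than 1 kbp,
# those sequences total roughly twice that amount.
implied_long_total_mbp = anchored_mbp / anchored_share

# Share of the whole 1.48 Gbp assembly that is anchored.
anchored_share_of_assembly = anchored_mbp / assembly_mbp

print(f"Sequences > 1 kbp (implied): {implied_long_total_mbp:.0f} Mbp")
print(f"Anchored share of full assembly: {anchored_share_of_assembly:.1%}")
```

Under that assumption, the sequences longer than 1 kbp add up to about 1.14 Gbp, so the anchored portion is a little under 40% of the full 1.48 Gbp assembly.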
Genes were predicted with homology-based and de novo methods together with RNA-seq data, and the results were consolidated with GLEAN. For the homology-based predictions, protein sequences from several plants were aligned to the genome with TBLASTN and genBLASTA, and GeneWise was then run on the matching genomic sequences to produce accurate spliced alignments. Next, the de novo gene-prediction methods GlimmerHMM and Augustus were used to predict protein-coding genes, with parameters trained for *O. europaea* var. *sylvestris*, *Arabidopsis thaliana*, *Sesamum indicum*, *Solanum tuberosum* and *Vitis vinifera*.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 1,619,023 low-complexity (Dust) features, covering 160 Mb (14.0% of the genome); 1,281,455 RepeatMasker features (with the REdat library), covering 261 Mb (22.9% of the genome); 8,004 RepeatMasker features (with the RepBase library), covering 1 Mb (0.1% of the genome); and 513,948 tandem repeat (TRF) features, covering 374 Mb (32.8% of the genome). Repeated sequences called with the Repeat Detector cover 45.2% of the genome.
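The repeat percentages imply a particular genome length as their denominator. The sketch below back-calculates it from the coverage figures quoted above; the dictionary layout and function name are illustrative, and the RepBase entry is left out because its values (1 Mb, 0.1%) are rounded too coarsely for this purpose.

```python
# Repeat statistics quoted in the text:
# (feature count, covered length in Mb, percent of genome).
repeat_stats = {
    "Dust": (1_619_023, 160, 14.0),
    "RepeatMasker (REdat)": (1_281_455, 261, 22.9),
    "TRF": (513_948, 374, 32.8),
}

def implied_genome_mb(covered_mb, percent):
    """Genome length (Mb) that makes covered_mb equal to percent of the genome."""
    return covered_mb / (percent / 100.0)

implied = {
    name: implied_genome_mb(covered_mb, percent)
    for name, (_count, covered_mb, percent) in repeat_stats.items()
}

for name, size_mb in implied.items():
    print(f"{name}: implied genome length ~{size_mb:.0f} Mb")
```

All three classes point to a denominator of about 1.14 Gb rather than the full 1.48 Gbp assembly, which suggests the percentages were computed against a smaller sequence set. The individual class percentages also add up to more than the 45.2% reported for the Repeat Detector, so they should not be summed into a single total.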