Olea europaea var. sylvestris (O_europaea_v1)

Olea europaea var. sylvestris Assembly and Gene Annotation

The olive genome consortium was coordinated by the iBG (Izmir, Turkey). The genome was sequenced by the BGI (China) and analysed by iBG, Córdoba University (Spain) and Ghent University (Belgium).

About Olea europaea var. sylvestris

Olea europaea var. sylvestris (wild olive, oleaster, acebuche) is a small evergreen tree native to the Mediterranean basin and is considered an ancestor of cultivated olive trees. It is a diploid species (2n = 2x = 46) with an estimated genome size of 3.19 ± 0.047 pg/2C DNA.
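
To relate the flow cytometry value to a length in base pairs, the sketch below applies the commonly used conversion of 1 pg of DNA ≈ 978 Mbp and halves the 2C value to obtain the haploid (1C) size. The conversion factor and the function name are assumptions made for illustration; they are not taken from the consortium's analysis.

```python
# Assumption: the widely used conversion 1 pg of DNA ~ 978 Mbp.
MBP_PER_PG = 978.0


def pg_2c_to_1c_mbp(pg_2c: float) -> float:
    """Convert a 2C DNA content in picograms to a 1C (haploid) size in Mbp."""
    return (pg_2c / 2.0) * MBP_PER_PG


# 3.19 pg/2C is the estimate quoted above.
haploid_mbp = pg_2c_to_1c_mbp(3.19)
print(f"{haploid_mbp:.0f} Mbp")  # prints "1560 Mbp"
```

Under this assumption the estimate corresponds to a haploid genome of roughly 1.56 Gbp.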

Assembly

DNA was extracted from leaves collected from trees in the Orhangazi region of Bursa city (Turkey). The genome was shotgun-sequenced (220x coverage), generating 515.7 Gbp of data. SOAPdenovo was used to assemble the sequence reads, resulting in a draft genome assembly of 1.48 Gbp with a scaffold N50 (the shortest scaffold length at which 50% of the assembly is covered) of 228 kbp. The assembly size is in agreement with genome size estimates from flow cytometry and k-mer analysis (∼1.46 Gbp). Using genetic maps with 1,307 markers, 50% of sequences longer than 1 kbp (∼572 Mbp) could be anchored into 23 linkage groups.
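
The scaffold statistic quoted above (the shortest sequence length at 50% of the genome) is commonly called the N50. The following is a minimal sketch of how it can be computed from a list of scaffold lengths. The function name and the toy lengths are hypothetical, and the total assembly length is used as the denominator, which may differ from the exact convention used by the consortium.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (not olive data); total = 300, half = 150.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70, since 80 + 70 reaches 150
```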

Annotation

Homology-based and de novo methods, as well as RNA-seq data, were used to predict genes, and GLEAN was used to consolidate the results. Protein sequences from several plants were aligned to the genome with TBLASTN and genBLASTA, and the matching genomic sequences were then aligned with GeneWise to obtain accurate spliced alignments. Next, the de novo gene-prediction methods GlimmerHMM and Augustus were used to predict protein-coding genes, with parameters trained for O. europaea var. sylvestris, A. thaliana, S. indicum, S. tuberosum, and V. vinifera. A total of 50,684 protein-coding genes were predicted, of which 47,124 genes (93%) were confirmed by transcriptome data. A total of 31,245 genes were located on the anchored chromosomes.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 1,619,023 low-complexity (Dust) features, covering 160 Mb (14.0% of the genome); 1,281,455 RepeatMasker features (with the REdat library), covering 261 Mb (22.9% of the genome); 8,004 RepeatMasker features (with the RepBase library), covering 1 Mb (0.1% of the genome); 513,948 tandem repeat (TRF) features, covering 374 Mb (32.8% of the genome); and Repeat Detector repeats with a total length of 516 Mb (45.2% of the genome).
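
The percentages above can be reproduced with a short calculation. The sketch below assumes that they are expressed relative to the Golden Path Length reported in the Statistics section and that 1 Mb means 10^6 bp. Both are assumptions, and the names used are illustrative only.

```python
# Assumption: percentages are relative to the Golden Path Length
# given in the Statistics section of this page.
GOLDEN_PATH_BP = 1_140_989_389

# Covered lengths in Mb as quoted in the text (assuming 1 Mb = 10**6 bp).
covered_mb = {
    "Dust": 160,
    "RepeatMasker (REdat)": 261,
    "RepeatMasker (RepBase)": 1,
    "TRF": 374,
    "Repeat Detector": 516,
}


def percent_of_genome(mb, genome_bp=GOLDEN_PATH_BP):
    """Return the share of the genome covered by `mb` megabases, in percent."""
    return round(mb * 1_000_000 / genome_bp * 100, 1)


for name, mb in covered_mb.items():
    print(f"{name}: {percent_of_genome(mb)}%")
```

Under these assumptions the computed values match the quoted figures (14.0%, 22.9%, 0.1%, 32.8% and 45.2%). The repeat classes can overlap, so the percentages are not expected to sum to the total repeat content.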

References

  1. Genome of wild olive and the evolution of oil biosynthesis.
    Unver T, Wu Z, Sterck L, Turktas M, Lohaus R, Li Z, Yang M, He L, Deng T, Escalante FJ et al. 2017. PNAS. 114 (44):E9413-E9422.
  2. Nuclear DNA content estimations in wild olive (Olea europaea L. ssp. europaea var. sylvestris Brot.) and Portuguese cultivars of O. europaea using flow cytometry.
    Loureiro J, Rodriguez E, Costa A et al. 2007. Genet Resour Crop Evol. 54:21-25.

Picture credit: http://www.juntadeandalucia.es

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: O_europaea_v1, INSDC Assembly GCA_002742605.1
Database version: 111.1
Golden Path Length: 1,140,989,389 bp
Genebuild by: ORCAE
Genebuild method: External annotation import
Data source: International Olive Genome Consortium

Gene counts

Coding genes: 50,681
Gene transcripts: 50,681