Olea europaea var. sylvestris Assembly and Gene Annotation
About Olea europaea var. sylvestris
Olea europaea var. sylvestris (wild olive, oleaster, acebuche) is a small evergreen tree native to the Mediterranean basin and is considered an ancestor of cultivated olive trees. It is a diploid species (2n=2x=46) with an estimated nuclear DNA content of 3.19 ± 0.047 pg/2C.
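As a rough illustration of how a 2C DNA content in picograms relates to a haploid genome size in base pairs, the sketch below applies the commonly used conversion of 1 pg ≈ 978 Mbp. That factor and the function name are assumptions brought in for this example and are not part of the original description; the result is an approximation and is independent of the assembly-based size estimates.

```python
# Assumed conversion factor (not from this page): 1 pg of DNA ~ 978 Mbp.
MBP_PER_PG = 978.0


def haploid_size_mbp(pg_per_2c: float) -> float:
    """Convert a 2C DNA content (pg) into an approximate 1C genome size (Mbp)."""
    pg_per_1c = pg_per_2c / 2.0  # 2C is two copies of the haploid genome
    return pg_per_1c * MBP_PER_PG


if __name__ == "__main__":
    size = haploid_size_mbp(3.19)  # 2C value quoted above
    print(f"Approximate haploid genome size: {size:.0f} Mbp")
```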
Assembly
DNA was extracted from leaves collected from trees in the Orhangazi region of Bursa city (Turkey). The genome was shotgun-sequenced (220x coverage), generating 515.7 Gbp of data. The reads were assembled with SOAPdenovo into a draft genome assembly of 1.48 Gbp with a scaffold N50 (the shortest sequence length at 50% of the genome) of 228 kbp. The assembly size agrees with genome size estimates from flow cytometry and k-mer analysis (∼1.46 Gbp). Using genetic maps with 1,307 markers, 50% of sequences longer than 1 kbp (∼572 Mbp) could be anchored into 23 linkage groups.
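The scaffold statistic quoted above (the shortest sequence length at 50% of the genome, commonly called N50) can be sketched as follows. This is a minimal illustration, not the pipeline used for the assembly: the function name and the toy scaffold lengths are hypothetical, and it takes the summed scaffold length as the reference size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least 50% of the total
            return length


if __name__ == "__main__":
    toy_scaffolds = [500, 400, 300, 200, 100]  # hypothetical lengths in bp
    print(n50(toy_scaffolds))
```

In practice the lengths would be read from the assembly FASTA index or a similar per-scaffold listing rather than typed in by hand.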
Annotation
Genes were predicted with homology-based and de novo methods together with RNA-seq data, and GLEAN was used to consolidate the results. Protein sequences from several plants were aligned to the genome with TBLASTN and genBLASTA, and GeneWise was then run on the matching genomic sequences to produce accurate spliced alignments. Next, the de novo gene-prediction methods GlimmerHMM and Augustus were used to predict protein-coding genes, with parameters trained for O. europaea var. sylvestris, A. thaliana, S. indicum, S. tuberosum, and V. vinifera. A total of 50,684 protein-coding genes were predicted, of which 47,124 genes (93%) were confirmed by transcriptome data. A total of 31,245 genes were located on the anchored chromosomes.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 1,619,023 low-complexity (Dust) features, covering 160 Mb (14.0% of the genome); 1,281,455 RepeatMasker features (with the REdat library), covering 261 Mb (22.9% of the genome); 8,004 RepeatMasker features (with the RepBase library), covering 1 Mb (0.1% of the genome); and 513,948 tandem repeat (TRF) features, covering 374 Mb (32.8% of the genome). Repeat Detector repeats total 516 Mb (45.2% of the genome).
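The percentages above can be reproduced from the covered lengths with a short calculation. The sketch below assumes that the denominator is the Golden Path Length listed in the summary statistics (1,140,989,389 bp) and that 1 Mb means 10^6 bp; both are assumptions made for illustration, as are the function and variable names.

```python
# Assumption: percentages are relative to the Golden Path Length from the
# summary statistics, and 1 Mb = 1,000,000 bp.
GOLDEN_PATH_LENGTH_BP = 1_140_989_389

# Covered lengths in Mb, as quoted in the repeat annotation paragraph.
REPEAT_COVERAGE_MB = {
    "Low complexity (Dust)": 160,
    "RepeatMasker (REdat)": 261,
    "RepeatMasker (RepBase)": 1,
    "Tandem repeats (TRF)": 374,
    "Repeat Detector": 516,
}


def percent_of_genome(covered_mb, genome_bp=GOLDEN_PATH_LENGTH_BP):
    """Return the covered fraction of the genome as a percentage (one decimal)."""
    return round(100 * covered_mb * 1_000_000 / genome_bp, 1)


if __name__ == "__main__":
    for name, covered in REPEAT_COVERAGE_MB.items():
        print(f"{name}: {covered} Mb = {percent_of_genome(covered)}% of the genome")
```

Because the repeat classes can overlap one another, these percentages are not expected to sum to the Repeat Detector total.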
References
- Genome of wild olive and the evolution of oil biosynthesis.
Unver T, Wu Z, Sterck L, Turktas M, Lohaus R, Li Z, Yang M, He L, Deng T, Escalante FJ et al. 2017. PNAS. 114 (44):E9413-E9422.
- Nuclear DNA content estimations in wild olive (Olea europaea L. ssp. europaea var. sylvestris Brot.) and Portuguese cultivars of O. europaea using flow cytometry.
Loureiro J, Rodriguez E, Costa A et al. 2007. Genet Resour Crop Evol. 54:21-25.
Picture credit: http://www.juntadeandalucia.es
Links
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | O_europaea_v1, INSDC Assembly GCA_002742605.1 |
Database version | 113.1 |
Golden Path Length | 1,140,989,389 |
Genebuild by | ORCAE |
Genebuild method | External annotation import |
Data source | International Olive Genome Consortium |
Gene counts
Coding genes | 50,681 |
Gene transcripts | 50,681 |