Quercus suber (CorkOak1.0)

Quercus suber Assembly and Gene Annotation

About Quercus suber

Quercus suber is an evergreen tree, commonly known as cork oak, which is native to the western Mediterranean Basin, especially southwest Europe, where it occurs in the coastal regions. Cork oak has the rare characteristic of producing a continuous and renewable cork layer, which has fine physical and chemical properties that make it highly profitable for industrial uses.

Assembly

The genome size of cork oak, a diploid (2n=24) species, was estimated by flow cytometry to be 934 Mb. In the present study, a combination of Paired-End (PE) and Mate-Pair (MP) libraries, sequenced on the Illumina platform, was used to generate a draft genome assembly with an estimated genome size of 953.3 Mb, which closely matches the previous estimate. The bioinformatics pipeline involved a de novo genome assembly step, followed by scaffolding, gap filling and removal of heterozygous regions. The cork oak draft genome is distributed over 23,344 scaffolds, although the vast majority of the assembly is contained in a considerably smaller number of larger scaffolds (approximately 94.6% of the assembled genome is present in the 4,730 scaffolds longer than 10,000 bp).
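The figures in the Assembly paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up for this example, and it assumes that the 94.6% figure applies to the 953.3 Mb assembly size quoted above.

```python
# Figures quoted in the Assembly paragraph (variable names are illustrative).
FLOW_CYTOMETRY_MB = 934.0        # genome size estimated by flow cytometry
ASSEMBLY_MB = 953.3              # estimated genome size from the draft assembly
TOTAL_SCAFFOLDS = 23_344         # scaffolds in the draft genome
LONG_SCAFFOLDS = 4_730           # scaffolds longer than 10,000 bp
LONG_SCAFFOLD_FRACTION = 0.946   # share of assembled sequence in those scaffolds


def relative_difference(estimate, reference):
    """Return (estimate - reference) / reference as a fraction."""
    return (estimate - reference) / reference


# How far the assembly-based size is from the flow cytometry estimate.
size_gap = relative_difference(ASSEMBLY_MB, FLOW_CYTOMETRY_MB)

# Share of all scaffolds that are longer than 10,000 bp.
scaffold_share = LONG_SCAFFOLDS / TOTAL_SCAFFOLDS

# Assumption: 94.6% is taken of the 953.3 Mb assembly size.
long_scaffold_mb = ASSEMBLY_MB * LONG_SCAFFOLD_FRACTION

print(f"Assembly vs flow cytometry: {size_gap:+.1%}")
print(f"Scaffolds longer than 10 kb: {scaffold_share:.1%} of all scaffolds")
print(f"Sequence in those scaffolds: about {long_scaffold_mb:.0f} Mb")
```

Under these assumptions, roughly a fifth of the scaffolds carry about 902 Mb of sequence, and the two genome size estimates differ by about 2%.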

Annotation

The structural annotation of the genome yielded 79,752 genes with complete open reading frames and 83,814 transcripts. The number of transcripts with a valid functional annotation varied with the database used; the maximum, 69,218 transcripts (82.6% of the total), was obtained when searching against InterPro signatures. Finally, using a validation approach based on the RNA-Seq data available for five cork oak tissues, a total of 33,658 predicted genes could be confirmed and classified as high confidence genes, since they presented assembled transcripts within the genome annotation coordinates.
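The proportions behind the annotation figures can be reproduced as a quick check. This is a minimal sketch using only the counts quoted in the paragraph above; the variable names are illustrative and do not come from the annotation pipeline.

```python
# Counts quoted in the Annotation paragraph (variable names are illustrative).
GENES = 79_752                  # genes with complete open reading frames
TRANSCRIPTS = 83_814            # annotated transcripts
INTERPRO_ANNOTATED = 69_218     # transcripts with an InterPro signature hit
HIGH_CONFIDENCE_GENES = 33_658  # genes confirmed by RNA-Seq evidence

# Share of transcripts with a functional annotation from InterPro.
interpro_share = INTERPRO_ANNOTATED / TRANSCRIPTS

# Share of predicted genes classified as high confidence.
high_conf_share = HIGH_CONFIDENCE_GENES / GENES

# Average number of transcripts per predicted gene.
transcripts_per_gene = TRANSCRIPTS / GENES

print(f"InterPro-annotated transcripts: {interpro_share:.1%}")
print(f"High confidence genes: {high_conf_share:.1%}")
print(f"Transcripts per gene: {transcripts_per_gene:.2f}")
```

The first ratio reproduces the 82.6% stated in the text; the other two are derived here from the quoted counts and are not stated in the source.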

  • The draft genome sequence of cork oak.
    Ramos AM, Usié A, Barbosa P, Barros PM, Capote T, Chaves I, Simões F, Abreu I, Carrasquinho I, Faro C, Guimarães JB, Mendonça D, Nóbrega F, Rodrigues L, Saibo NJM, Varela MC, Egas C, Matos J, Miguel CM, Oliveira MM, Ricardo CP, Gonçalves S. Sci Data 5

Picture credit: Wikipedia

Statistics

Summary

Assembly: CorkOak1.0, INSDC Assembly GCA_002906115.4, Dec 2018
Database version: 111.2
Golden Path Length: 953,298,670
Genebuild method: External annotation import
Data source: Genosuber

Gene counts

Coding genes: 52,739
Non coding genes: 367
Misc non coding genes: 367
Pseudogenes: 87
Gene transcripts: 57,046