Corymbia citriodora (Ccitriodora_v2_1)

Corymbia citriodora Assembly and Gene Annotation

About Corymbia citriodora

Corymbia citriodora is a member of the predominantly Southern Hemisphere Myrtaceae family, which includes the eucalypts (Eucalyptus, Corymbia and Angophora; ~800 species). Corymbia is grown for timber, pulp and paper, and essential oils in Australia, South Africa, Asia, and Brazil, and maintains a high growth rate under marginal conditions caused by drought, poor-quality soil, and biotic stresses.

Assembly

A total of 129 Gb of raw data was generated from two Illumina HiSeq2500 libraries, representing ~320× sequencing coverage of the genome. The genome assembly was generated using a modified version of Arachne (v.20071016). Contig assembly and initial scaffolding steps produced 37,263 contigs in 32,740 scaffolds (N50 length: 132.6 Kb), totaling 563.0 Mb. Gap patching on the scaffolds was performed using ~25× PacBio reads (N50 length: 17,094 bp) and QUIVER. Final scaffolding was completed using SSPACE-Standard (Version 2.0) with Nextera long mate pair libraries (insert sizes 4 Kb and 8 Kb), resulting in a 537.9 Mb assembly (16,786 scaffolds; 20,979 contigs) with a scaffold N50 of 312 Kb. To anchor the scaffolds into chromosomes, the sequences were ordered and oriented into 11 pseudomolecules using Corymbia genetic maps. Three high-density linkage maps were generated from two C. torelliana × C. citriodora subsp. variegata hybrid crosses genotyped with Diversity Arrays DArTseq technology, and contigs were anchored to the marker sequences using ALLMAPS. The average Spearman correlation coefficient between the centimorgan (cM) positions of genetic map markers from all three linkage maps and their physical locations on scaffolds was 0.96. The pseudomolecules range in size from 24.8 Mb (Chromosome 9) to 55.7 Mb (Chromosome 8). The total size of the chromosome-anchored scaffolds (n = 4,033) was 412 Mb (408 Mb in contigs), which is close to the estimated genome size of 370–390 Mb based on flow cytometry.
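For readers unfamiliar with the N50 and coverage figures quoted above, the following is a minimal sketch of how such values are conventionally computed. It is not the pipeline used for this assembly; the function names `n50` and `implied_genome_size` and the toy scaffold lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def implied_genome_size(raw_bases, coverage):
    """Genome size implied by a raw data volume and a fold-coverage figure."""
    return raw_bases / coverage


# Toy scaffold lengths (hypothetical, not from Ccitriodora_v2_1).
toy_scaffolds = [40, 30, 20, 10]
print("toy N50:", n50(toy_scaffolds))

# 129 Gb of raw data at ~320x coverage implies a genome of roughly 403 Mb.
print("implied size (Mb):", implied_genome_size(129e9, 320) / 1e6)
```

The implied size from the quoted coverage is only approximate, since ~320× is itself a rounded figure.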

Annotation

To annotate the genome, RNA was collected from five separate tissues (expanded leaves [EL], unexpanded leaves [UL], flower buds [FB], flower initials [FI], photosynthetic bark cortex [BA]) and used for de novo gene model prediction. The final annotation of protein-coding gene products comprised 35,632 primary transcripts and 10,019 alternative transcripts, for a total of 45,651 transcript models. The primary transcripts had a mean length of 3.4 Kb and a mean of 4.8 exons, with a median exon length of 176 bp and a median intron length of 202 bp.
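Summary statistics of this kind (exons per transcript, median exon and intron length) can be derived from a GFF3 annotation file. The sketch below shows one way to do so under simple assumptions: exon features carry a `Parent` attribute naming their transcript, and introns are taken as the gaps between consecutive exons of the same transcript. The records in `TOY_GFF3` and the function name `exon_stats` are hypothetical and are not taken from the Ccitriodora_v2_1 annotation.

```python
from collections import defaultdict
from statistics import mean, median

# Toy GFF3 exon records (hypothetical data for illustration only).
TOY_GFF3 = """\
Chr01\ttoy\texon\t101\t300\t.\t+\t.\tID=e1;Parent=tx1
Chr01\ttoy\texon\t501\t650\t.\t+\t.\tID=e2;Parent=tx1
Chr01\ttoy\texon\t1001\t1100\t.\t+\t.\tID=e3;Parent=tx2
"""


def exon_stats(gff3_text):
    """Compute exon and intron summary statistics from GFF3 text."""
    exons = defaultdict(list)  # transcript ID -> list of (start, end)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))

    # GFF3 coordinates are 1-based and inclusive.
    exon_lengths = [e - s + 1 for spans in exons.values() for s, e in spans]
    intron_lengths = []
    for spans in exons.values():
        ordered = sorted(spans)
        intron_lengths += [
            nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])
        ]

    return {
        "mean_exons": mean(len(s) for s in exons.values()),
        "median_exon": median(exon_lengths),
        "median_intron": median(intron_lengths) if intron_lengths else None,
    }


stats = exon_stats(TOY_GFF3)
print(stats)
```

Restricting the input to primary transcripts, as in the figures quoted above, would require an additional filter that depends on how the annotation flags them.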

Picture credit: Wikipedia

Statistics

Summary

Assembly: Ccitriodora_v2_1, INSDC Assembly GCA_014858505.1, Oct 2020
Database version: 111.1
Golden Path Length: 544,191,601
Genebuild method: External annotation import
Data source: DOE Joint Genome Institute

Gene counts

Coding genes: 35,628
Gene transcripts: 45,647