Theobroma cacao cultivars
Selected cacao cultivars relevant for genomics and breeding will be displayed in Ensembl.
The genome assembly V2 of the Belizean Criollo B97-61/B2 cultivar was obtained from four Illumina large-insert mate-pair libraries combined with 52x coverage of Pacific Biosciences long reads, which were used to correct misassembled regions and reduce the number of scaffolds. In addition, genotyping-by-sequencing SNPs from a UF676 x ICS95 mapping population of 434 individuals were used to increase the proportion of the assembly anchored to chromosomes. The number of scaffolds decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes. Unknown sites (Ns) were reduced from 10.8% to 5.7%.
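For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the total assembly length. The function name scaffold_n50 and the toy lengths are illustrative assumptions, not part of the Cocoa Genome Hub pipeline or the actual cacao scaffold data.

```python
from typing import Iterable


def scaffold_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    if not ordered:
        raise ValueError("at least one scaffold length is required")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

    # Unreachable for non-empty input; kept to make the control flow explicit.
    return ordered[-1]


# Toy example with hypothetical scaffold lengths, not real cacao data.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = scaffold_n50(toy_lengths)
```

In practice the lengths would be read from the assembly FASTA or its index file. A higher N50 with fewer scaffolds, as reported for V2, indicates a more contiguous assembly.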
This assembly was produced by the Cocoa Genome Hub. Gene annotation was produced by EuGene, following specific training for T. cacao, combined with a new de novo RefSeq structural annotation performed by the NCBI Eukaryotic Genome Annotation Pipeline based on RNA-seq evidence. 98.6% of the V1 gene models were relocated to the V2 assembly. In total, 345 genes from V1 were relocated to a different chromosome in V2. A consensus annotation was then built by selecting the best structural predictions from the two datasets, yielding 29,071 consensus protein-coding gene models. See details at https://doi.org/10.1186/s12864-017-4120-9.