Theobroma cacao Belizian Criollo B97-61/B2 Assembly and Gene Annotation
About Theobroma cacao
Theobroma cacao (cacao or chocolate tree) is a neotropical plant native to Amazonian rainforests. It is now cultivated in over 50 countries. It is a member of the Malvaceae family, and its beans are harvested from pods to make chocolate and for use in confections and cosmetics. Cacao is a diploid species (2n=2x=20) with a relatively small genome (411 Mb to 494 Mb). This is the V2 genome assembly and annotation of the Belizian Criollo B97-61/B2 cultivar.
Assembly
Four Illumina large-insert mate-pair libraries were combined with 52x coverage of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. In addition, genotyping-by-sequencing SNPs from a UF676 x ICS95 mapping population of 434 individuals were used to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 size increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes. Unknown sites (Ns) were reduced from 10.8% to 5.7%. This assembly was produced by the Cocoa Genome Hub.
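For readers unfamiliar with the metric quoted above, scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal Python sketch of that definition. The function name `n50` and the toy lengths are illustrative assumptions. They are not the actual V2 scaffold lengths, and this is not the tool used by the assembly authors.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    # Walk from the longest scaffold down, accumulating bases until
    # at least half of the total assembly length is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (made up for illustration), total = 300
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

A higher N50 with fewer scaffolds, as reported for V2, indicates that more of the assembly sits in long contiguous pieces.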
Annotation
V1 gene annotations produced by EUGene, following specific training for T. cacao, were combined with a new, de novo RefSeq structural annotation performed by the NCBI Eukaryotic Genome Annotation Pipeline based on RNA-seq evidence. 98.6% of the V1 gene models were relocated to the V2 assembly. In total, 345 genes from V1 were relocated to a different chromosome in V2. A consensus annotation was then performed to select the best structural predictions from the two datasets, yielding 29,071 consensus protein-coding gene models.
References
- The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.
X Argout, G Martin, G Droc, O Fouet, K Labadie, E Rivals, JM Aury, C Lanaud. 2017. BMC Genomics. 18:730.
Picture credit: Cocoa beans in a cacao pod. Photo by Keith Weller. This image was released by the Agricultural Research Service, the research agency of the United States Department of Agriculture, with the ID K4636-14.
Links
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | Criollo_cocoa_genome_V2, INSDC Assembly GCA_000208745.2 |
Database version | 113.1 |
Golden Path Length | 324,719,311 |
Genebuild by | CGH |
Genebuild method | External annotation import |
Data source | CIRAD |
Gene counts
Coding genes | 21,257 |
Non coding genes | 2,164 |
Small non coding genes | 2,164 |
Pseudogenes | 1,140 |
Gene transcripts | 35,849 |