Coffea canephora (AUK_PRJEB4211_v1)

Coffea canephora Assembly and Gene Annotation

About Coffea canephora

Coffea canephora, commonly known as robusta coffee, is a species of coffee in the Rubiaceae family. Within its genus, C. canephora has the widest natural distribution, which extends west to east from Guinea to Uganda and north to south from Cameroon to Angola. It is an allogamous diploid flowering plant (2n=2x=22). This reference genome results from a collaboration between Genoscope, IRD and Cirad (UMRs AGAP, DIADE and RPB), funded by ANR, and the Coffee Genome Sequencing Consortium.


The sequenced genotype (2n=22, 1C=710 Mb) is a doubled-haploid plant (accession DH200-94) produced by IRD from the clone IF200. A total of 54.4 million Roche 454 single and mate-pair reads and 143,605 Sanger bacterial artificial chromosome (BAC) end reads were generated, achieving 30x coverage. Additional Illumina sequencing data (60x) were used to improve the assembly. The resulting assembly consists of 25,216 contigs and 13,345 scaffolds with a total length of 568.6 Mb. Eighty percent of the assembly is in 635 scaffolds, and the scaffold N50 is 1.26 Mb. A high-density genetic map was used to anchor 64% of the assembly and 86% of the annotated genes to the 11 chromosomes.
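For readers unfamiliar with the summary statistics quoted above, the following is a minimal sketch of how a scaffold N50 is commonly computed, and of how the assembled length compares with the 1C value. The function name `n50` and the toy scaffold lengths are illustrative assumptions, not data from this assembly; treating 1C=710 Mb as the estimated haploid genome size is likewise an interpretation made here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_scaffolds)
print("Toy N50:", toy_n50)

# Figures taken from the text above: assembly length and 1C value, in Mb.
assembly_mb = 568.6
estimated_genome_mb = 710  # assumption: 1C read as estimated genome size
fraction_assembled = assembly_mb / estimated_genome_mb
print(f"Assembled fraction of 1C: {fraction_assembled:.1%}")
```

With the real scaffold lengths as input, the same function would be expected to reproduce the reported scaffold N50, although the exact definition used by the original authors is not stated on this page.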


A total of 25,574 protein-coding genes were annotated using various sources of evidence (cDNAs, RNA-Seq, protein alignments, and ab initio predictions) that were combined into gene models.

Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeat length: 209,982,696 bp; repeat content: 36.9%.
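As a quick check on the repeat figures, the short sketch below recomputes the repeat content from the repeat length and the Golden Path Length listed in this page's assembly statistics. That the percentage is taken relative to the Golden Path Length is an assumption made here; the variable names are illustrative.

```python
# Values quoted on this page, in base pairs.
repeat_length_bp = 209_982_696
golden_path_length_bp = 568_611_505  # assumption: denominator of the percentage

# Repeat content as a percentage of the assembled sequence.
repeat_percent = 100 * repeat_length_bp / golden_path_length_bp
print(f"Repeat content: {repeat_percent:.1f}%")
```

Under that assumption the result rounds to the 36.9% reported above.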


  1. The coffee genome provides insight into the convergent evolution of caffeine biosynthesis.
    Denoeud F, Carretero-Paulet L, Dereeper A et al. 2014. Science. 345(6201):1181-1184.

Picture credit: Jee & Rani Nature Photography (License: CC BY-SA 4.0) 2010

Links

More information

General information about this species can be found in Wikipedia.



Assembly: AUK_PRJEB4211_v1, INSDC Assembly GCA_900059795.1
Database version: 109.1
Golden Path Length: 568,611,505
Genebuild by: Genoscope CEA
Genebuild method: Import
Data source: Genoscope CEA

Gene counts

Coding genes: 25,574
Gene transcripts: 25,574