Cucumis sativus Assembly and Gene Annotation

About Cucumis sativus

Cucumber (Cucumis sativus) is a widely cultivated plant in the gourd family, Cucurbitaceae. It is a creeping vine that bears cucumiform fruits that are used as vegetables. There are three main varieties of cucumber: slicing, pickling, and seedless. Within these varieties, several cultivars have been created. In North America, the term "wild cucumber" refers to plants in the genera Echinocystis and Marah, but these are not closely related to the cucumber. The cucumber is originally from South Asia, but now grows on most continents. Many different types of cucumber are traded on the global market. [1]


The 'Chinese long' inbred line 9930 was selected for the genome sequencing project. A total of 26.5 billion high-quality base pairs were generated, corresponding to 72.2-fold genome coverage, of which the Sanger reads provided 3.9-fold coverage and the Illumina Genome Analyzer (GA) reads provided 68.3-fold coverage. The GA reads ranged in length from 42 to 53 bp. [2]
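
The coverage figures above are linked by simple arithmetic: fold coverage is the number of sequenced bases divided by the genome size. The following Python sketch uses only the numbers quoted in this section. It checks that the Sanger and Illumina contributions add up to the stated total and derives the genome size that those figures imply. The variable and function names are illustrative and are not taken from the sequencing project.

```python
# Figures quoted in the text (reference [2]).
total_bases = 26.5e9       # high-quality base pairs generated
total_coverage = 72.2      # fold coverage from all reads
sanger_coverage = 3.9      # fold coverage from Sanger reads
illumina_coverage = 68.3   # fold coverage from Illumina GA reads


def implied_genome_size(bases, coverage):
    """Genome size implied by the relation coverage = bases / genome size."""
    return bases / coverage


# The two per-technology coverages should sum to the total coverage.
coverage_sum = sanger_coverage + illumina_coverage
coverage_consistent = abs(coverage_sum - total_coverage) < 1e-9

genome_size = implied_genome_size(total_bases, total_coverage)

print(f"Sanger + Illumina coverage: {coverage_sum:.1f}-fold")
print(f"Matches stated total: {coverage_consistent}")
print(f"Implied genome size: {genome_size / 1e6:.0f} Mb")
```

With the quoted figures, the implied genome size is about 367 Mb.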

The final assembly is 195,669,205 bp long and consists of 190 scaffolds (N50 = 29,076,228 bp) and 11,366 contigs (N50 = 42,349 bp) [3].
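
The N50 values quoted above summarise how contiguous an assembly is: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that calculation follows. The function name `n50` and the example lengths are hypothetical, chosen only for illustration; they are not the cucumber scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their combined
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Made-up scaffold lengths (total 300, so half is 150).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(f"N50 of toy scaffolds: {toy_n50}")
```

In the toy data the two longest sequences (80 and 70) already reach half of the total, so the N50 is 70.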


Protein-coding gene prediction was done using three methods (cDNA-EST, homology-based and ab initio), and a consensus gene set was built by merging all of the results. A total of 26,682 genes were predicted, with a mean coding sequence size of 1,046 bp and an average of 4.39 exons per gene. Under an 80% sequence overlap threshold, 26.7% of the genes were supported by all three gene prediction methods, 25% had both ab initio prediction and homology-based evidence, and 7.4% had ab initio prediction and cDNA-EST expression evidence; the remaining genes were primarily derived from pure ab initio prediction, but the majority of these were supported by multiple gene finders. About 81% of the genes have homologs in the TrEMBL protein database, and 66% can be classified by InterPro. In total, 82% of the genes have either known homologs or can be functionally classified [2].
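
The evidence percentages above can be turned into a quick breakdown of the predicted gene set. The Python sketch below uses only the figures quoted in this section; the category labels and variable names are illustrative. The share left over after the three listed categories is computed by subtraction and, as the text notes, consists primarily (not exclusively) of genes with ab initio support only.

```python
total_genes = 26_682  # predicted genes, from the text

# Percentage of genes in each evidence category (80% overlap threshold).
support_percent = {
    "all three methods": 26.7,
    "ab initio + homology": 25.0,
    "ab initio + cDNA-EST": 7.4,
}

supported_percent = sum(support_percent.values())
remaining_percent = 100.0 - supported_percent

for label, percent in support_percent.items():
    # Approximate gene counts; the source reports percentages only.
    approx_genes = total_genes * percent / 100
    print(f"{label}: {percent:.1f}% (roughly {approx_genes:,.0f} genes)")

print(f"remaining, mostly ab initio only: {remaining_percent:.1f}%")
```

The three listed categories account for 59.1% of the genes, leaving 40.9% in the remaining group.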


  1. Cucumber.
  2. The genome of the cucumber, Cucumis sativus L.
    Sanwen Huang, Ruiqiang Li [...] Songgang Li. 2009. Nature Genetics. 41:1275-1281.
  3. NCBI page for Cucumis sativus (cucumber) assembly.

Picture credit: By Francisco Manuel Blanco (O.S.A.) [Public domain], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.



Assembly: ASM407v2, INSDC Assembly GCA_000004075.2, Oct 2014
Database version: 94.2
Base Pairs: 193,829,320
Golden Path Length: 193,829,320
Genebuild by: ENA
Genebuild method: Imported from ENA
Data source: The Cucumber Genome Initiative

Gene counts

Coding genes: 23,780
Non coding genes: 682
Small non coding genes: 681
Long non coding genes: 1
Gene transcripts: 24,462
