Saccharum spontaneum Assembly and Gene Annotation

About Saccharum spontaneum

Saccharum spontaneum is a sugar-poor relative of sugarcane that breeders have used in backcrosses for traits such as hardiness, disease resistance and ratooning capacity. It belongs to the PACMAD clade within the grasses (Poaceae). This is a Hi-C-based genome assembly of one haploid individual (AP85-441, 1n = 4x = 32). The genome size was estimated at 3.36 Gbp by flow cytometry. The genome project was an international collaboration led by Fujian Agriculture and Forestry University, University of Illinois at Urbana-Champaign and Hawaii Agriculture Research Center.


A contig-level assembly was first obtained by combining sequencing data from BAC pools and a PacBio library (20 kbp), with Illumina paired-end libraries (280 and 500 bp) used for polishing. BAC pools were assembled with ALLPATHS-LG, SPAdes and SOAPdenovo2, whereas the PacBio reads were assembled with Canu v1.5. This yielded a 3.13 Gbp assembly with a contig N50 of 45 kb. Subsequently, a chromosomal assembly named Sspon.HiC_chr_asm was constructed by proximity-guided assembly with ALLHIC, which is designed for scaffolding polyploid genomes. A Hi-C-based physical map was used to assemble 32 pseudo-chromosomes that anchor 2.9 Gbp of the genome, including 97% of the gene content. A high-density genetic map was used to verify that 89% of contigs have congruent positions. The resulting 32 pseudo-chromosomes comprise 8 homologous groups, each with 4 sets of monoploid chromosomes: A, B, C and D.
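
For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (not from Sspon.HiC_chr_asm).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # prints 8
```

In practice the lengths would be read from the assembly FASTA or its index rather than typed in by hand.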


Two rounds of MAKER annotation, followed by manual curation to separate genes and alleles, yielded over 35,500 genes with allele-specific resolution. These included 4,289 (12.1%) genes with four alleles, 9,792 (27.6%) with three, 14,797 (41.7%) with two, and 6,647 (18.7%) with one allele resolved. BUSCO v3 was used to evaluate annotation completeness. Out of 1,440 conserved genes, 1,397 (97.1%) were re-annotated in the AP85-441 genome, of which 1,101 (76.5%) were complete genes.
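
The allele-class breakdown can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the four classes run from four resolved alleles down to one; the variable names are illustrative and the counts are the ones quoted in the paragraph above.

```python
# Genes per number of resolved alleles (counts as quoted on this page).
allele_class_counts = {4: 4289, 3: 9792, 2: 14797, 1: 6647}

# Total number of genes with allele-specific resolution.
total_genes = sum(allele_class_counts.values())

# Share of each class as a percentage of the total, to one decimal place.
shares = {
    n_alleles: round(100 * count / total_genes, 1)
    for n_alleles, count in allele_class_counts.items()
}

print(f"total genes: {total_genes:,}")
for n_alleles, share in shares.items():
    print(f"{n_alleles} allele(s): {allele_class_counts[n_alleles]:,} genes ({share}%)")
```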

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. The annotation comprises 3,066,463 low-complexity (Dust) features, covering 85 Mb (3.0% of the genome); 1,621,792 RepeatMasker features (with the REdat library), covering 1,257 Mb (43.3% of the genome); 35,078 RepeatMasker features (with the RepBase library), covering 4 Mb (0.1% of the genome); and 1,145,071 tandem repeat (TRF) features, covering 145 Mb (5.0% of the genome).


  1. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L.
    Zhang J, Zhang X, Tang H et al. 2018. Nature Genetics. 50(11):1565-1573.

Picture credit: Biswarup Ganguly, licensed under the Creative Commons Attribution 3.0 Unported license

More information

General information about this species can be found in Wikipedia.



Assembly: Sspon.HiC_chr_asm, INSDC Assembly GCA_003544955.1
Database version: 99.1
Base Pairs: 2,900,240,836
Golden Path Length: 2,900,240,836
Genebuild by: UIL
Genebuild method: Import
Data source: European Nucleotide Archive

Gene counts

Coding genes: 83,815
Gene transcripts: 83,815
