Saccharum spontaneum Assembly and Gene Annotation
About Saccharum spontaneum
Saccharum spontaneum is a sugar-poor relative of sugarcane that breeders have used in backcrosses to introduce traits such as hardiness, disease resistance and ratooning capacity. It belongs to the PACMAD clade within the grasses (Poaceae). This is a Hi-C-based genome assembly of one haploid individual (AP85-441, 1n = 4x = 32). The genome size was estimated to be 3.36 Gbp by flow cytometry. The assembly is the product of an international collaboration led by Fujian Agriculture and Forestry University, the University of Illinois at Urbana-Champaign and the Hawaii Agriculture Research Center.
A contig-level assembly was first obtained by combining sequencing data from BAC pools and a PacBio library (20 kbp), with Illumina paired-end libraries (280 and 500 bp) used for polishing. The BAC pools were assembled with ALLPATHS-LG, SPAdes and SOAPdenovo2, and the PacBio reads with Canu v1.5. This yielded an assembly of 3.13 Gbp with a contig N50 of 45 kbp. A chromosomal assembly named Sspon.HiC_chr_asm was then constructed by proximity-guided assembly with ALLHIC, which is designed for scaffolding polyploid genomes. A Hi-C-based physical map was used to assemble 32 pseudo-chromosomes that anchor 2.9 Gbp of the genome, including 97% of the gene content. A high-density genetic map was used to verify that 89% of contigs have congruent positions. The resulting 32 pseudo-chromosomes comprise 8 homologous groups, each with 4 sets of monoploid chromosomes: A, B, C and D.
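As a reminder of what the contig N50 quoted above measures, the following is a minimal sketch of the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the assembly pipeline or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA or its index rather than typed in by hand.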
Two rounds of MAKER annotation, followed by manual curation to separate genes and alleles, yielded over 35,500 genes with allele-specific resolution. These included 4,289 (12.1%) genes with 4 alleles, 9,792 (27.6%) with 3, 14,797 (41.7%) with 2, and 6,647 (18.7%) with one allele resolved. BUSCO v3 was used to evaluate the completeness of the annotation. Out of 1,440 conserved genes, 1,397 (97.1%) were re-annotated in the AP85-441 genome, among which 1,101 (76.5%) were complete genes.
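The per-class shares can be rechecked from the gene counts alone. The sketch below recomputes them, taking the four classes to be genes with four, three, two and one resolved allele (an assumption about how the listed counts map to classes). The variable names are illustrative.

```python
# Gene counts per number of resolved alleles, as listed in the text.
# Assumption: the four classes are 4, 3, 2 and 1 allele(s) respectively.
allele_counts = {4: 4289, 3: 9792, 2: 14797, 1: 6647}

# Total number of genes with allele-specific resolution.
total = sum(allele_counts.values())

# Share of each class, as a percentage rounded to one decimal place.
shares = {n: round(100 * count / total, 1) for n, count in allele_counts.items()}

print(f"total genes: {total:,}")
for n_alleles, pct in shares.items():
    print(f"{n_alleles} allele(s): {allele_counts[n_alleles]:,} genes ({pct}%)")
```

The same pattern applies to the BUSCO figures, with 1,440 conserved genes as the denominator.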
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are 3,066,463 low-complexity (Dust) features, covering 85 Mb (3.0% of the genome); 1,621,792 RepeatMasker features (with the REdat library), covering 1,257 Mb (43.3% of the genome); 35,078 RepeatMasker features (with the RepBase library), covering 4 Mb (0.1% of the genome); and 1,145,071 tandem repeat (TRF) features, covering 145 Mb (5.0% of the genome).
- Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L.
Zhang J, Zhang X, Tang H et al. 2018. Nature Genetics. 50(11):1565-1573.
Picture credit: Biswarup Ganguly, licensed under the Creative Commons Attribution 3.0 Unported license
General information about this species can be found in Wikipedia.