Sorghum bicolor Assembly and Gene Annotation

About Sorghum bicolor

Sorghum bicolor is a widely grown cereal crop, particularly in Africa, and ranks fifth in global cereal production. It is also used as a biofuel crop and is a potential cellulosic feedstock. The diploid genome (~730 Mbp) has a haploid chromosome number of 10. Although highly repetitive, the genome is more tractable for sequencing than that of its close relative, Zea mays.


The first genome assembly of Sorghum bicolor (L.) Moench, cultivar BTx623, was published in 2009 [1]. Sequencing was carried out by the US Department of Energy Joint Genome Institute (JGI) Community Sequencing Program in collaboration with the Plant Genome Mapping Laboratory, using a whole-genome shotgun strategy that reached 8x coverage; where possible, scaffolds were assigned to the genetic map. JGI has since made two rounds of improvements. The most recent update, release v3.0, includes ~351 Mb of finished sorghum sequence. A total of 349 clones were manually inspected, then finished and validated using a variety of technologies, and integrated into the chromosomes by alignment to the v1.0 assembly. As a result, 4,426 gaps were closed and 4.96 Mb of sequence was added to the assembly. Overall contiguity (contig N50) increased 5.8-fold, from 204.5 kb to 1.2 Mb. For more details, see Phytozome.
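Contig N50 is the contig length at which contigs of that length or longer together contain at least half of the assembled bases. The sketch below computes it from a list of contig lengths using that standard definition. It is only an illustration: the toy lengths are hypothetical, and this is not the pipeline JGI used to report the v1.0 and v3.0 values.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("need at least one contig length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the assembly is the N50.
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Hypothetical toy assembly: 1,500 bp in five contigs.
print(n50([100, 200, 300, 400, 500]))  # 500 + 400 = 900 >= 750, so N50 = 400
```

Closing gaps merges adjacent contigs into longer ones, which is why the gap closures described above raise N50 even though relatively little sequence (4.96 Mb) was added.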


This browser presents data from the v3.0.1 assembly and v3.1.1 gene set (March 2017). Gene prediction builds on the process and resources used for the original v1.0 release (Sbi1 assembly and Sbi1.4 gene set), with the addition of new geneAtlas RNA-seq data. Read more at Phytozome.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 6 repeats loaded from the European Nucleotide Archive (ENA); 685,783 low-complexity (Dust) features, covering 29 Mb (4.0% of the genome); 455,749 RepeatMasker features (with the RepBase library), covering 451 Mb (62.1% of the genome); 392,778 RepeatMasker features (with the REdat library), covering 409 Mb (56.2% of the genome); and 245,654 tandem repeat (TRF) features, covering 41 Mb (5.7% of the genome).
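Because repeat features from one library can overlap, "covering X Mb" is best read as the length of the union of the feature intervals, not the sum of feature lengths. The sketch below shows that calculation under stated assumptions: features are grouped per sequence as half-open `[start, end)` intervals (1-based closed coordinates, as in Ensembl or GFF, convert to `(start - 1, end)`), and the genome length used as the denominator is supplied by the caller. Which genome length Ensembl Genomes divides by is not stated here, so the percentages above may not be reproducible exactly from the figures on this page. The function names and toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Bases covered by the union of possibly overlapping intervals on one sequence."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the current block
        else:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_coverage(features_by_seq: Dict[str, List[Interval]],
                    genome_length: int) -> Tuple[int, float]:
    """Return (covered bp, percent of genome_length) summed over all sequences."""
    covered = sum(covered_bases(ivs) for ivs in features_by_seq.values())
    return covered, 100.0 * covered / genome_length


# Hypothetical toy data: two overlapping features on sequence "1" count only once.
toy = {"1": [(0, 100), (50, 150), (300, 400)], "2": [(0, 50)]}
print(repeat_coverage(toy, genome_length=1000))  # (300, 30.0)
```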


Morris et al. SNP set [2]

The Morris et al. set comes from a 2013 study of agroclimatic traits in sorghum [2]. In this study, approximately 265,000 single nucleotide polymorphisms (SNPs) were characterized in 971 accessions from around the world, combining three previously defined sorghum diversity panels: the US sorghum association panel (SAP), the sorghum mini core collection (MCC) and the Generation Challenge Program sorghum reference set (RS). Genome-wide association studies (GWAS) were subsequently performed on plant height components and inflorescence architecture using 336 SAP lines. The data presented here represent the genotypes of the 378 SAP lines provided by the authors.

Mace et al. SNP set [3]

This sorghum variation data set comprises 6,578,420 SNPs (SNPs mapping to supercontigs were removed) genotyped in 45 Sorghum bicolor lines, including the BTx623 reference genome, plus 2 S. propinquum lines, as reported by Mace et al. (2013) [3]. The data were obtained by resequencing the genomes of the 44 Sorghum bicolor lines representing the primary gene pool and spanning dimensions of geographic origin, end use and taxonomic group (i.e., major races of cultivated S. bicolor, landraces, improved inbreds, progenitors, and wild and weedy types), together with the first resequenced genome of S. propinquum; all were mapped to the BTx623 S. bicolor reference genome.
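Removing SNPs that map to supercontigs amounts to keeping only variant records whose sequence name is one of the 10 assembled chromosomes. The sketch below applies that filter to VCF text lines. It assumes, hypothetically, that chromosomes are named `"1"` to `"10"` and that any other sequence name is an unplaced supercontig; check the sequence names in the actual download before relying on this. It is an illustration, not the procedure Mace et al. or Ensembl used.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical naming assumption: the 10 sorghum chromosomes are named "1".."10".
CHROMOSOMES: Set[str] = {str(n) for n in range(1, 11)}


def chromosome_snps(vcf_lines: Iterable[str],
                    chromosomes: Set[str] = CHROMOSOMES) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (CHROM, POS, REF, ALT) for VCF records on the given chromosomes only."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if chrom in chromosomes:
            yield chrom, pos, ref, alt


# Hypothetical records: the one on "super_12" is dropped as a supercontig SNP.
lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t100\t.\tG\tA\n",
    "super_12\t5\t.\tC\tT\n",
    "10\t42\t.\tC\tT\n",
]
print(list(chromosome_snps(lines)))
```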

Jiao et al. EMS SNP set [5]

The Jiao et al. EMS dataset includes ~1.8 million ethyl methanesulfonate (EMS)-induced G/C to A/T transition mutations annotated in 252 M3 families selected from the 6,400-member sorghum mutant library in the BTx623 background [5]. Genomic DNA used for sequencing was pooled from 20 M3 plants per M2 family.
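On the reference strand, an EMS-type G/C to A/T transition appears either as G>A or as C>T (the same mutation seen from the two DNA strands). The sketch below classifies single-base substitutions on that basis and reports the fraction that fit the EMS signature, which is one simple sanity check on a mutant SNP set. The function names and toy variants are hypothetical; this is not the annotation pipeline used by Jiao et al.

```python
from typing import Iterable, Tuple

# G>A and C>T are the two reference-strand views of a G/C -> A/T transition.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def is_ems_transition(ref: str, alt: str) -> bool:
    """True if ref>alt is a single-base G/C -> A/T transition."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1:
        return False  # indels and multi-base changes are not EMS transitions
    return (ref, alt) in EMS_CHANGES


def ems_fraction(snvs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (ref, alt) pairs matching the EMS signature; 0.0 for no input."""
    snvs = list(snvs)
    if not snvs:
        return 0.0
    hits = sum(1 for ref, alt in snvs if is_ems_transition(ref, alt))
    return hits / len(snvs)


# Hypothetical calls: two EMS-type changes and two A/T -> G/C transitions.
print(ems_fraction([("G", "A"), ("C", "T"), ("A", "G"), ("T", "C")]))  # 0.5
```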

Structural Variation

Data for structural variation in sorghum have been imported from the Database of Genomic Variants archive (DGVa) and come from a single study containing around 32,000 structural variants [4].



  1. The Sorghum bicolor genome and the diversification of grasses.
    Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A et al. 2009. Nature. 457:551-556.
  2. Population genomic and genome-wide association studies of agroclimatic traits in sorghum.
    Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, Riera-Lizarazu O, Brown PJ, Acharya CB, Mitchell SE et al. 2013. Proc. Natl. Acad. Sci. U.S.A. 110:453-458.
  3. Whole-genome sequencing reveals untapped genetic potential in Africa's indigenous cereal crop sorghum.
    Mace ES, Tai S, Gilding EK, Li Y, Prentis PJ, Bian L, Campbell BC, Hu W, Innes DJ, Han X et al. 2013. Nat Commun. 4:2320.
  4. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor).
    Zheng LY, Guo XS, He B, Sun LJ, Peng Y, Dong SS, Liu TF, Jiang S, Ramachandran S, Liu CM et al. 2011. Genome Biol. 12:R114.
  5. A Sorghum Mutant Resource as an Efficient Platform for Gene Discovery in Grasses.
    Jiao Y, Burke J, Chopra R, Burow G, Chen J, Wang B, Hayes C, Emendack Y, Ware D, Xin Z. 2016. Plant Cell. 28:1551-1562.

Picture credit: By Sahaquiel9102 (own work) [CC BY 3.0], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.



Assembly: Sorghum_bicolor_NCBIv3, INSDC Assembly GCA_000003195.3, Jun 2017
Database version: 91.30
Base Pairs: 675,363,888
Golden Path Length: 708,735,318
Genebuild by: EnsemblPlants
Genebuild method: Generated from ENA annotation
Data source: European Nucleotide Archive

Gene counts

Coding genes: 34,118
Gene transcripts: 47,110
