Sorghum bicolor Assembly and Gene Annotation

About Sorghum bicolor

Sorghum bicolor is a widely grown cereal crop, particularly in Africa, ranking 5th in global cereal production. The diploid genome (~730 Mbp) has a haploid chromosome number of 10. Although highly repetitive, the genome is more tractable for sequencing than its close relative, Zea mays.

Assembly

The genome assembly of Sorghum bicolor cv. Moench was published in 2009 [1]. The US Department of Energy Joint Genome Institute (JGI) Community Sequencing Program, in collaboration with the Plant Genome Mapping Laboratory, sequenced the genome using a whole-genome shotgun strategy at 8x coverage. Scaffolds were assigned to the genetic map where possible.

Annotation

Gene predictions were produced by the JGI annotation pipeline, which combined homology-based and ab initio methods with expressed sequences from sorghum, maize and sugarcane. This browser presents data from the Sbi1 assembly and Sbi1.4 gene set (March 2007). Read more at Phytozome.

Repeats

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 6 repeats loaded from ENA; 685,783 low-complexity (Dust) features, covering 29 Mb (4.0% of the genome); 455,749 RepeatMasker features (with the RepBase library), covering 451 Mb (62.1% of the genome); 392,778 RepeatMasker features (with the REdat library), covering 409 Mb (56.2% of the genome); and 245,654 tandem repeat (TRF) features, covering 41 Mb (5.7% of the genome).
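The coverage percentages above can be approximated from the quoted megabase values. The sketch below assumes the denominator is the Golden Path Length listed in the statistics on this page; that choice, the dictionary name and the helper name are illustrative assumptions, not part of the Ensembl pipeline. Because the megabase figures are rounded, the results may differ from the quoted percentages in the last digit.

```python
# Assumption: genome length is the "Golden Path Length" statistic from this page.
GENOME_LENGTH_BP = 726_795_624

# Covered megabases as quoted in the text (rounded, so results are approximate).
repeat_coverage_mb = {
    "Dust": 29,
    "RepeatMasker (RepBase)": 451,
    "RepeatMasker (REdat)": 409,
    "TRF": 41,
}


def percent_of_genome(covered_mb, genome_length_bp=GENOME_LENGTH_BP):
    """Return the share of the genome covered, as a percentage."""
    return 100.0 * covered_mb * 1_000_000 / genome_length_bp


for name, covered_mb in repeat_coverage_mb.items():
    print(f"{name}: {percent_of_genome(covered_mb):.1f}% of the genome")
```

The repeat classes overlap one another (for example, the two RepeatMasker libraries annotate many of the same regions), so the percentages are not expected to sum to 100%.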

Variation

Morris et al. SNP set [2]

The Morris et al. set is from a 2013 study of agroclimatic traits in sorghum [2]. In this study, approximately 265,000 single nucleotide polymorphisms (SNPs) were characterized in 971 worldwide accessions drawn from three previously defined sorghum diversity panels: the US sorghum association panel (SAP), the sorghum mini core collection (MCC) and the Generation Challenge Program sorghum reference set (RS). Genome-wide association studies (GWAS) were subsequently performed on plant height components and inflorescence architecture using 336 SAP lines. The data presented here represent the genotype information for the 378 SAP lines provided by the author.

Mace et al. SNP set [3]

This sorghum variation data set comprises 6,578,420 SNPs (SNPs mapping to supercontigs were removed) genotyped in 45 Sorghum bicolor lines, including the BTx623 reference genome, plus 2 S. propinquum lines, as reported by Mace et al. (2013) [3]. The data were obtained by resequencing the genomes of the 44 Sorghum bicolor lines, which represent the primary gene pool and span dimensions of geographic origin, end use and taxonomic group (i.e., major races of cultivated S. bicolor, landraces, improved inbreds, progenitors, wild and weedy), together with the first resequenced genome of S. propinquum. All were mapped to the BTx623 S. bicolor reference genome.
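The removal of supercontig SNPs mentioned above amounts to a filter on the sequence region name. The following is a minimal sketch only: the record layout (region, position, reference allele, alternate allele), the region names "1" to "10" for the ten chromosomes and the example supercontig name are all assumptions made for illustration, not the format used by Mace et al. or by this browser.

```python
# Assumption: the ten chromosomes are named "1" .. "10"; any other region
# name is treated as an unplaced supercontig.
CHROMOSOMES = {str(i) for i in range(1, 11)}


def drop_supercontig_snps(snps):
    """Keep only SNP records whose region is one of the ten chromosomes.

    Each record is a hypothetical (region, position, ref, alt) tuple.
    """
    return [snp for snp in snps if snp[0] in CHROMOSOMES]


# Hypothetical example records, not real variants.
example_snps = [
    ("1", 10_500, "G", "A"),
    ("super_12", 742, "C", "T"),
    ("10", 98_213, "T", "C"),
]

chromosome_snps = drop_supercontig_snps(example_snps)
print(len(chromosome_snps), "of", len(example_snps), "SNPs kept")
```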

Jiao et al. EMS SNP set [5]

The Jiao EMS data set includes ~1.8 million ethyl methane sulfonate (EMS)-induced G/C to A/T transition mutations annotated in 252 M3 families selected from a library of 6,400 sorghum mutants in the BTx623 background [5]. Genomic DNA used for sequencing was pooled from 20 M3 plants per M2 family.
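A G/C to A/T transition appears as G>A when read on one strand and C>T when read on the other. The sketch below shows one way to test whether a reference/alternate allele pair fits that pattern; the function and constant names are hypothetical, and this is not the filtering procedure used by Jiao et al.

```python
# The two single-base changes that make up a G/C to A/T transition,
# depending on which strand the variant is reported on.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def is_ems_type_transition(ref, alt):
    """Return True if ref>alt is a G>A or C>T change (case-insensitive)."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


# Hypothetical allele pairs for illustration.
for ref, alt in [("G", "A"), ("C", "T"), ("A", "G"), ("G", "T")]:
    label = "EMS-type" if is_ems_type_transition(ref, alt) else "other"
    print(f"{ref}>{alt}: {label}")
```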

Structural Variation

Data for structural variation in sorghum have been imported from the Database of Genomic Variants archive (DGVa) from a single study containing around 32 thousand structural variants [4].

References

  1. The Sorghum bicolor genome and the diversification of grasses.
    Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A et al. 2009. Nature. 457:551-556.
  2. Population genomic and genome-wide association studies of agroclimatic traits in sorghum.
    Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, Riera-Lizarazu O, Brown PJ, Acharya CB, Mitchell SE et al. 2013. Proc. Natl. Acad. Sci. U.S.A. 110:453-458.
  3. Whole-genome sequencing reveals untapped genetic potential in Africa's indigenous cereal crop sorghum.
    Mace ES, Tai S, Gilding EK, Li Y, Prentis PJ, Bian L, Campbell BC, Hu W, Innes DJ, Han X et al. 2013. Nat Commun. 4:2320.
  4. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor).
    Zheng LY, Guo XS, He B, Sun LJ, Peng Y, Dong SS, Liu TF, Jiang S, Ramachandran S, Liu CM et al. 2011. Genome Biol. 12:R114.
  5. A Sorghum Mutant Resource as an Efficient Platform for Gene Discovery in Grasses.
    Jiao Y, Burke J, Chopra R, Burow G, Chen J, Wang B, Hayes C, Emendack Y, Ware D, Xin Z. 2016. Plant Cell. 28:1551-1562.

Statistics

Summary

Assembly: Sorghum_bicolor_v2, INSDC Assembly GCA_000003195.2, Aug 2016
Database version: 90.20
Base Pairs: 689,728,766
Golden Path Length: 726,795,624
Genebuild by: JGI
Genebuild method: Imported from JGI-Phytozome
Data source: JGI

Gene counts

Coding genes: 33,235
Non coding genes: 77
Small non coding genes: 77
Gene transcripts: 39,736

Other

Short Variants: 8,187,272
Structural variants: 29,164
