Triticum aestivum Assembly and Gene Annotation

For information about the assembly and annotation please view the IWGSC announcement.

The previous wheat assembly (TGACv1) and every other plant from release 31 are available in the new Ensembl Plants archive site.

About Triticum aestivum

Triticum aestivum (bread wheat) is a major global cereal grain essential to human nutrition. Wheat was one of the first cereals to be domesticated, originating in the Fertile Crescent around 7000 years ago. Bread wheat is hexaploid, with a genome size estimated at ~17 Gbp, composed of three closely related and independently maintained genomes that are the result of a series of naturally occurring hybridization events. The ancestral progenitor genomes are considered to be Triticum urartu (the A-genome donor) and an unknown grass thought to be related to Aegilops speltoides (the B-genome donor). This first hybridization event produced tetraploid emmer wheat (AABB, T. dicoccoides), which hybridized again with Aegilops tauschii (the D-genome donor) to produce modern bread wheat.

Assembly

The IWGSC RefSeq v1.0 is an integration of the IWGSC WGA v0.4 (composed of Illumina short sequence reads assembled with NRGene’s DeNovoMAGIC™ software) with IWGSC chromosome-based and other resources (physical maps and MTP BAC WGP™ sequence tags; and, for some chromosomes, sequenced BACs, BioNano optical maps, alignment to RH maps and GBS map of the SynOp RIL population CsxRn genetic map). Scaffolds/superscaffolds have been assigned to chromosomal locations using POPSEQ data and a Hi-C map. The chromosomal scaffold/superscaffold N50 is 22.8 Mb.
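For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is conventionally computed: the length L such that sequences of length L or longer account for at least half of the total assembled bases. The function name and the toy lengths are illustrative assumptions and are not taken from the IWGSC pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (conventional definition)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest until at least
    # half of the total length has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths in bp (hypothetical values, not IWGSC data).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

Applied to the real scaffold/superscaffold lengths, a calculation of this kind would be expected to give the 22.8 Mb figure reported above.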

Annotation

The IWGSC RefSeq v1.0 annotation includes gene models generated by integrating predictions made by INRA-GDEC using Triannot and by PGSB using their customised pipeline (previously the MIPS pipeline). The integration was undertaken by the Earlham Institute (EI), who have also added UTRs to the gene models where supporting data are available. Gene models have been assigned to high confidence (HC) or low confidence (LC) classes based on completeness, similarity to genes represented in protein and DNA databases, and repeat content. The automated assignment of functional annotation to genes has been generated by PGSB based on AHRD parameters.
The annotation includes 110,790 high confidence genes, 158,793 low confidence genes and 13,044 long non-coding RNAs.
98,270 high confidence genes from the TGACv1 annotation [3] were aligned to the assembly using Exonerate. For each gene, up to 3 alignments are displayed, comprising 196,243 alignments, of which 90,686 are protein coding.

Variation

Data from CerealsDB [1]

768,664 markers from the 820K Axiom SNP array from CerealsDB were aligned to the assembly.
This was done by CerealsDB [1] using BLAST with an E-value cutoff of 1e-05. The top three hits were parsed and compared to CerealsDB genetic map data. In cases where two or more of the top three hits had an identical score, the hit that agreed with the genetic map was selected. In cases where no genetic map information was available for a particular SNP, the top hit was selected.
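The selection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the CerealsDB implementation: the `Hit` type, the `select_hit` function and the use of the chromosome name to decide whether a hit "agrees" with the genetic map are all hypothetical, and ties are interpreted here as ties for the best score among the top three hits.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    chromosome: str  # chromosome of the BLAST hit, e.g. "1A"
    position: int    # start coordinate of the hit
    score: float     # BLAST score; higher is better


def select_hit(hits: Sequence[Hit], map_chromosome: Optional[str]) -> Optional[Hit]:
    """Pick one placement for a SNP marker from its BLAST hits.

    map_chromosome is the chromosome given by the genetic map,
    or None when the marker has no genetic map information.
    """
    # Only the three best-scoring hits are considered.
    top = sorted(hits, key=lambda h: h.score, reverse=True)[:3]
    if not top:
        return None
    best = top[0]
    # No genetic map information: take the top hit.
    if map_chromosome is None:
        return best
    # Assumption: a tie means several hits share the best score.
    tied = [h for h in top if h.score == best.score]
    if len(tied) > 1:
        for h in tied:
            if h.chromosome == map_chromosome:
                return h
    # No tie, or no tied hit agrees with the map (assumed fallback): top hit.
    return best


# Hypothetical hits for one marker.
hits = [Hit("1A", 100, 200.0), Hit("1B", 150, 200.0), Hit("1D", 90, 180.0)]
print(select_hit(hits, "1B"))
```

In this toy case the 1A and 1B hits tie on score, so a genetic map position on 1B selects the 1B hit; without map information the first top-scoring hit would be returned.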

EMS Mutation data [2]

EMS-type variants from sequenced tetraploid (cv ‘Kronos’) and hexaploid (cv ‘Cadenza’) TILLING populations. Mutations were called on the IWGSC RefSeq v1.0 assembly using the Dragen system [4].

  • 4.4 million Kronos mutations
  • 9.0 million Cadenza mutations
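As a rough illustration of what "EMS-type" can mean in practice, the Python sketch below flags single-nucleotide changes that match the G-to-A and C-to-T transitions typically induced by EMS. The text above does not define the term, so this definition is an assumption, and the function name is hypothetical and unrelated to the Dragen calling pipeline.

```python
# Assumed definition: EMS typically induces G>A and C>T transitions.
EMS_TYPE_CHANGES = {("G", "A"), ("C", "T")}


def is_ems_type(ref: str, alt: str) -> bool:
    """Return True if a single-base change looks like a typical EMS mutation."""
    return (ref.upper(), alt.upper()) in EMS_TYPE_CHANGES


# Hypothetical reference/alternate allele pairs.
for ref, alt in [("G", "A"), ("C", "T"), ("A", "G"), ("G", "T")]:
    print(ref, alt, is_ems_type(ref, alt))
```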

Researchers and breeders can search this database online, identify mutations in the different copies of their target gene, and request seeds to study gene function or improve wheat varieties. Seeds can be requested from the UK SeedStor (https://www.seedstor.ac.uk/shopping-cart-tilling.php) or from the US-based Dubcovsky lab (http://dubcovskylab.ucdavis.edu/wheat-tilling).

This resource was generated as part of a joint project between the University of California Davis, Rothamsted Research, The Earlham Institute, and the John Innes Centre.

References

  1. CerealsDB 3.0: expansion of resources and data integration.
    Wilkinson PA, Winfield MO, Barker GL, Tyrrell S, Bian X, Allen AM, Burridge A, Coghill JA, Waterfall C, Caccamo M et al. 2016. BMC Bioinformatics. 17:256.
  2. Uncovering hidden variation in polyploid wheat.
    Krasileva KV, Vasquez-Gross HA, Howell T, Bailey P, Paraiso F, Clissold L, Simmonds J, Ramirez-Gonzalez RH et al. 2017. PNAS. 114:E913-E921.
  3. An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations.
    Clavijo BJ, Venturini L, Schudoma C, Garcia Accinelli G, Kaithakottil G, Wright J, Borrill P, Kettleborough G, Heavens D, Chapman H et al. 2017. Genome Research.
  4. Ultra-Fast Next Generation Human Genome Sequencing Data Processing Using DRAGEN™ Bio-IT Processor for Precision Medicine.
    Goyal A, Kwon HJ, Lee K, Garg R, Yun SY et al. 2017. Open Journal of Genetics. 7:9-19.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: IWGSC, INSDC Assembly , Jul 2018
Database version: 93.4
Base Pairs: 14,547,261,565
Golden Path Length: 14,547,261,565
Genebuild by: IWGSC
Genebuild method: Imported from IWGSC
Data source: IWGSC

Gene counts

Coding genes: 110,790
Gene transcripts: 137,056

Other

Short Variants: 14,142,687
