Triticum aestivum Assembly and Gene Annotation
About Triticum aestivum
Triticum aestivum (bread wheat) is a major global cereal grain essential to human nutrition. Wheat was one of the first cereals to be domesticated, originating in the Fertile Crescent around 7000 years ago. Bread wheat is hexaploid, with a genome size estimated at ~17 Gbp, composed of three closely related and independently maintained genomes that are the result of a series of naturally occurring hybridization events. The ancestral progenitor genomes are considered to be Triticum urartu (the A-genome donor) and an unknown grass thought to be related to Aegilops speltoides (the B-genome donor). This first hybridization event produced tetraploid emmer wheat (AABB, T. dicoccoides), which hybridized again with Aegilops tauschii (the D-genome donor) to produce modern bread wheat.
IWGSC Chromosome survey sequence
The bread wheat genome in Ensembl Plants is version 1.0 of the chromosome survey sequence for Triticum aestivum cv. Chinese Spring generated by the International Wheat Genome Sequencing Consortium. The gene models are provided by MIPS (version 2.0).
The sequence has been generated using the Illumina platform from flow-sorted chromosome arms. The resulting assemblies are fragmented and resolution of repetitive regions is still limited. Nonetheless, assembly of gene-containing regions is reasonably good (N50 of 2.5kb) and the predicted gene models are close in terms of length and exon count to those previously predicted for other closely related species.
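For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions; they are not taken from the IWGSC assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not real wheat scaffolds).
toy_lengths = [5000, 3000, 2500, 2000, 1000, 500]
print(n50(toy_lengths))
```

With these toy values the total is 14,000 bp, so the running sum first reaches half of it (7,000 bp) at the 3,000 bp sequence.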
Due to the large number of scaffolds in the assembly, only a subset is visible in the browser: all scaffolds of 3 kb or longer, plus all scaffolds on which a gene was predicted or to which a wheat cDNA has been aligned (~730,000 scaffolds in total). This set has been included in the Ensembl Plants BLAST and ENA search services.
The complete set of survey sequences may be downloaded from The Genome Analysis Centre, and may be searched using the TGAC BLAST server. The data are also available in the archives of the International Nucleotide Sequence Database Collaboration (INSDC), under project PRJEB3955.
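The selection rule for browser-visible scaffolds can be sketched as a simple predicate. This is only an illustration of the stated criteria; the `Scaffold` class, its field names, and the example records are hypothetical and do not reflect how Ensembl Plants stores the data.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 3000  # the 3 kb threshold stated in the text


@dataclass
class Scaffold:
    name: str                   # hypothetical identifier
    length: int                 # scaffold length in bp
    has_gene: bool = False      # a gene model was predicted on it
    has_cdna_hit: bool = False  # a wheat cDNA aligns to it


def is_displayed(scaffold: Scaffold) -> bool:
    """A scaffold is shown if it is long enough or carries any annotation."""
    return (
        scaffold.length >= MIN_LENGTH_BP
        or scaffold.has_gene
        or scaffold.has_cdna_hit
    )


# Made-up records for illustration only.
scaffolds = [
    Scaffold("scaf_a", 4200),
    Scaffold("scaf_b", 800, has_gene=True),
    Scaffold("scaf_c", 1500, has_cdna_hit=True),
    Scaffold("scaf_d", 900),
]
displayed = [s.name for s in scaffolds if is_displayed(s)]
print(displayed)
```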
In addition to sequence assemblies and gene models, a number of additional data sets have been aligned to the survey sequence, including the complete genomes of Brachypodium distachyon and rice (Oryza sativa), wheat UniGene clusters from NCBI, and wheat RNA-seq data deposited in the INSDC archives.
Protein-coding gene set from MIPS
The structures of the gene models were computed by spliced alignment (GenomeThreader) of publicly available wheat full-length cDNAs (fl-cDNAs) and protein sequences from the related grass species barley, Brachypodium, rice and Sorghum. Redundant transcript structures (those sharing intron boundaries) from different references were then merged. Additionally, a comprehensive RNA-seq dataset covering five tissues (root, leaf, spike, stem, grain) and different developmental stages was used to identify wheat-specific genes and additional splice variants. Wheat RNA-seq short reads were aligned stringently against the survey sequence assembly (using Bowtie and TopHat), and the transcript structures were assembled using Cufflinks.
A total of 108,569 genes and transcripts were predicted. Across these gene loci, 179,645 additional splice variants were predicted. To simplify the display, these alternative splice variants were loaded separately and can be visualized in a separate track in the contig view. This track is not displayed by default; to turn it on, use the "Configure This page" menu.
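The merging of redundant transcript structures can be illustrated with a short sketch. It assumes that "sharing intron boundaries" means an identical set of introns on the same scaffold and strand, and that merged models keep the outermost start and end; the actual rule used in the MIPS pipeline is not given here and may differ. The function names, dictionary keys, and example alignments are all hypothetical.

```python
from collections import defaultdict


def intron_chain(exons):
    """Introns implied by sorted, 1-based inclusive (start, end) exons."""
    return tuple(
        (exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)
    )


def merge_redundant(alignments):
    """Merge alignments that share scaffold, strand and all introns.

    Each alignment is a dict with 'scaffold', 'strand', 'source' and a
    non-empty 'exons' list of (start, end) tuples.
    """
    groups = defaultdict(list)
    for aln in alignments:
        exons = sorted(aln["exons"])
        chain = intron_chain(exons)
        if chain:
            key = (aln["scaffold"], aln["strand"], "introns", chain)
        else:
            # Single-exon alignments have no introns; key them by their own
            # coordinates so unrelated ones are not lumped together.
            key = (aln["scaffold"], aln["strand"], "exon", tuple(exons))
        groups[key].append((exons, aln["source"]))

    merged = []
    for key, members in groups.items():
        exons = list(members[0][0])
        # Internal exons are fixed by the introns; extend only the outer ends.
        start = min(ex[0][0] for ex, _ in members)
        end = max(ex[-1][1] for ex, _ in members)
        exons[0] = (start, exons[0][1])
        exons[-1] = (exons[-1][0], end)
        merged.append({
            "scaffold": key[0],
            "strand": key[1],
            "exons": exons,
            "sources": sorted({src for _, src in members}),
        })
    return merged


# Made-up alignments: the first two share the intron 201-299.
alignments = [
    {"scaffold": "s1", "strand": "+", "source": "barley",
     "exons": [(100, 200), (300, 400)]},
    {"scaffold": "s1", "strand": "+", "source": "rice",
     "exons": [(150, 200), (300, 450)]},
    {"scaffold": "s1", "strand": "+", "source": "cDNA",
     "exons": [(100, 200), (320, 400)]},
]
merged = merge_redundant(alignments)
print(len(merged), merged[0]["exons"], merged[0]["sources"])
```

Keying on the full intron set keeps alternative splice forms apart (the third alignment stays separate), which matches the idea that splice variants are reported as distinct transcripts.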
Triticeae-CAP predicted transcripts set - Krasileva et al. 
Predicted transcript annotations have also been inferred from Exonerate alignments of wheat coding sequences (CDS) from two sets of transcripts: Triticum turgidum assembled RNA-seq data (Krasileva et al., Genome Biology 2013, 14:R66, Supplemental dataset 7) and a collection of publicly available wheat transcripts filtered to exclude pseudogenes, sequences shorter than 90 bp, and ORFs similar to those present in the T. turgidum set.
The program findorf was used to predict the CDS within these transcripts, as described in Krasileva et al. See the Triticeae-CAP project page for more information.
Triticeae Repeats from TREP were aligned to the T. aestivum genome using RepeatMasker.
Additional standard annotations are described here.
Wheat RNA-seq and UniGene datasets have been aligned to the Triticum aestivum genome:
- 454 RNA-seq data were aligned using STAR, for the following ENA studies:
- UniGene cluster sequence data were aligned using Exonerate, following the standard Ensembl pipeline.
Analysis of the bread wheat genome using comparative whole genome shotgun sequencing - Brenchley et al. 
The wheat genome assemblies previously generated by Brenchley et al. (PMID:23192148) have also been aligned to the survey sequence, Brachypodium, barley and the wild wheat progenitors (Triticum urartu and Aegilops tauschii). Homoeologous variants inferred between the three wheat genomes (A, B, and D) are displayed in the context of the gene models of these five genomes.
Sequences of diploid progenitor and ancestral species permitted homoeologous variants to be classified into two groups: 1) SNPs that differ between the A and D genomes (where the B genome is unknown), and 2) SNPs that are the same between the A and D genomes but differ in B.
The wheat gene alignments and the projected wheat SNPs are available on the Location view of the Triticum aestivum, Brachypodium distachyon and Hordeum vulgare genomes, as additional tracks under the "Wheat SNPs and alignments" section of the "Configure This page" menu.
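The two-group classification of homoeologous variants can be expressed as a small decision function. This is a sketch of the rule as worded on this page, not the method of Brenchley et al.; the function name, the group labels, and the use of `None` for an unknown allele are assumptions made for illustration.

```python
def classify_homoeologous_snp(a, b, d):
    """Classify one site from its A-, B- and D-genome alleles.

    Each allele is a base such as "A" or "G", or None when unknown.
    Returns a group label, or None if the site fits neither group.
    """
    if a is None or d is None:
        return None  # both groups require known A and D alleles
    if b is None:
        # Group 1: A and D differ, B genome unknown.
        return "group1_A_vs_D" if a != d else None
    if a == d and b != a:
        # Group 2: A and D agree, B differs.
        return "group2_B_differs"
    return None


# Made-up sites for illustration.
print(classify_homoeologous_snp("A", None, "G"))
print(classify_homoeologous_snp("C", "T", "C"))
print(classify_homoeologous_snp("C", "C", "C"))
```

Sites where all three alleles are known and A differs from D fall outside both groups as described, so the sketch returns `None` for them.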
Wheat sequence search
In addition to the normal sequence search facilities, provided against any reference genome, a dedicated wheat sequence search service allows you to find alignments between your favourite genes and all the publicly available bread wheat genome sequences. The matched wheat sequences can be visualized in the context of reference models from the Hordeum vulgare and Brachypodium distachyon genomes.
Search is performed via the ENA search service, and currently includes:
- The 5x 454 whole genome assembly,
- ~1.3 million wheat EST sequences, and
- ~57,000 wheat UniGene cluster sequences.
Links
- International Wheat Genome Sequencing Consortium (IWGSC)
- URGI Wheat Portal
- MIPS International Wheat Survey Genome Database
- MIPS 5x 454 Survey Wheat Genome Database
- Triticeae Genomics For Sustainable Agriculture resource page
- TREP, the Triticeae Repeat Sequence Database
- ENA study ERP000319: 454 pyrosequencing of the Triticum aestivum (bread wheat) genome to 5X coverage
- ENA study ERP001415: 454 sequencing of Triticum aestivum (bread wheat) cv. Chinese Spring cDNA samples from a pool of tissues, from plants under drought stress and from circadian-sampled leaves
- Triticum aestivum ESTs at ENA
- Triticum aestivum UniGene cluster sequences at NCBI
References
- Analysis of the bread wheat genome using whole-genome shotgun sequencing.
Brenchley R, Spannagl M, Pfeifer M, Barker GL, D'Amore R, Allen AM, McKenzie N, Kramer M, Kerhornou A, Bolser D et al. 2012. Nature. 491:705-710.
- Separating homeologs by phasing in the tetraploid wheat transcriptome.
Krasileva KV, Buffalo V, Bailey P, Pearce S, Ayling S, Tabbita F, Soria M, Wang S, Consortium I, Akhunov E et al. 2013. Genome Biol. 14:R66.
|Assembly:|IWGSP1, Jul 2013|
|Golden Path Length:|4,460,384,120 bp|
|Genebuild method:|Imported from MIPS|