Aegilops tauschii Assembly and Gene Annotation

Wheat genomics resources are developed as part of our involvement in the consortium Triticeae Genomics For Sustainable Agriculture, funded by the BBSRC, and led by TGAC.


About Aegilops tauschii

Aegilops tauschii (goatgrass) is the diploid progenitor of the bread wheat D genome and provides important evolutionary information for wheat. Bread wheat is hexaploid, the result of hybridization between wild A. tauschii and a cultivated tetraploid wheat, Triticum turgidum. This spontaneous event occurred about 8,000 years ago in the Fertile Crescent.


The genome of Aegilops tauschii accession AL8/78 was sequenced by the BGI using a whole-genome shotgun strategy and assembled using the SOAPdenovo software. The assembly produced contigs with an N50 size of 4.51 kbp. Using paired-end information and additional Roche/454 long-read sequences, the draft assembly reached 4.23 Gbp, with a scaffold N50 length of 57.6 kbp.
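
The N50 figures quoted above can be read as follows: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The sketch below is only an illustration of that definition; the function name `n50` and the toy contig lengths are assumptions made here and are not taken from the BGI assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (not real A. tauschii data).
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))  # total is 30; 10 + 6 = 16 >= 15, so this prints 6
```

A higher N50 indicates that more of the assembly sits in long sequences, which is why the scaffold N50 (57.6 kbp) is much larger than the contig N50 (4.51 kbp).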

The chloroplast genome component and its gene annotation are also present. These were imported from ENA entry JQ754651.


A total of 34,498 protein-coding genes were predicted using FGENESH and GeneID, supplemented with evidence from RNA-Seq and EST sequences. For more details about genome sequencing and gene prediction, see [1].

Non-coding RNA genes have been annotated using tRNAscan-SE (Lowe, T.M. and Eddy, S.R. 1997), Rfam (Griffiths-Jones et al. 2005), and RNAmmer (Lagesen, K. et al. 2007); additional analysis tools have also been applied.

Triticeae Repeats from TREP were aligned to the A. tauschii genome using RepeatMasker.

Regulation and sequence alignments

RNA-Seq data, ESTs and UniGene datasets have also been aligned to the Aegilops tauschii genome.

Analysis of the bread wheat genome using comparative whole genome shotgun sequencing - Brenchley et al. [5]

The wheat genome assemblies previously generated by Brenchley et al. (PMID:23192148) have been aligned to the bread wheat survey sequence, Brachypodium, barley and the wild wheat progenitors (Triticum urartu and Aegilops tauschii). Homoeologous variants inferred between the three wheat genomes (A, B, and D) are displayed in the context of the gene models of these five genomes.

Sequences of diploid progenitor and ancestral species permitted homoeologous variants to be classified into two groups: 1) SNPs that differ between the A and D genomes (where the B genome is unknown), and 2) SNPs that are the same between the A and D genomes but differ in B.
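
The two-group classification described above can be sketched as a small decision rule. This is an illustrative reading of the description only, not the method used by Brenchley et al.; the function name `classify_variant`, the group labels, and the use of `None` for an unknown B allele are all assumptions introduced here.

```python
from typing import Optional


def classify_variant(a: str, d: str, b: Optional[str] = None) -> Optional[str]:
    """Assign a homoeologous SNP to one of the two groups described above.

    a, d: alleles observed for the A and D genomes.
    b:    allele for the B genome, or None when it is unknown (assumption).
    Returns a hypothetical group label, or None if neither group applies.
    """
    if a != d:
        # Group 1: A and D differ; the B allele is not consulted.
        return "group1_A_differs_from_D"
    if b is not None and b != a:
        # Group 2: A and D agree, but B carries a different allele.
        return "group2_B_differs"
    return None  # no homoeologous difference detectable


# Hypothetical alleles, for illustration only.
print(classify_variant("A", "G"))       # group1_A_differs_from_D
print(classify_variant("C", "C", "T"))  # group2_B_differs
```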

The wheat gene alignments and the projected wheat SNPs are available on the Location view, as additional tracks under the "Wheat SNPs and alignments" section of the "Configure this page" menu.


Links (Aegilops tauschii)

  • GigaDB
  • ENA study: SRP002455: Discovery of SNPs and genome-specific mutations by comparative analysis of transcriptomes of hexaploid wheat and its diploid ancestors
  • ENA study: DRP000562: RNASeq from seedling leaves of Aegilops tauschii
  • TREP, the Triticeae Repeat Sequence Database

Links (Triticum aestivum)

  • MIPS Wheat Genome Database
  • ENA study ERP000319: 454 pyrosequencing of the Triticum aestivum (bread wheat) genome to 5X coverage
  • Triticum aestivum UniGene cluster sequences at NCBI
  • Triticum aestivum ESTs at ENA


  1. Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation.
    Jia J, Zhao S, Kong X, Li Y, Zhao G, He W, Appels R, Pfeifer M, Tao Y, Zhang X et al. 2013. Nature. 496:91-95.
  2. Discovery of high-confidence single nucleotide polymorphisms from large-scale de novo analysis of leaf transcripts of Aegilops tauschii, a wild wheat progenitor.
    Iehisa JC, Shimizu A, Sato K, Nasuda S, Takumi S. 2012. DNA Res. 19:487-497.
  3. Image credit: Mark Nesbitt [CC-BY-SA-3.0 or GFDL], via Wikimedia Commons.
  4. Homoeolog-specific transcriptional bias in allopolyploid wheat.
    Akhunova AR, Matniyazov RT, Liang H, Akhunov ED. 2010. BMC Genomics. 11:505.

More information

General information about this species can be found in Wikipedia.



Assembly: ASM34733v1, INSDC Assembly GCA_000347335.1, Apr 2013
Database version: 94.1
Base Pairs: 2,691,777,557
Golden Path Length: 3,313,764,331
Genebuild by: BGI
Genebuild method: Imported from ENA
Data source: Beijing Genomics Institute

Gene counts

Coding genes: 33,929
Non-coding genes: 2,219
Small non-coding genes: 2,004
Long non-coding genes: 215
Gene transcripts: 36,148
