Aegilops tauschii Assembly and Gene Annotation

Wheat genomics resources are developed as part of our involvement in the consortium Triticeae Genomics For Sustainable Agriculture, funded by the BBSRC, and led by the Earlham Institute.

About Aegilops tauschii

Tausch's goatgrass or rough-spike hard grass (Aegilops tauschii) is found in temperate Asia and the Caucasus region. It is the diploid progenitor of the D genome of bread wheat (Triticum aestivum) and so provides important evolutionary information for wheat. Bread wheat is a hexaploid that arose from the hybridisation of wild Ae. tauschii with a cultivated tetraploid wheat, Triticum dicoccoides. This spontaneous event occurred about 8,000 years ago in the Fertile Crescent.

Assembly

The Ae. tauschii AL8/78 genome sequence was assembled in five steps. The core was assembly Aet v1.1, based on the sequences of 42,822 bacterial artificial chromosome (BAC) clones. This assembly was merged with a whole-genome shotgun (WGS) assembly (Aet WGS 1.0) and WGS Pacific Biosciences mega-reads to extend scaffolds and close gaps, producing assembly Aet v2.0. Misassembled scaffolds were detected with the aid of an AL8/78 BioNano optical genome (BNG) map and resolved, producing assembly Aet v3.0. Two additional BNG maps were constructed and used, along with the genetic and physical maps, for super-scaffolding and for building the pseudomolecules of the final assembly, Aet v4.0. [1]

Annotation

A total of 83,117 genes were annotated in Aet v4.0. Of these, 39,622 were allocated to the high-confidence class (HCC) (gene set v2.0) and the remaining 43,495 to the low-confidence class (LCC). Of the HCC genes, 38,775 were in the pseudomolecules and 847 (2.2%) were in unassigned scaffolds. The total length of predicted HCC genes was 316,517,346 bp (7.5%) and the total length of their mRNAs was 145,062,217 bp (3.4%). Gene annotation was validated by a search for 1,440 BUSCO genes, of which 1,408 (97.8%) were correctly predicted among the 83,117 genes. Ae. tauschii genes were compared with genes annotated in four grass genomes and the Arabidopsis thaliana genome. Ae. tauschii genes were the longest, had the longest mean exon length and, together with barley genes, had the longest transcripts among the genomes. Otherwise, they were similar to genes in the other genomes, except for having a lower average number of exons. [1]
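The counts and percentages quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the published analysis. It assumes that the length percentages are taken relative to the assembly size listed under Statistics (4,224,915,394 bp), which the text does not state explicitly. The names `percent` and `summarise` are illustrative only.

```python
# Figures quoted in the Annotation paragraph.
TOTAL_GENES = 83_117
HCC_GENES = 39_622
LCC_GENES = 43_495
HCC_IN_PSEUDOMOLECULES = 38_775
HCC_IN_UNASSIGNED_SCAFFOLDS = 847
HCC_GENE_LENGTH_BP = 316_517_346
HCC_MRNA_LENGTH_BP = 145_062_217
BUSCO_SEARCHED = 1_440
BUSCO_FOUND = 1_408

# Assumption: the denominator for the length percentages is the
# assembly size given in the Statistics summary of this page.
ASSEMBLY_LENGTH_BP = 4_224_915_394


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def summarise():
    """Recompute the derived figures from the raw counts."""
    return {
        "hcc_plus_lcc": HCC_GENES + LCC_GENES,
        "hcc_placed_plus_unassigned": (
            HCC_IN_PSEUDOMOLECULES + HCC_IN_UNASSIGNED_SCAFFOLDS
        ),
        "hcc_gene_length_pct": percent(HCC_GENE_LENGTH_BP, ASSEMBLY_LENGTH_BP),
        "hcc_mrna_length_pct": percent(HCC_MRNA_LENGTH_BP, ASSEMBLY_LENGTH_BP),
        "busco_found_pct": percent(BUSCO_FOUND, BUSCO_SEARCHED),
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name}: {value}")
```

Under that assumption, the two class counts sum to the annotated total, the placed and unassigned HCC genes sum to the HCC total, and the recomputed length and BUSCO percentages agree with the quoted values to one decimal place.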

References

  1. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii.
    Ming-Cheng Luo, Yong Q. Gu, Daniela Puiu, Hao Wang, Sven O. Twardziok, Karin R. Deal, Naxin Huo, Tingting Zhu, Le Wang, Yi Wang et al. 2017. Nature. 551:498-502.

Picture credit: Wikimedia Commons, the free media repository

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Aet v4.0, INSDC Assembly GCA_002575655.1, Oct 2017
Database version: 95.3
Base Pairs: 4,224,915,394
Golden Path Length: 4,224,915,394
Genebuild by: UCD
Genebuild method: Imported from ENA
Data source: University of California, Davis

Gene counts

Coding genes: 39,630
Gene transcripts: 258,911
