Aegilops tauschii Assembly and Gene Annotation
About Aegilops tauschii
Tausch's goatgrass or rough-spike hard grass (Aegilops tauschii) is found in temperate Asia and the Caucasus region. It is the diploid progenitor of the bread wheat (Triticum aestivum) D genome, which makes it an important source of information on wheat evolution. Bread wheat is hexaploid, resulting from the hybridisation of wild Ae. tauschii with a cultivated tetraploid wheat descended from wild emmer, Triticum dicoccoides. This spontaneous event occurred about 8,000 years ago in the Fertile Crescent.
Assembly
The Ae. tauschii AL8/78 genome sequence was assembled in five steps. The core was assembly Aet v1.1, based on sequences of 42,822 bacterial artificial chromosome (BAC) clones. This assembly was merged with a whole-genome shotgun (WGS) assembly (Aet WGS 1.0) and with WGS Pacific Biosciences mega-reads to extend scaffolds and close gaps, producing assembly Aet v2.0. Misassembled scaffolds were detected with the aid of an AL8/78 optical BioNano genome (BNG) map and resolved, producing assembly Aet v3.0. Two additional BNG maps were then constructed and used, together with the genetic and physical maps, for super-scaffolding and for building the pseudomolecules of the final assembly, Aet v4.0. [1]
Annotation
A total of 83,117 genes were annotated in Aet v4.0. Of these, 39,622 were assigned to the high-confidence class (HCC; gene set v2.0) and the remaining 43,495 to the low-confidence class (LCC). Of the HCC genes, 38,775 were in the pseudomolecules and 847 (2.1%) were on unassigned scaffolds. The predicted HCC genes had a total length of 316,517,346 bp (7.5% of the assembly), and their mRNAs a total length of 145,062,217 bp (3.4%). Gene annotation was validated by searching for 1,440 BUSCO genes, of which 1,408 (97.8%) were correctly predicted among the 83,117 genes. Ae. tauschii genes were compared with genes annotated in four grass genomes and in the Arabidopsis thaliana genome. Ae. tauschii genes were the longest and had the longest mean exon length, and together with barley genes they had the longest transcripts among the genomes compared. Otherwise they were similar to genes in the other genomes, except for having a lower average number of exons. [1]
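The counts and proportions in the annotation summary can be rechecked with a few lines of arithmetic. This is a minimal sketch, assuming that the gene and mRNA length percentages are relative to the 4,224,915,394 bp golden path length listed under Statistics. The constant names and the pct helper are illustrative only.

```python
# Figures quoted in the annotation summary (Aet v4.0, gene set v2.0)
TOTAL_GENES = 83_117
HCC_GENES = 39_622
LCC_GENES = 43_495
HCC_IN_PSEUDOMOLECULES = 38_775
HCC_UNASSIGNED = 847
BUSCO_TOTAL = 1_440
BUSCO_FOUND = 1_408

# Assumption: length percentages are relative to the golden path length
GOLDEN_PATH_BP = 4_224_915_394
HCC_GENE_BP = 316_517_346
HCC_MRNA_BP = 145_062_217


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The HCC and LCC classes partition the full gene set
assert HCC_GENES + LCC_GENES == TOTAL_GENES
# Every HCC gene is either on a pseudomolecule or on an unassigned scaffold
assert HCC_IN_PSEUDOMOLECULES + HCC_UNASSIGNED == HCC_GENES

summary = {
    "unassigned_hcc_pct": pct(HCC_UNASSIGNED, HCC_GENES),
    "hcc_gene_length_pct": pct(HCC_GENE_BP, GOLDEN_PATH_BP),
    "hcc_mrna_length_pct": pct(HCC_MRNA_BP, GOLDEN_PATH_BP),
    "busco_pct": pct(BUSCO_FOUND, BUSCO_TOTAL),
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}%")
```

Rounding to one decimal place matches the precision used in the text. The two assertions confirm that the HCC/LCC split and the pseudomolecule/unassigned split each account for every gene.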
References
- Genome sequence of the progenitor of the wheat D genome Aegilops tauschii.
Ming-Cheng Luo, Yong Q. Gu, Daniela Puiu, Hao Wang, Sven O. Twardziok, Karin R. Deal, Naxin Huo, Tingting Zhu, Le Wang, Yi Wang et al. 2017. Nature. 551:498–502.
Picture credit: Wikimedia Commons, the free media repository
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | Aet v4.0, INSDC Assembly GCA_002575655.1, Oct 2017 |
Database version | 113.3 |
Golden Path Length | 4,224,915,394 |
Genebuild by | UCD |
Genebuild method | Import |
Data source | University of California, Davis |
Gene counts
Coding genes | 39,630 |
Non coding genes | 3,732 |
Small non coding genes | 3,535 |
Long non coding genes | 197 |
Gene transcripts | 262,643 |
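The gene counts above are flat "label | value" rows, and the non-coding total should equal the sum of its small and long subclasses. The sketch below assumes exactly the row layout shown above. The parse_rows helper is hypothetical and is not part of any Ensembl tool. It converts the rows into integers and checks that relationship.

```python
GENE_COUNTS = """
Coding genes | 39,630 |
Non coding genes | 3,732 |
Small non coding genes | 3,535 |
Long non coding genes | 197 |
Gene transcripts | 262,643 |
"""


def parse_rows(text: str) -> dict[str, str]:
    """Hypothetical helper: parse 'Label | value |' rows into {label: value}."""
    rows = {}
    for line in text.strip().splitlines():
        # Drop the empty cell left by the trailing pipe
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        if len(cells) == 2:
            label, value = cells
            rows[label] = value
    return rows


# Remove thousands separators before converting to integers
counts = {
    label: int(value.replace(",", ""))
    for label, value in parse_rows(GENE_COUNTS).items()
}

# Small and long non-coding genes together make up the non-coding total
assert counts["Small non coding genes"] + counts["Long non coding genes"] == counts["Non coding genes"]

if __name__ == "__main__":
    for label, value in counts.items():
        print(f"{label}: {value}")
```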