Triticum dicoccoides Assembly and Gene Annotation
About Triticum dicoccoides
Emmer wheat, or hulled wheat, is a type of awned wheat. Emmer is a tetraploid (2n = 4x = 28 chromosomes). The domesticated types are Triticum turgidum subsp. dicoccum and Triticum turgidum conv. durum. The wild plant is called Triticum turgidum subsp. dicoccoides. The principal difference between the wild and domesticated forms is that the ripened seed head of the wild plant shatters and scatters the seed onto the ground, while in domesticated emmer the seed head remains intact, making it easier for humans to harvest the grain. Along with einkorn wheat, emmer was one of the first crops domesticated in the Near East. It was widely cultivated in the ancient world, but is now a relict crop in mountainous regions of Europe and Asia. Emmer is considered a type of farro, especially in Italy.
The wild emmer wheat (WEW) accession "Zavitan" was chosen for this genome assembly to leverage the genetic data already collected for this line by the WEWseq Consortium. The WEW reference genome, constructed by whole-genome shotgun (WGS) sequencing of various insert-size libraries, produced contigs with an N50 of 57,378 base pairs (bp) and scaffolds with an N50 of 6,955,166 bp. The scaffolds were validated with genetic data and combined with three-dimensional (3D) chromosome conformation capture sequencing (HiC) data, enabling construction of chromosome-scale assemblies (pseudomolecules). The resulting 10.5-Gb genome assembly is composed of 14 pseudomolecule sequences representing the 14 chromosomes of WEW (10.1 Gb) and one group of unassigned scaffolds (0.4 Gb). The gaps between scaffolds, estimated to represent ~1.5 Gb of the genome, are likely the result of technically difficult-to-sequence or difficult-to-assemble regions.
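The N50 values quoted above summarise assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The following is a minimal sketch of that calculation in Python. The function name `n50` and the toy lengths are hypothetical and chosen only for illustration; this is not the WEWseq Consortium's pipeline, and the real contig and scaffold lengths are not reproduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Illustrative helper (hypothetical name): sort the lengths from longest
    to shortest, then walk down the list until the running total reaches
    at least half of the summed length. The length at which that happens
    is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (made up for illustration, not taken from the assembly).
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)
```

With the toy input the total is 20 bp, so the walk stops at the second-longest sequence (6 + 5 = 11, which is at least 10) and the N50 is 5. Applied to the full list of contig or scaffold lengths of an assembly, the same calculation yields figures of the kind reported above.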
Gene annotation was carried out by the WEWseq Consortium. RNA sequencing reads generated from 20 different combinations of WEW tissues and developmental stages were used to annotate protein-coding genes in the WEW assembly. A total of 65,012 high-confidence (HC) gene models were identified, and validation with the BUSCO gene set indicated that the assembly captured 98.4% of the WEW gene complement. A further 45,532 low-confidence genes were also identified and are shown on a separate track.
- Wild emmer genome architecture and diversity elucidate wheat
Raz Avni, Moran Nave, Omer Barad, Kobi Baruch, Sven O. Twardziok, Heidrun Gundlach, Iago Hale, Martin Mascher, Manuel Spannagl, Krystalee Wiebe et al. 2017. Science. 357:93-97.
General information about this species can be found in Wikipedia.
|Assembly||WEWSeq v.1.0, INSDC Assembly GCA_002162155.1, Jun 2017|
|Golden Path Length||10,079,039,394 bp|
|Data source||WEWseq consortium|
|Non coding genes||4,731|
|Small non coding genes||4,513|
|Long non coding genes||218|