Triticum dicoccoides (WEWSeq_v.1.0)

Triticum dicoccoides Assembly and Gene Annotation

About Triticum dicoccoides

Emmer wheat, or hulled wheat, is a type of awned wheat. Emmer is a tetraploid (2n = 4x = 28 chromosomes). The domesticated types are Triticum turgidum subsp. dicoccum and Triticum turgidum conv. durum. The wild plant is called Triticum turgidum subsp. dicoccoides. The principal difference between the wild and the domesticated forms is that the ripened seed head of the wild plant shatters and scatters the seed onto the ground, while in domesticated emmer the seed head remains intact, making it easier for humans to harvest the grain. Along with einkorn wheat, emmer was one of the first crops domesticated in the Near East. It was widely cultivated in the ancient world, but is now a relict crop in mountainous regions of Europe and Asia. Emmer is considered a type of farro, especially in Italy.

Assembly

Wild emmer wheat (WEW) accession "Zavitan" was chosen for this genome assembly to leverage the genetic data already collected for this line by the WEWseq Consortium. The WEW reference genome was constructed by whole-genome shotgun (WGS) sequencing of libraries with various insert sizes, yielding contigs with an N50 of 57,378 base pairs (bp) and scaffolds with an N50 of 6,955,166 bp. The scaffolds were validated with genetic data and combined with three-dimensional (3D) chromosome conformation capture sequencing (Hi-C) data, enabling construction of chromosome-scale assemblies (pseudomolecules). The resulting 10.5-Gb genome assembly is composed of 14 pseudomolecule sequences representing the 14 chromosomes of WEW (10.1 Gb) and one group of unassigned scaffolds (0.4 Gb). The gaps between scaffolds, estimated to represent ~1.5 Gb of the genome, are likely the result of regions that are technically difficult to sequence or assemble.
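The N50 values quoted above summarise assembly contiguity: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The following is a minimal sketch of that calculation, not the consortium's pipeline. The function name n50 and the toy contig lengths are illustrative assumptions and are not taken from the WEW assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    combined length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy data (not real WEW contigs): total length is 300,
# and the two longest sequences (100 + 80) already cover half of it.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

Applied to the real contig and scaffold length lists, the same calculation would be expected to give the contig and scaffold N50 figures reported above.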

Annotation

Gene annotation was carried out by the WEWseq Consortium. RNA sequencing reads generated from 20 different combinations of WEW tissues and developmental stages were used to annotate protein-coding genes in the WEW assembly (13). A total of 65,012 high-confidence (HC) gene models were identified, and validation with the BUSCO gene set indicated that the assembly captured 98.4% of the WEW gene complement. A further 45,532 low-confidence genes were also identified and are shown on a separate track.

References

  1. Emmer.
    Wikipedia.
  2. Wild emmer genome architecture and diversity elucidate wheat evolution and domestication.
    Raz Avni, Moran Nave, Omer Barad, Kobi Baruch, Sven O. Twardziok, Heidrun Gundlach, Iago Hale, Martin Mascher, Manuel Spannagl, Krystalee Wiebe et al. 2017. Science. 357:93-97.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: WEWSeq v.1.0, INSDC Assembly GCA_002162155.1, Jun 2017
Database version: 113.1
Golden Path Length: 10,079,039,394
Genebuild by: WEWSeq
Genebuild method: Import
Data source: WEWseq consortium

Gene counts

Coding genes: 62,569
Non coding genes: 4,731
Small non coding genes: 4,513
Long non coding genes: 218
Pseudogenes: 75
Gene transcripts: 300,710
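
The gene counts above can be cross-checked with simple arithmetic: the small and long non-coding categories should add up to the non-coding total. The sketch below uses only the figures from this table. The dictionary keys are illustrative names chosen here, not identifiers from any database.

```python
# Figures copied from the gene counts table; key names are illustrative.
gene_counts = {
    "coding": 62_569,
    "non_coding": 4_731,
    "small_non_coding": 4_513,
    "long_non_coding": 218,
    "pseudogenes": 75,
    "transcripts": 300_710,
}

# The non-coding total should equal the sum of its two sub-categories.
non_coding_sum = (
    gene_counts["small_non_coding"] + gene_counts["long_non_coding"]
)
counts_consistent = non_coding_sum == gene_counts["non_coding"]

print(non_coding_sum)     # prints 4731
print(counts_consistent)  # prints True
```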