Hordeum vulgare (MorexV3_pseudomolecules_assembly)

Hordeum vulgare Assembly and Gene Annotation

About Hordeum vulgare

Hordeum vulgare (barley) is the world's fourth most important cereal crop and an important model for ecological adaptation, having been cultivated in all temperate regions from the Arctic Circle to the tropics. It was one of the first domesticated cereal grains, originating in the Fertile Crescent over 10,000 years ago. About two-thirds of the global barley crop is used for animal feed, while the remaining third underpins the malting, brewing, and distilling industries. Although human consumption is not its primary use, barley offers potential health benefits and is still the major calorie source in several parts of the world.

Barley is a diploid member of the grass family (2n=14), making it a natural model for the genetics and genomics of the Triticeae tribe, including polyploid wheat and rye. With a haploid genome size of ~5.3 Gb in seven chromosomes, it is one of the largest diploid genomes sequenced to date.

This is the assembly of cultivar Morex, a six-row malting variety.

Assembly

Construction of MorexV3 pseudomolecules proceeded in several steps: 1) Canu assembly of PacBio circular consensus reads (27x coverage); 2) scaffolding the Canu assembly with Bionano contigs; 3) removal of small redundant sequences; 4) filling gaps in scaffolds with ONT_smartdenovo contigs (resulting in 439 contigs); 5) ordering and orienting scaffolds into chromosomal pseudomolecules with TRITEX using Hi-C data. The order and orientation of distal sequence in MorexV3, validated by genetic and optical maps, is greatly improved compared with MorexV2.

Annotation

Gene models were annotated on the MorexV3 pseudomolecules using the same transcriptomic resources as for MorexV2/TRITEX, but with an improved version of the PGSB annotation pipeline, which is also able to call isoforms and UTRs. A total of 81,687 genes with 83,990 transcripts were identified. Of these, 35,827 were classified as high-confidence (HC) genes. Among all gene models, 98.6% of BUSCO models were retrieved. Moreover, 91% of MorexV3 gene models had no ambiguous bases in their 100 kb flanking sequence, compared with only 0.7% in MorexV2. The coding sequences of 35,260 (98.4%) MorexV3 HC gene models had near-complete alignments (≥95% alignment coverage, ≥99% identity) to the V2 pseudomolecules.
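The proportions in the paragraph above can be rechecked from the quoted counts. The following is a minimal sketch in Python; the constant and function names are illustrative only and are not part of any Ensembl or PGSB tooling.

```python
# Gene-model counts as quoted in the annotation paragraph above.
TOTAL_GENES = 81_687
TOTAL_TRANSCRIPTS = 83_990
HC_GENES = 35_827
HC_ALIGNED_TO_V2 = 35_260


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of all gene models classified as high-confidence (HC).
hc_share = percent(HC_GENES, TOTAL_GENES)
# Share of HC genes whose coding sequence aligns near-completely to V2.
aligned_share = percent(HC_ALIGNED_TO_V2, HC_GENES)
# Mean number of transcripts per annotated gene.
transcripts_per_gene = TOTAL_TRANSCRIPTS / TOTAL_GENES

print(f"HC genes among all gene models: {hc_share:.1f}%")
print(f"HC genes with near-complete V2 alignment: {aligned_share:.1f}%")
print(f"Mean transcripts per gene: {transcripts_per_gene:.3f}")
```

The second figure reproduces the 98.4% quoted above, which indicates that the percentage is taken relative to the 35,827 HC genes rather than to all 81,687 gene models.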

Models corresponding to the barley gene reference transcript dataset (BaRTv1.0) are also supported. These were derived from the analysis of 22 RNA-seq experiments covering 843 separate samples.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are 3,431,114 low-complexity (Dust) features, covering 158 Mb (3.7% of the genome); 1,681,147 RepeatMasker features (with the nrTEplants library), covering 3,212 Mb (76.0% of the genome); and 2,100,078 tandem repeat (TRF) features, covering 325 Mb (7.7% of the genome). The Repeat Detector repeat length is 3,422 Mb (80.9% of the genome).
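As a rough consistency check, the coverage percentages can be recomputed from the megabase figures. This sketch assumes that the percentages are relative to the Golden Path Length given in the Statistics summary (4,225,577,519 bp) and that 1 Mb means 1,000,000 bp; neither is stated explicitly on this page. Because the megabase values are rounded, differences of about 0.1 percentage point from the quoted figures are expected.

```python
# Assumption: percentages are relative to the Golden Path Length (in bp).
GENOME_LENGTH_BP = 4_225_577_519

# (covered megabases, quoted percentage) per repeat category, as listed above.
REPEATS = {
    "Low complexity (Dust)": (158, 3.7),
    "RepeatMasker (nrTEplants)": (3212, 76.0),
    "Tandem repeats (TRF)": (325, 7.7),
    "Repeat Detector": (3422, 80.9),
}


def coverage_percent(covered_mb: float, genome_bp: int = GENOME_LENGTH_BP) -> float:
    """Convert covered megabases (1 Mb = 1e6 bp) into percent of the genome."""
    return 100.0 * covered_mb * 1_000_000 / genome_bp


for name, (covered_mb, quoted) in REPEATS.items():
    computed = coverage_percent(covered_mb)
    print(f"{name}: {computed:.2f}% computed vs {quoted}% quoted")
```

Under these assumptions, every recomputed value falls within 0.1 percentage point of the quoted one.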

Regulation

Probes from the Barley1 GeneChip array, the Agilent barley full-length cDNA array, and the barley PGRC1 10k A and B array set have been mapped to barley genes and loci.

Variation

The following variation datasets were remapped from assembly IBSC_v2 to MorexV3:

  1. Variation data from WGS survey sequencing of four cultivars (Barke, Bowman, Igri and Haruna Nijo) and a wild barley (H. spontaneum).
  2. SNPs discovered from RNA-seq performed on the embryo tissues of nine spring barley varieties (Barke, Betzes, Bowman, Derkado, Intro, Optic, Quench, Sergeant and Tocada) and Morex using Illumina HiSeq 2000.
  3. Approximately five million variations from population sequencing of 90 Morex x Barke individuals.
  4. Approximately six million variations from population sequencing of 84 Oregon Wolfe barley individuals.
  5. SNPs from the Illumina iSelect 9k barley SNP chip. ~2,600 mapped genetic markers associated with these SNPs are also displayed.

References

  1. Long-read sequence assembly: a technical evaluation in barley.
    Mascher M, Wicker T, Jenkins J, Plott C, Lux T, Koh CS, Ens J, Gundlach H, Boston LB, Tulpová Z, Holden S, Hernández-Pinzón I, Scholz U, Mayer KFX, Spannagl M, Pozniak CJ, Sharpe AG, Šimková H, Moscou MJ, Grimwood J, Schmutz J, Stein N.
  2. EORNA, a barley gene and transcript abundance database.
    Milne L, Bayer M, Rapazote-Flores P, Mayer CD, Waugh R, Simpson CG.
  3. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools.
    Monat C, Padmarasu S, Lux T, Wicker T, Gundlach H, Himmelbach A, Ens J, Li C, Muehlbauer GJ, Schulman AH et al. 2019. Genome Biol. 20:284.
  4. A chromosome conformation capture ordered sequence of the barley genome.
    Mascher M, Gundlach H, Himmelbach A, Beier S, Twardziok SO, Wicker T, Radchuk V, Dockter C, Hedley PE, Russell J et al. 2017. Nature. 544:427-433.
  5. Comprehensive mapping of long-range interactions reveals folding principles of the human genome.
    Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al. 2009. Science. 326:289-93.
  6. A physical, genetic and functional sequence assembly of the barley genome.
    International Barley Genome Sequencing Consortium, Mayer KF, Waugh R, Brown JW, Schulman A, Langridge P, Platzer M, Fincher GB, Muehlbauer GJ, Sato K et al. 2012. Nature. 491:711-716.
  7. A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms.
    Ariyadasa R, Mascher M, Nussbaumer T, Schulte D, Frenkel Z, Poursarebani N, Zhou R, Steuernagel B, Gundlach H, Taudien S et al. 2014. Plant Physiol. 164
  8. Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ).
    Mascher M, Muehlbauer GJ, Rokhsar DS, Chapman J, Schmutz J, Barry K, Muñoz-Amatriaín M, Close TJ, Wise RP, Schulman AH et al. 2013. Plant J. 76:718-727.

Picture credit: Lucash (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

Statistics

Summary

Assembly: MorexV3_pseudomolecules_assembly, INSDC Assembly GCA_904849725.1, Apr 2021
Database version: 111.4
Golden Path Length: 4,225,577,519
Genebuild by: IPK
Genebuild method: External annotation import
Data source: Leibniz Institute of Plant Genetics and Crop Plant Research

Gene counts

Coding genes: 35,826
Gene transcripts: 37,962

Other

Short Variants: 13,474,890