Hordeum vulgare Assembly and Gene Annotation
About Hordeum vulgare
Hordeum vulgare (barley) is the world's fourth most important cereal crop and an important model for ecological adaptation, having been cultivated in all temperate regions from the Arctic Circle to the tropics. It was one of the first cereal grains to be domesticated, originating in the Fertile Crescent over 10,000 years ago. About two-thirds of the global barley crop is used for animal feed, while the remaining third underpins the malting, brewing, and distilling industries. Although human consumption is not its primary use, barley offers potential health benefits and is still the major calorie source in several parts of the world.
Barley is a diploid (2n=14) member of the grass family, making it a natural model for the genetics and genomics of the Triticeae tribe, which includes polyploid wheat as well as rye. With a haploid genome size of ~5.3 Gb in seven chromosomes, it is one of the largest diploid genomes sequenced to date.
This is the assembly of cultivar Morex, a six-row malting variety.
Construction of the MorexV3 pseudomolecules proceeded in five steps: 1) Canu assembly of PacBio circular consensus reads (27x coverage); 2) scaffolding of the Canu assembly with Bionano contigs; 3) removal of small redundant sequences; 4) gap filling in scaffolds with ONT_smartdenovo contigs (resulting in 439 contigs); 5) ordering and orienting of scaffolds into chromosomal pseudomolecules with TRITEX using Hi-C data. The order and orientation of distal sequence in MorexV3, validated by genetic and optical maps, was greatly improved compared with V2.
Gene models were annotated on the MorexV3 pseudomolecules using the same transcriptomic resources as for MorexV2/TRITEX, but with an improved version of the PGSB annotation pipeline, which can also call isoforms and UTRs. A total of 81,687 genes with 83,990 transcripts were identified. Of these genes, 35,827 were classified as high-confidence (HC). Across all gene models, 98.6% of BUSCO models were retrieved. Moreover, 91% of MorexV3 gene models had no ambiguous bases in their 100 kb flanking sequence, compared with only 0.7% in MorexV2. The coding sequences of 35,260 (98.4%) MorexV3 HC gene models had near-complete alignments (≥95% alignment coverage, ≥99% identity) to the V2 pseudomolecules.
Models corresponding to the barley gene reference transcript dataset (BaRTv1.0) are also supported. These were derived from the analysis of 22 RNA-seq experiments covering 843 separate samples.
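The gene-model figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the annotation pipeline: the constant names and the `is_near_complete` helper are hypothetical, and the helper assumes coverage and identity are given as fractions between 0 and 1.

```python
# Figures quoted in the annotation paragraph above.
TOTAL_GENES = 81_687
TOTAL_TRANSCRIPTS = 83_990
HC_GENES = 35_827
HC_ALIGNED_TO_V2 = 35_260


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def is_near_complete(coverage, identity, min_coverage=0.95, min_identity=0.99):
    """Hypothetical helper: apply the >=95% coverage and >=99% identity thresholds.

    Assumes both values are fractions in [0, 1].
    """
    return coverage >= min_coverage and identity >= min_identity


# Share of all genes classified as high-confidence.
hc_share = pct(HC_GENES, TOTAL_GENES)
# Share of HC genes whose coding sequence aligns near-completely to V2.
aligned_share = pct(HC_ALIGNED_TO_V2, HC_GENES)
# Average number of transcripts per gene.
transcripts_per_gene = round(TOTAL_TRANSCRIPTS / TOTAL_GENES, 3)

if __name__ == "__main__":
    print(f"HC genes: {hc_share}% of all genes")
    print(f"HC genes aligned to V2: {aligned_share}%")
    print(f"Transcripts per gene: {transcripts_per_gene}")
```

The recomputed 98.4% agrees with the percentage quoted for HC gene models aligned to the V2 pseudomolecules.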
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 3,431,114 Low complexity (Dust) features, covering 158 Mb (3.7% of the genome);
- 1,681,147 RepeatMasker features (with the nrTEplants library), covering 3,212 Mb (76.0% of the genome);
- 2,100,078 Tandem repeats (TRF) features, covering 325 Mb (7.7% of the genome);
- Repeat Detector repeat length: 3,422 Mb (80.9% of the genome).
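The repeat coverage percentages can be recomputed from the quoted megabase totals. This sketch makes two assumptions that the page does not state: that the denominator is the Golden Path Length listed in the assembly statistics (4,225,577,519 bp), and that 1 Mb means 1,000,000 bp. The function names are illustrative only.

```python
# Assumption: genome length is the Golden Path Length from the assembly table.
GENOME_BP = 4_225_577_519
# Assumption: decimal megabases (1 Mb = 1,000,000 bp).
BP_PER_MB = 1_000_000

# (covered Mb, quoted percentage) as stated in the repeat summary.
REPEATS = {
    "Dust": (158, 3.7),
    "RepeatMasker (nrTEplants)": (3212, 76.0),
    "TRF": (325, 7.7),
    "Repeat Detector": (3422, 80.9),
}


def coverage_pct(covered_mb, genome_bp=GENOME_BP):
    """Percentage of the genome covered by `covered_mb` megabases."""
    return 100 * covered_mb * BP_PER_MB / genome_bp


def check(tolerance=0.1):
    """Map each repeat class to (recomputed %, agrees with quoted % within tolerance)."""
    result = {}
    for name, (mb, quoted) in REPEATS.items():
        pct = coverage_pct(mb)
        result[name] = (pct, abs(pct - quoted) <= tolerance)
    return result


if __name__ == "__main__":
    for name, (pct, ok) in check().items():
        print(f"{name}: {pct:.2f}% ({'ok' if ok else 'differs'})")
```

Because the megabase totals are rounded, the recomputed values can differ from the quoted ones in the last digit. The Repeat Detector figure, for example, recomputes to about 80.98% under these assumptions, so the check uses a tolerance of 0.1 percentage points rather than exact equality.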
The following variation datasets were remapped from assembly IBSC_v2 to MorexV3:
- Variation data from WGS survey sequencing of four cultivars (Barke, Bowman, Igri and Haruna Nijo) and a wild barley (H. spontaneum).
- SNPs discovered from RNA-seq performed on the embryo tissues of nine spring barley varieties (Barke, Betzes, Bowman, Derkado, Intro, Optic, Quench, Sergeant and Tocada) and Morex using Illumina HiSeq 2000.
- Approximately five million variations from population sequencing of 90 Morex x Barke individuals.
- Approximately six million variations from population sequencing of 84 Oregon Wolfe barley individuals.
- SNPs from the Illumina iSelect 9k barley SNP chip. ~2,600 mapped genetic markers associated with these SNPs are also displayed.
References
- Long-read sequence assembly: a technical evaluation in barley.
Mascher M, Wicker T, Jenkins J, Plott C, Lux T, Koh CS, Ens J, Gundlach H, Boston LB, Tulpová Z, Holden S, Hernández-Pinzón I, Scholz U, Mayer KFX, Spannagl M, Pozniak CJ, Sharpe AG, Šimková H, Moscou MJ, Grimwood J, Schmutz J, Stein N.
- EORNA, a barley gene and transcript abundance database.
Milne L, Bayer M, Rapazote-Flores P, Mayer CD, Waugh R, Simpson CG.
- TRITEX: chromosome-scale sequence assembly of Triticeae genomes with
Monat C, Padmarasu S, Lux T, Wicker T, Gundlach H, Himmelbach A, Ens J, Li C, Muehlbauer GJ, Schulman AH et al. 2019. Genome Biol. 20:284.
- A chromosome conformation capture ordered sequence of the barley
Mascher M, Gundlach H, Himmelbach A, Beier S, Twardziok SO, Wicker T, Radchuk V, Dockter C, Hedley PE, Russell J et al. 2017. Nature. 544:427-433.
- Comprehensive mapping of long-range interactions reveals folding principles of the human
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al. 2009. Science. 326:289-93.
- A physical, genetic and functional sequence assembly of the barley
International Barley Genome Sequencing Consortium, Mayer KF, Waugh R, Brown JW, Schulman A, Langridge P, Platzer M, Fincher GB, Muehlbauer GJ, Sato K et al. 2012. Nature. 491:711-716.
- A sequence-ready physical map of barley anchored genetically by two
Ariyadasa R, Mascher M, Nussbaumer T, Schulte D, Frenkel Z, Poursarebani N, Zhou R, Steuernagel B, Gundlach H, Taudien S et al. 2014. Plant Physiol. 164
- Anchoring and ordering NGS contig assemblies by population
Mascher M, Muehlbauer GJ, Rokhsar DS, Chapman J, Schmutz J, Barry K, Muñoz-Amatriaín M, Close TJ, Wise RP, Schulman AH et al. 2013. Plant J. 76:718-727.
|Assembly||MorexV3_pseudomolecules_assembly, INSDC Assembly GCA_904849725.1, Apr 2021|
|Golden Path Length||4,225,577,519|
|Genebuild method||External annotation import|
|Data source||Leibniz Institute of Plant Genetics and Crop Plant Research|