Hordeum vulgare Assembly and Gene Annotation
About Hordeum vulgare
Hordeum vulgare (barley) is the world's fourth most important cereal crop and an important model for ecological adaptation, having been cultivated in all temperate regions from the Arctic Circle to the tropics. It was one of the first cereal grains to be domesticated, originating in the Fertile Crescent over 10,000 years ago. About two-thirds of the global barley crop is used for animal feed, while the remaining third underpins the malting, brewing, and distilling industries. Although human consumption is not its primary use, barley offers potential health benefits and is still the major calorie source in several parts of the world. Barley is a diploid member of the grass family, making it a natural model for the genetics and genomics of the Triticeae tribe, including polyploid wheat and rye. With a haploid genome size of ~5.3 Gb in seven chromosomes, it is one of the largest diploid genomes sequenced to date.
The barley genome assembly presented here was produced by the International Barley Genome Sequencing Consortium (IBSC) using a hierarchical approach. Initially, BAC-by-BAC contig assemblies from multiplexed short-read sequencing (N50: 79 kb) were scaffolded using physical, genetic and optical maps (N50: 1.9 Mb) and assigned to chromosomes using a POPSEQ genetic map. Finally, the linear order and orientation of the scaffold sequences were determined using chromosome-conformation capture sequencing (Hi-C).
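The N50 values quoted above summarise assembly contiguity: N50 is the length L such that sequences of length L or longer together cover at least half of the assembled bases. The following is a minimal sketch of that calculation, not the consortium's own tooling. The function name `n50` and the toy lengths are illustrative assumptions and are not barley data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kb (hypothetical values, for illustration only).
toy_contigs = [100, 80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Scaffolding joins contigs into longer sequences, which is why the N50 rises between the contig stage and the scaffold stage even though the total amount of sequence is essentially unchanged.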
The final chromosome-scale assembly consisted of 6,347 ordered super-scaffolds composed of merged assemblies of individual BACs, representing 4.79 Gb (~95%) of the genomic sequence content, of which 4.54 Gb have been assigned to precise chromosomal locations in the Hi-C map.
The chloroplast genome component and its gene annotation are also present (KC912687).
Mapping of transcriptome data and reference protein sequences from other plant species identified 83,105 putative gene loci, including protein-coding genes, non-coding RNAs, pseudogenes and transcribed transposons. These loci were filtered and divided into 39,734 high-confidence and 41,949 low-confidence genes based on sequence homology. Additionally, 19,908 long non-coding RNAs and 792 microRNA precursor loci were predicted. Using a set of conserved eukaryotic core genes (BUSCO), it was estimated that the predicted gene models represent 98% of the cv. Morex barley gene complement.
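As a quick arithmetic check on the figures above, the short sketch below adds the two confidence classes and compares the sum with the number of putative loci. Reading the difference as loci removed during filtering is an assumption made here; the text does not state it explicitly.

```python
# Counts quoted in the annotation summary.
putative_loci = 83_105
high_confidence = 39_734
low_confidence = 41_949

classified = high_confidence + low_confidence
# Assumption: loci in neither class were dropped by the filtering step.
unclassified = putative_loci - classified

print(f"classified genes: {classified:,}")
print(f"high-confidence share: {high_confidence / classified:.1%}")
print(f"loci in neither class: {unclassified:,}")
```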
Five sources of barley variation data are shown:
- Variation data from WGS survey sequencing of four cultivars (Barke, Bowman, Igri and Haruna Nijo) and a wild barley (H. spontaneum).
- SNPs discovered from RNA-seq performed on the embryo tissues of nine spring barley varieties (Barke, Betzes, Bowman, Derkado, Intro, Optic, Quench, Sergeant and Tocada) and Morex using Illumina HiSeq 2000.
- Approximately five million variations from population sequencing of 90 Morex x Barke individuals.
- Approximately six million variations from population sequencing of 84 Oregon Wolfe barley individuals.
- SNPs from the Illumina iSelect 9k barley SNP chip. ~2,600 mapped genetic markers associated with these SNPs are also displayed.
References
- A chromosome conformation capture ordered sequence of the barley genome.
Mascher M, Gundlach H, Himmelbach A, Beier S, Twardziok SO, Wicker T, Radchuk V, Dockter C, Hedley PE, Russell J et al. 2017. Nature. 544:427-433.
- Comprehensive mapping of long-range interactions reveals folding principles of the human genome.
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al. 2009. Science. 326:289-293.
- A physical, genetic and functional sequence assembly of the barley genome.
International Barley Genome Sequencing Consortium, Mayer KF, Waugh R, Brown JW, Schulman A, Langridge P, Platzer M, Fincher GB, Muehlbauer GJ, Sato K et al. 2012. Nature. 491:711-716.
- A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms.
Ariyadasa R, Mascher M, Nussbaumer T, Schulte D, Frenkel Z, Poursarebani N, Zhou R, Steuernagel B, Gundlach H, Taudien S et al. 2014. Plant Physiol. 164
- Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ).
Mascher M, Muehlbauer GJ, Rokhsar DS, Chapman J, Schmutz J, Barry K, Muñoz-Amatriaín M, Close TJ, Wise RP, Schulman AH et al. 2013. Plant J. 76:718-727.
General information about this species can be found in Wikipedia.
| Assembly | IBSC v2, INSDC Assembly GCA_901482405.1, Apr 2017 |
| Golden Path Length | 4,834,432,680 bp |
| Data source | International Barley Genome Sequencing Consortium |
| Non coding genes | 3,196 |
| Small non coding genes | 2,987 |
| Long non coding genes | 209 |
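The non-coding gene counts in the table can be cross-checked: the small and long categories should add up to the total. The sketch below parses pipe-delimited rows and performs that check. The helper name `parse_rows` is hypothetical, and the parser skips empty cells so it accepts rows with either single or doubled pipe separators.

```python
raw_rows = """\
|Non coding genes||3,196|
|Small non coding genes||2,987|
|Long non coding genes||209|
"""


def parse_rows(text):
    """Map each row label to its integer count, ignoring empty cells."""
    table = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        if len(cells) == 2:
            label, value = cells
            table[label] = int(value.replace(",", ""))
    return table


counts = parse_rows(raw_rows)
total = counts["Non coding genes"]
parts = counts["Small non coding genes"] + counts["Long non coding genes"]
print(total, parts, total == parts)
```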