Hordeum vulgare Assembly and Gene Annotation

The barley genome assembly in Ensembl Plants is version 082214v1. This is a gene-space assembly of Hordeum vulgare cv. Morex generated by the International Barley Genome Sequencing Consortium, with contigs ordered into chromosomal pseudomolecules using POPSEQ data from Ariyadasa et al. and Mascher et al. The gene models are provided by the PGSB group at the German Research Centre for Environmental Health. The data presented in Ensembl Plants has been organised as part of the U.K. Barley Genome Sequencing Project, funded by the Biotechnology and Biological Sciences Research Council.


About Hordeum vulgare

Hordeum vulgare (barley) is the world's fourth most important cereal crop and an important model for ecological adaptation, having been cultivated in all temperate regions from the Arctic Circle to the tropics. It was one of the first cereal grains to be domesticated, originating in the Fertile Crescent over 10,000 years ago. About two-thirds of the global barley crop is used for animal feed, while the remaining third underpins the malting, brewing, and distilling industries. Although human consumption is not its primary use, barley offers potential health benefits and is still the major calorie source in several parts of the world. Barley is a diploid member of the grass family, making it a natural model for the genetics and genomics of the Triticeae tribe, including polyploid wheat and rye. With a haploid genome size of ~5.3 Gbp in 7 chromosomes, it is one of the largest diploid genomes sequenced to date.

Assembly

The barley genome assembly presented here was produced by the International Barley Genome Sequencing Consortium (IBSC) [1], then expanded and refined using population sequencing (POPSEQ) data generated by the IPK [2,3]. The structure of the genome assembly differs significantly from conventional assemblies. In place of conventional scaffolding, sequence contigs have been anchored to the genome using multiple lines of evidence, including alignment to an FPC physical map, identification of genetic markers, and syntenic stratification. As the assembly has good coverage of the genic regions, it has been termed a "gene-ome": a fairly complete gene set, much of which is located and ordered. Full details of the IBSC assembly process can be found in [1], and details of the IPK POPSEQ anchoring process in [2] and [3].

In the IBSC assembly, ~2.6 million sequenced contigs were generated using whole-genome shotgun sequencing (WGS). Of these, ~723,000 are assigned to specific chromosomal positions and are shown in the browser on chromosomes labeled 1-7. An additional 138,000 WGS contigs could be assigned to a specific chromosome or chromosome arm, but not to a more specific location. These are shown in the browser on chromosomes labeled "_unordered" (e.g. 1H_unordered, 3HL_unordered, 3HS_unordered), where "HL" and "HS" refer to the long and short chromosome arms, respectively. Note: the "_unordered" suffix was added in Ensembl Genomes release 26; in previous releases, these chromosomes were labeled with just an "H", e.g. "1H", "3HS", "5HL".
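The labeling scheme described above can be illustrated with a small parser. This is only a sketch based on the example labels given in the text; the names `RegionName`, `parse_region` and `to_current_label` are hypothetical and are not part of any Ensembl API, and the pattern assumes that no other label forms occur.

```python
import re
from typing import NamedTuple, Optional

class RegionName(NamedTuple):
    chromosome: int          # 1-7
    arm: Optional[str]       # "long", "short", or None if no arm is given
    ordered: bool            # True for the positioned chromosomes "1".."7"

# Accepts "3", "3H", "3HL", "3HS", and the same "H" forms with "_unordered".
# Assumption: these are the only label forms, as suggested by the examples.
_REGION = re.compile(r"^(?P<chrom>[1-7])(?:H(?P<arm>[LS])?(?:_unordered)?)?$")

def parse_region(name: str) -> RegionName:
    match = _REGION.match(name)
    if match is None:
        raise ValueError(f"unrecognised region label: {name!r}")
    arm = {"L": "long", "S": "short"}.get(match.group("arm"))
    # Any label carrying an "H" belongs to the unordered set.
    has_h = name[1:2] == "H"
    return RegionName(int(match.group("chrom")), arm, ordered=not has_h)

def to_current_label(name: str) -> str:
    """Map a pre-release-26 label such as '3HS' to '3HS_unordered'."""
    region = parse_region(name)
    if region.ordered or name.endswith("_unordered"):
        return name
    return name + "_unordered"
```

For example, `parse_region("3HL_unordered")` gives chromosome 3, long arm, unordered, while `to_current_label("5HL")` yields the release 26 style label `5HL_unordered`.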

In the IPK POPSEQ assembly, 90 Morex x Barke individuals were sequenced at low depth, and the sequences aligned with the IBSC WGS contigs. A total of 5.1 million SNPs were called on this alignment. These SNPs were then integrated into a high-density SNP-based genetic map [3]. IBSC WGS contigs harboring these SNPs were then placed into this genetic framework.

Examination of common markers shows that the IBSC and POPSEQ anchoring strategies are highly congruent [2,3]. The assembly presented in Ensembl Genomes therefore includes contigs anchored by either strategy, so as not to lose any sequence or gene assignments. In the current release, all WGS contigs anchored by POPSEQ are placed according to POPSEQ. WGS contigs that were not anchored by POPSEQ but were anchored in the IBSC assembly retain their positions from the IBSC assembly.
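The precedence rule described above (POPSEQ placement first, IBSC placement as a fallback) can be sketched as a simple dictionary merge. This is an illustration only, not the pipeline actually used: the function name `merge_anchors`, the contig identifiers, and the representation of an anchor as a (chromosome label, position) pair are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical anchor representation: (chromosome label, approximate position).
Anchor = Tuple[str, float]

def merge_anchors(
    popseq: Dict[str, Anchor],
    ibsc: Dict[str, Anchor],
) -> Dict[str, Tuple[Anchor, str]]:
    """Combine two anchoring sets, preferring POPSEQ where both exist."""
    # Start from the IBSC placements...
    merged = {contig: (anchor, "IBSC") for contig, anchor in ibsc.items()}
    # ...then let every POPSEQ placement override or extend them.
    for contig, anchor in popseq.items():
        merged[contig] = (anchor, "POPSEQ")
    return merged

# Made-up contigs and positions, for illustration only.
popseq = {"contig_a": ("1", 10.5), "contig_b": ("2", 3.0)}
ibsc = {"contig_b": ("2", 4.2), "contig_c": ("3", 7.0)}
merged = merge_anchors(popseq, ibsc)
```

Here `contig_b` takes its POPSEQ position, `contig_c` keeps its IBSC position, and no contig anchored by either strategy is dropped.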

In the Ensembl browser, sequence contigs (labeled morex_contig) are displayed at approximate base-pair coordinates inferred from the location of associated genetic markers in the barley genetic and physical maps.

The chloroplast genome and its gene annotation are also present. They were imported from ENA entry KC912687.

Annotation

79,379 genes have been called by the IBSC [1]. 26,159 of these have homology support from at least one other reference genome and are designated "high-confidence". The remaining 53,220 genes are designated "low-confidence". These two classes are displayed with separate tracks in the Ensembl browser.
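As a quick consistency check on the figures above, the two confidence classes should sum to the total number of called genes. The short calculation below uses only the numbers quoted in this section; the variable names are chosen for the example.

```python
# Gene counts as quoted in the text [1].
total_genes = 79_379
high_confidence = 26_159
low_confidence = 53_220

# The two classes should partition the full gene set.
assert high_confidence + low_confidence == total_genes

# Share of called genes with homology support (roughly one third).
hc_fraction = high_confidence / total_genes
print(f"High-confidence share: {hc_fraction:.1%}")
```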

15,719 high-confidence genes could be directly associated with the barley genome scaffold (BGS). To associate additional genes, the genetically anchored and physically ordered gene backbone was compared to the reference genomes of Oryza sativa, Sorghum bicolor, and Brachypodium distachyon, and extended, highly conserved syntenic regions were used to locate and order additional genes within the overall framework. A total of 3,743 extra genes were placed on chromosomes by this approach.

Regulation and sequence alignments

Regulation

Mappings for probes from the Barley1 GeneChip array, the Agilent barley full-length cDNA array, and the barley PGRC1 10k A and B array set can be viewed in the browser. For example, see the results for Contig2083_s_at.

Transcriptome assembly in diploid einkorn wheat Triticum monococcum - Fox et al. [4]

Genome-wide transcriptomes of two Triticum monococcum subspecies, the wild winter wheat T. monococcum ssp. aegilopoides (accession G3116) and the domesticated spring wheat T. monococcum ssp. monococcum (accession DV92), were constructed by generating de novo assemblies of RNA-Seq data derived from both etiolated and green seedlings. Assembled data are available from the Jaiswal lab and raw reads are available from INSDC projects PRJNA203221 and PRJNA195398.

The de novo transcriptome assemblies of DV92 and G3116 represent 120,911 and 117,969 transcripts, respectively. They were mapped to the bread wheat, barley and Triticum urartu genomes using STAR. Click here for a barley example.

Triticum aestivum transcriptome

Wheat RNA-Seq, EST and UniGene datasets have also been aligned to the Hordeum vulgare genome:

Variation

Barley variation data

Five sources of barley variation data are shown:

  1. Variation data from WGS survey sequencing of four cultivars (Barke, Bowman, Igri and Haruna Nijo) and a wild barley (H. spontaneum). The data were collected as described in [1].
  2. SNPs discovered from RNA-Seq performed on the embryo tissues of 9 spring barley varieties (Barke, Betzes, Bowman, Derkado, Intro, Optic, Quench, Sergeant and Tocada) and Morex using Illumina HiSeq 2000 [1].
  3. ~5 million variations from population sequencing of 90 Morex x Barke individuals [2].
  4. ~6 million variations from population sequencing of 84 Oregon Wolfe barley individuals [2].
  5. SNPs from the Illumina iSelect 9k barley SNP chip [6]. ~2,600 mapped genetic markers associated with these SNPs [3] are also displayed.

As of release 25, all barley variations have been added to the transPLANT variation archive.

References

  1. A physical, genetic and functional sequence assembly of the barley genome.
    International Barley Genome Sequencing Consortium, Mayer KF, Waugh R, Brown JW, Schulman A, Langridge P, Platzer M, Fincher GB, Muehlbauer GJ, Sato K et al. 2012. Nature. 491:711-716.
  2. A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms.
    Ariyadasa R, Mascher M, Nussbaumer T, Schulte D, Frenkel Z, Poursarebani N, Zhou R, Steuernagel B, Gundlach H, Taudien S et al. 2014. Plant Physiol. 164:412-423.
  3. Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ).
    Mascher M, Muehlbauer GJ, Rokhsar DS, Chapman J, Schmutz J, Barry K, Muñoz-Amatriaín M, Close TJ, Wise RP, Schulman AH et al. 2013. Plant J. 76:718-727.
  4. De Novo Transcriptome Assembly and Analyses of Gene Expression during Photomorphogenesis in Diploid Wheat Triticum monococcum.
    Fox SE, Geniza M, Hanumappa M, Naithani S, Sullivan C, Preece J, Tiwari VK, Elser J, Leonard JM, Sage A et al. 2014. PLoS ONE. 9:e96855.
  5. Analysis of the bread wheat genome using whole-genome shotgun sequencing.
    Brenchley R, Spannagl M, Pfeifer M, Barker GL, D'Amore R, Allen AM, McKenzie N, Kramer M, Kerhornou A, Bolser D et al. 2012. Nature. 491:705-710.
  6. Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley.
    Comadran J, Kilian B, Russell J, Ramsay L, Stein N, Ganal M, Shaw P, Bayer M, Thomas W, Marshall D et al. 2012. Nat. Genet. 44:1388-1392.

Picture credit: Lucash (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: ASM32608v1, INSDC Assembly GCA_000326085.1, Mar 2012
Database version: 87.2
Base Pairs: 1,356,979,403
Golden Path Length: 4,045,300,851
Genebuild by: IBSC_1.0
Genebuild method: Import
Data source: IBSC

Gene counts

Coding genes: 24,287
Non coding genes: 1,512
Small non coding genes: 1,498
Long non coding genes: 14
Pseudogenes: 268
Gene transcripts: 64,096

Other

Short Variants: 18,331,939
