Brachypodium distachyon Assembly and Gene Annotation

About Brachypodium distachyon

Brachypodium distachyon, like Arabidopsis thaliana, has several features that recommend it as a model plant for functional genomic studies, especially in the grasses. It has a small, diploid genome (~355 Mb), small physical size, a short life-cycle and few growth requirements. Brachypodium is related to the major cereal grain species but is understood to be more closely related to the Triticeae (wheat and barley) than to the other cereals.

Assembly

This release represents the second improved Brachypodium distachyon (Bd21 strain) genome assembly from JGI, which includes ~270 Mb of improved Brachypodium sequence. These regions were improved by dividing the gene space into overlapping pieces of ~2 Mb. Each region was manually inspected and then finished using a variety of technologies, including Sanger (primer walks on subclones and fosmid templates, transposon sequencing on subclone templates), Illumina (small insert shatter libraries) and clone-based shotgun sequencing using both Sanger and Illumina libraries. In total, 1,496 gaps were closed and 1.43 Mb was added to the assembly. Overall contiguity (contig N50) increased by a factor of 63, from 347.8 kb to 22 Mb.
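As a rough illustration of the contiguity figures quoted above, the following Python sketch computes a contig N50 from a list of contig lengths and checks the reported fold increase. The toy contig lengths are hypothetical and are not taken from the assembly; only the two N50 values (347.8 kb and 22 Mb) come from the text.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half of the
    summed contig length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = contig_n50(toy_lengths)

# N50 values quoted in the text, both expressed in kb.
old_n50_kb = 347.8
new_n50_kb = 22_000.0
fold_increase = new_n50_kb / old_n50_kb

print(toy_n50)
print(round(fold_increase, 1))
```

The ratio 22,000 / 347.8 is approximately 63.3, consistent with the stated factor of 63.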

Annotation

74,756 transcript assemblies were constructed from 160 million paired-end Illumina RNA-seq reads, and 17,647 transcript assemblies from ~1.9 million 454 reads. The transcript assemblies from RNA-seq reads were made using PERTRAN. Using PASA, 76,209 transcript assemblies were constructed from 314,866 sequences in total, consisting of the RNA-seq transcript assemblies above as well as Sanger ESTs. Loci were determined by transcript assembly alignments and/or EXONERATE alignments of proteins from Arabidopsis thaliana, rice, sorghum, foxtail, grape, soybean and Swiss-Prot eukaryote proteins to the Brachypodium distachyon Bd21 genome, which was soft-masked for repeats using RepeatMasker. Each locus was extended by up to 2 kb on both ends unless the extension ran into another locus on the same strand. Gene models were predicted by the homology-based predictors FGENESH+, FGENESH_EST (similar to FGENESH+, but using ESTs instead of protein/translated ORFs as splice site and intron input) and GenomeScan.

The end result was 34,310 loci containing protein-coding transcripts, with 52,972 protein-coding transcripts in total.
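The locus extension rule described above (up to 2 kb on both ends unless extending into another locus on the same strand) can be sketched in Python as follows. This is one plausible reading of the rule, not the JGI pipeline itself: the `Locus` class, the `extend_loci` function, the 1-based inclusive coordinates and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_EXTENSION = 2_000  # bp; the "up to 2 kb" quoted in the text


@dataclass(frozen=True)
class Locus:
    """Hypothetical locus record with 1-based inclusive coordinates."""
    start: int
    end: int
    strand: str  # "+" or "-"


def extend_loci(loci, chrom_length, max_ext=MAX_EXTENSION):
    """Extend each locus by up to max_ext bp on both ends.

    An extension is clipped at the chromosome ends and stops one base
    short of any non-overlapping locus on the same strand (assumed
    interpretation of "unless extending into another locus").
    """
    extended = []
    for locus in loci:
        new_start = max(1, locus.start - max_ext)
        new_end = min(chrom_length, locus.end + max_ext)
        for other in loci:
            if other is locus or other.strand != locus.strand:
                continue
            if other.end < locus.start:
                # Same-strand neighbour on the left limits the start.
                new_start = max(new_start, other.end + 1)
            elif other.start > locus.end:
                # Same-strand neighbour on the right limits the end.
                new_end = min(new_end, other.start - 1)
        extended.append(Locus(new_start, new_end, locus.strand))
    return extended


# Hypothetical loci on a 9,500 bp sequence.
example = [Locus(5000, 6000, "+"), Locus(7000, 8000, "+"), Locus(6500, 6800, "-")]
result = extend_loci(example, chrom_length=9500)
for locus in result:
    print(locus)
```

In this toy example the two plus-strand loci stop at each other's boundaries, while the minus-strand locus receives the full 2 kb on both sides because no same-strand neighbour is in the way.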

Sequence alignments

Brachypodium sylvaticum transcriptome

De novo gene models from the RNA-seq analysis of three Brachypodium sylvaticum populations were mapped to the B. distachyon reference genome. Assembled data is available from the Jaiswal lab and raw reads are available from INSDC project PRJNA182761.

Triticum aestivum transcriptome

Wheat RNA-Seq, EST and UniGene datasets have been aligned to the Brachypodium distachyon genome:

  • Wheat 454 RNA-seq data from study ERP001415 were aligned using GMAP.
  • All publicly available wheat EST data were aligned using Exonerate, following the standard Ensembl pipeline.
  • Wheat UniGene cluster sequence data were aligned using Exonerate, following the standard Ensembl pipeline.

Variation

Brachypodium variation data

Approximately 394,000 genetic variants have been identified by the alignment of transcriptome assemblies from three slender false brome (Brachypodium sylvaticum) populations. Two populations come from B. sylvaticum's native range (Greece and Spain) and one comes from its invasive range (Oregon). Both the transcriptome alignments and the variation data are available in Ensembl Plants.

Wheat inter-homoeologous variants

As part of the wheat genome analysis, we have aligned a set of Triticum aestivum (bread wheat) homoeologous SNPs (SNPs between the component A, B and D genomes of wheat) against the Brachypodium distachyon genome. SNPs have been classified into two groups: 1) SNPs that differ between the A and D genomes (where the B genome is unknown), and 2) SNPs that are the same between the A and D genomes but differ in B.
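The two SNP groups described above can be expressed as a small classification function. This is a minimal Python sketch of the stated rule only: the function name, the group labels and the use of `None` for an unknown B allele are assumptions made for illustration, and combinations the text does not describe are returned as unclassified.

```python
from typing import Optional


def classify_homoeologous_snp(a: str, d: str, b: Optional[str]) -> Optional[str]:
    """Assign a SNP to one of the two groups described in the text.

    a, d: alleles observed in the A and D genomes.
    b:    allele in the B genome, or None when it is unknown.
    Returns a hypothetical group label, or None for any other case.
    """
    if a != d and b is None:
        return "A_vs_D"   # group 1: A and D differ, B unknown
    if a == d and b is not None and b != a:
        return "AD_vs_B"  # group 2: A and D agree, B differs
    return None


print(classify_homoeologous_snp("A", "G", None))
print(classify_homoeologous_snp("C", "C", "T"))
print(classify_homoeologous_snp("C", "C", "C"))
```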

The wheat sequence alignments and the projected homoeologous SNPs are available as tracks under the "Wheat SNPs and alignments" section of the "Configure this page" menu.

Links

Links (Brachypodium distachyon)

Links (Triticum aestivum)

  • MIPS Wheat Genome Database
  • ENA study ERP000319: 454 pyrosequencing of the Triticum aestivum (bread wheat) genome to 5X coverage
  • ENA study ERP001415: 454 sequencing of Triticum aestivum (bread wheat) cv. Chinese spring cDNA samples from a pool of tissues, from plants under drought stress and from circadian-sampled leaves
  • Triticum aestivum ESTs at ENA
  • Triticum aestivum UniGene cluster sequences at NCBI

References

  1. Genome sequencing and analysis of the model grass Brachypodium distachyon.
    The International Brachypodium Initiative. 2010. Nature. 463:763-768.
  2. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae).
    Samuel E. Fox, Justin Preece, Jeffrey A. Kimbrel, Gina L. Marchini, Abigail Sage, Ken Youens-Clark, Mitchell B. Cruzan, and Pankaj Jaiswal. 2013. Applications in Plant Sciences. 1(3):1200011.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Brachypodium_distachyon_v3.0, INSDC Assembly GCA_000005505.4, Feb 2018
Database version: 99.4
Base Pairs: 270,739,461
Golden Path Length: 271,163,419
Genebuild by: JGI
Genebuild method: Import
Data source: Joint Genome Institute

Gene counts

Coding genes: 34,310
Non coding genes: 815
Small non coding genes: 784
Long non coding genes: 31
Gene transcripts: 53,787

Other

Short Variants: 327,200