Brachypodium distachyon Assembly and Gene Annotation

About Brachypodium distachyon

Brachypodium distachyon, like Arabidopsis thaliana, has several features that recommend it as a model plant for functional genomic studies, especially in the grasses. It has a small, diploid genome (~355 Mbp), small physical size, a short life-cycle and few growth requirements. Brachypodium is related to the major cereal grain species but is understood to be more closely related to the Triticeae (wheat and barley) than to the other cereals.

Assembly

This release represents the second improved Brachypodium distachyon (Bd21) genome, including ~270 Mb of improved Brachypodium sequence. These regions were improved by dividing the gene space into overlapping pieces of ~2 Mb. Each region was manually inspected and then finished using a variety of technologies, including Sanger (primer walks on subclones and fosmid templates, transposon sequencing on subclone templates), Illumina (small-insert shatter libraries) and clone-based shotgun sequencing using both Sanger and Illumina libraries. In total, 1,496 gaps were closed and 1.43 Mb of sequence was added to the assembly. Overall contiguity (contig N50) increased by a factor of 63, from 347.8 kb to 22 Mb. [1]
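As a reminder of what the contig N50 figure measures, the following is a minimal Python sketch of one common definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the toy contig lengths are invented for illustration; only the 347.8 kb and 22 Mb values come from the text above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total
        # assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths, invented for illustration only.
toy_contigs = [100, 80, 50, 30, 20, 10, 10]
toy_n50 = n50(toy_contigs)

# Fold improvement implied by the figures quoted in the text.
old_n50_bp = 347_800      # 347.8 kb
new_n50_bp = 22_000_000   # 22 Mb
fold = new_n50_bp / old_n50_bp

print(toy_n50, round(fold))
```

With the quoted values the ratio rounds to 63, which matches the stated improvement.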

Annotation

74,756 transcript assemblies were constructed from 160M paired-end Illumina RNA-seq reads, and 17,647 transcript assemblies from ~1.9M 454 reads. The transcript assemblies from RNA-seq reads were made using PERTRAN. Using PASA, 76,209 transcript assemblies were constructed from 314,866 sequences in total, consisting of the RNA-seq transcript assemblies above as well as Sanger ESTs. Loci were determined by transcript assembly alignments and/or EXONERATE alignments of proteins from arabidopsis (Arabidopsis thaliana), rice, sorghum, foxtail, grape and soybean, together with Swiss-Prot eukaryote proteins, to the Brachypodium distachyon Bd21 genome, which had been soft-masked for repeats using RepeatMasker. Alignments were given up to 2 kb of extension on both ends, unless this extended into another locus on the same strand. Gene models were predicted by the homology-based predictors FGENESH+, FGENESH_EST (similar to FGENESH+, but using ESTs rather than protein/translated ORFs as splice-site and intron input) and GenomeScan.

The end result was 34,310 loci containing a total of 52,972 protein-coding transcripts.
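The rule of extending a locus by up to 2 kb on each end, unless that would run into another locus on the same strand, can be sketched as follows. This is only one possible reading of the rule, not the annotation pipeline's actual code: the function `extend_loci`, the 1-based inclusive coordinates, and the assumption that the input loci on a strand do not already overlap are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

MAX_EXTENSION = 2000  # "up to 2 kb" on each end, as stated in the text


def extend_loci(loci: List[Tuple[int, int]], seq_length: int) -> List[Tuple[int, int]]:
    """Extend same-strand loci, given as 1-based inclusive (start, end) pairs.

    Assumes (hypothetically) that the input loci do not overlap one another.
    Each extension is clipped so that it stops short of the neighbouring
    locus's original coordinates and of the ends of the sequence.
    """
    ordered = sorted(loci)
    extended = []
    for i, (start, end) in enumerate(ordered):
        # Nearest positions that do not belong to a neighbouring locus.
        left_limit = ordered[i - 1][1] + 1 if i > 0 else 1
        right_limit = ordered[i + 1][0] - 1 if i + 1 < len(ordered) else seq_length
        # Never shrink a locus, even if the input violates the assumption.
        new_start = min(start, max(start - MAX_EXTENSION, left_limit))
        new_end = max(end, min(end + MAX_EXTENSION, right_limit))
        extended.append((new_start, new_end))
    return extended


# Invented coordinates on a hypothetical 31,500 bp sequence.
example = extend_loci([(10000, 12000), (13000, 15000), (30000, 31000)], 31500)
print(example)
```

In this sketch, the flanks of two close neighbours may overlap each other, because each extension is only prevented from entering the other locus itself. Whether the real pipeline treats flanks this way is not stated in the text.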

Sequence alignments

Brachypodium sylvaticum transcriptome

De novo gene models from the RNA-Seq analysis of three Brachypodium sylvaticum populations [2] were mapped to the B. distachyon reference genome. Assembled data is available from the Jaiswal lab and raw reads are available from INSDC project PRJNA182761.

Triticum aestivum transcriptome

Wheat RNA-Seq, EST and UniGene datasets have been aligned to the Brachypodium distachyon genome.

Variation

Brachypodium variation data

Approximately 394,000 genetic variations have been identified by the alignment of transcriptome assemblies from three slender false brome (Brachypodium sylvaticum) populations [2]. Two populations come from B. sylvaticum's native range (Greece and Spain) and one comes from its invasive range (Oregon). Both the transcriptome alignments and the variation data are available in Ensembl Plants.

Wheat inter-homoeologous variants

As part of the wheat genome analysis, we have aligned a set of Triticum aestivum (bread wheat) homoeologous SNPs (SNPs between the component A, B and D genomes of wheat) against the Brachypodium distachyon genome. SNPs have been classified into two groups: 1) SNPs that differ between the A and D genomes (where the B genome is unknown), and 2) SNPs that are the same between the A and D genomes but differ in B [3].
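The two-group classification can be written down as a small decision rule. The Python sketch below is illustrative only: the function name `classify_homoeologous_snp`, the use of `None` for an unknown B-genome allele, and the handling of any site that fits neither group (returned as `None`) are assumptions, not part of the published analysis.

```python
from typing import Optional


def classify_homoeologous_snp(a: str, d: str, b: Optional[str]) -> Optional[int]:
    """Assign a site to group 1 or 2 as described in the text.

    a, d: alleles observed in the A and D genomes.
    b: allele observed in the B genome, or None if unknown (assumed encoding).
    Returns 1, 2, or None when the site fits neither group.
    """
    if b is None and a != d:
        # Group 1: A and D differ, B unknown.
        return 1
    if b is not None and a == d and b != a:
        # Group 2: A and D agree, B differs.
        return 2
    return None


# Invented alleles for illustration.
print(classify_homoeologous_snp("A", "G", None))  # group 1
print(classify_homoeologous_snp("C", "C", "T"))   # group 2
```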

The wheat sequence alignments and the projected homoeologous SNPs are available as tracks under the "Wheat SNPs and alignments" section of the "Configure this page" menu.

Links

Links (Brachypodium distachyon)

Links (Triticum aestivum)

  • MIPS Wheat Genome Database
  • ENA study ERP000319: 454 pyrosequencing of the Triticum aestivum (bread wheat) genome to 5X coverage
  • ENA study ERP001415: 454 sequencing of Triticum aestivum (bread wheat) cv. Chinese Spring cDNA samples from a pool of tissues, from plants under drought stress and from circadian-sampled leaves
  • Triticum aestivum ESTs at ENA
  • Triticum aestivum UniGene cluster sequences at NCBI

References

  1. Genome sequencing and analysis of the model grass Brachypodium distachyon.
    The International Brachypodium Initiative. 2010. Nature. 463:763-768.
  2. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae).
    Samuel E. Fox, Justin Preece, Jeffrey A. Kimbrel, Gina L. Marchini, Abigail Sage, Ken Youens-Clark, Mitchell B. Cruzan, and Pankaj Jaiswal. 2013. Applications in Plant Sciences. 1(3):1200011.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Brachypodium_distachyon_v3.0, INSDC Assembly GCA_000005505.4
Database version: 93.4
Base Pairs: 270,739,461
Golden Path Length: 271,163,419
Genebuild by: 23
Genebuild method: Generated from ENA annotation
Data source: European Nucleotide Archive

Gene counts

Coding genes: 34,310
Gene transcripts: 52,972

Other

Short Variants: 327,200
