Arabidopsis thaliana Assembly and Gene Annotation

About Arabidopsis thaliana

Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish. Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology. Arabidopsis thaliana has a genome size of ~135 Mbp, and a haploid chromosome number of 5.

Assembly

The complete genome sequence of Arabidopsis thaliana was first published by the Arabidopsis Genome Initiative in 2000 [2]. It was determined by a BAC-by-BAC sequencing strategy and anchored to chromosomes using a variety of genetic and physical maps.

Annotation

This browser is based on the Araport11 gene annotation, a comprehensive reannotation of the Col-0 genome released in June 2016. The annotation was constructed using 113 public RNA-seq data sets along with annotation contributions from NCBI, UniProt, and laboratories conducting Arabidopsis thaliana research. The structural and functional annotation steps used to generate the Araport11 protein-coding gene set, as well as the consolidation and annotation of non-coding RNAs, are described in a draft manuscript on bioRxiv [1].

Regulation

Mappings for probes from the following expression arrays have been added:

Variation

The Arabidopsis variation database was updated in release 36 (June 2017) to the latest 1001 Genomes variation data set, covering more than 10 million variant loci across 1,135 samples (1001 Genomes Consortium, Cell 2016). The phenotypic data from a GWAS of 107 phenotypes in 95 inbred lines, carried out by Atwell et al. [6], have been retained.

In this major update, the following Arabidopsis variation datasets were made obsolete (but are still available in the Ensembl Plants Archive site):

Links

References

  1. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome.
    Cheng CY, Krishnakumar V, Chan A, Schobel S, Town CD. 2016. bioRxiv.
  2. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.
    Arabidopsis Genome Initiative. 2000. Nature. 408:796-815.
  3. The Arabidopsis Information Resource (TAIR): gene structure and function annotation.
    Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L et al. 2008. Nucleic Acids Res. 36:D1009-14.
  4. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel.
    Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjálmsson BJ et al. 2012. Nat. Genet. 44:212-216.
  5. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana.
    Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA et al. 2007. Science. 317:338-342.
  6. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.
    Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT et al. 2010. Nature. 465:627-631.

Picture credit: By Emmanuel Boutet (Own work) [GFDL, CC-BY-SA-3.0 or CC BY-SA 2.5-2.0-1.0], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: TAIR10, INSDC Assembly GCA_000001735.1, Sep 2010
Database version: 89.11
Base Pairs: 135,670,229
Golden Path Length: 119,667,750
Genebuild by: TAIR
Genebuild method: Imported from TAIR
Data source: TAIR

Gene counts

Coding genes: 27,655
Non coding genes: 1,398
Small non coding genes: 1,398
Gene transcripts: 54,013

Other

FGENESH gene prediction: 20,579
Short Variants: 14,234,197
Structural variants: 13,667
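
The figures in the tables above can be combined into a few derived ratios. The following is a minimal Python sketch, not part of the Ensembl release itself. It rests on several assumptions that the page does not state: that the small non-coding genes are already counted within the non-coding genes (both are listed as 1,398), that the transcript count covers coding and non-coding genes together, and that short variants are counted against the golden path length. The dictionary keys and the function name `derived_figures` are illustrative only.

```python
# Figures copied from the Statistics tables on this page.
stats = {
    "base_pairs": 135_670_229,
    "golden_path_length": 119_667_750,
    "coding_genes": 27_655,
    "non_coding_genes": 1_398,   # assumed to include the small non-coding genes
    "gene_transcripts": 54_013,
    "short_variants": 14_234_197,
}


def derived_figures(s):
    """Return simple ratios computed from the listed statistics."""
    # Assumption: total genes = coding + non-coding, with no double counting.
    total_genes = s["coding_genes"] + s["non_coding_genes"]
    return {
        "total_genes": total_genes,
        # Assumption: the transcript count spans both gene categories.
        "transcripts_per_gene": s["gene_transcripts"] / total_genes,
        # Share of the listed base pairs covered by the golden path.
        "golden_path_fraction": s["golden_path_length"] / s["base_pairs"],
        # Assumption: variants are counted against the golden path.
        "short_variants_per_kbp": 1000 * s["short_variants"] / s["golden_path_length"],
    }


if __name__ == "__main__":
    for name, value in derived_figures(stats).items():
        if isinstance(value, float):
            print(f"{name}: {value:.3f}")
        else:
            print(f"{name}: {value}")
```

Under these assumptions the page's numbers work out to roughly 1.9 transcripts per gene and a golden path covering about 88% of the listed base pairs. The ratios are only as reliable as the assumptions above.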