Arabidopsis thaliana Assembly and Gene Annotation
About Arabidopsis thaliana
Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish. Arabidopsis is not of major agronomic significance, but its small genome size and ease of cultivation offer important advantages for basic research in genetics and molecular biology. Arabidopsis thaliana has a genome size of ~135 Mb, and a haploid chromosome number of five.
Assembly
The current genome assembly of Arabidopsis thaliana is TAIR10, produced by NCBI using data provided by TAIR and based on the Col-0 ecotype. It was determined by a BAC-by-BAC sequencing strategy, with the sequences anchored to chromosomes using a variety of genetic and physical maps.
Annotation
This browser is based on data from the Araport11 gene annotation, a comprehensive reannotation of the TAIR10 genome released in June 2016. This annotation was constructed using 113 public RNA-seq data sets along with annotation contributions from NCBI, UniProt, and laboratories conducting Arabidopsis thaliana research. The structural and functional annotation steps used to generate the Araport11 protein-coding gene set, as well as the consolidation and annotation of non-coding RNAs, are described in https://doi.org/10.1111/tpj.13415.
Regulation
Mappings for probes from the following expression arrays have been added:
Variation
The Arabidopsis variation database was updated in release 36 (June 2017) to the latest 1001 Genomes variation data set, covering more than 10 million variant loci across 1,135 samples. The database also includes phenotypic data from a genome-wide association study (GWAS) of 107 phenotypes in 95 inbred lines.
References
- Araport11: a complete reannotation of the Arabidopsis thaliana reference genome.
  Chia-Yi Cheng, Vivek Krishnakumar, Agnes Chan, Seth Schobel, Christopher D. Town. 2017. Plant J.
- Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.
  Arabidopsis Genome Initiative. 2000. Nature. 408:796-815.
- The Arabidopsis Information Resource (TAIR): gene structure and function annotation.
  Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L et al. 2008. Nucleic Acids Res. 36:D1009-14.
- Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel.
  Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjálmsson BJ et al. 2012. Nat. Genet. 44:212-216.
- Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana.
  Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA et al. 2007. Science. 317:338-342.
- Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.
  Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT et al. 2010. Nature. 465:627-631.
Picture credit: Eric Belfield (University of Oxford)
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | TAIR10, INSDC Assembly GCA_000001735.1, Apr 2008 |
Database version | 113.11 |
Golden Path Length | 119,667,750 |
Genebuild by | Araport11 |
Genebuild method | Import |
Data source | The Arabidopsis Information Resource |
Gene counts
Coding genes | 27,655 |
Non coding genes | 5,178 |
Small non coding genes | 1,697 |
Long non coding genes | 3,481 |
Gene transcripts | 54,013 |
Other
FGENESH gene prediction | 20,579 |
Short Variants | 12,883,854 |
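The figures in the tables above can be cross-checked with a little arithmetic. The following Python sketch is illustrative only: the dictionary keys and the `summarize` function are hypothetical names, not part of any browser or database API. It assumes that the "Gene transcripts" count covers both coding and non-coding genes, and it takes the approximate genome size of 135 Mb quoted in the introduction at face value, so the derived ratios are rough indications rather than official statistics.

```python
# Figures copied from the Statistics tables above.
GENE_COUNTS = {
    "coding": 27_655,
    "non_coding": 5_178,
    "small_non_coding": 1_697,
    "long_non_coding": 3_481,
    "transcripts": 54_013,
}
GOLDEN_PATH_LENGTH_BP = 119_667_750
SHORT_VARIANTS = 12_883_854

# Assumption: "~135 Mb" from the introductory text, treated as exactly 135,000,000 bp.
ESTIMATED_GENOME_SIZE_BP = 135_000_000


def summarize(counts, assembled_bp, estimated_bp, short_variants):
    """Derive a few ratios from the published counts (hypothetical helper)."""
    # Assumption: the transcript count spans coding and non-coding genes.
    total_genes = counts["coding"] + counts["non_coding"]
    return {
        # Small and long non-coding genes should add up to the non-coding total.
        "non_coding_consistent": (
            counts["small_non_coding"] + counts["long_non_coding"]
            == counts["non_coding"]
        ),
        "total_genes": total_genes,
        "transcripts_per_gene": counts["transcripts"] / total_genes,
        # Share of the estimated genome represented in the assembly.
        "assembled_fraction": assembled_bp / estimated_bp,
        # Average spacing between short variants along the assembly.
        "bp_per_short_variant": assembled_bp / short_variants,
    }


if __name__ == "__main__":
    summary = summarize(
        GENE_COUNTS, GOLDEN_PATH_LENGTH_BP, ESTIMATED_GENOME_SIZE_BP, SHORT_VARIANTS
    )
    for name, value in summary.items():
        print(f"{name}: {value}")
```

Under these assumptions, the small and long non-coding gene counts sum exactly to the non-coding total, and the assembled sequence accounts for somewhat less than the estimated genome size.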