Arabis alpina Assembly and Gene Annotation
About Arabis alpina
Arabis alpina, the Alpine rock-cress (2n=16), is a close relative of Arabidopsis thaliana belonging to the family Brassicaceae. It is a perennial model organism. Since it is self-compatible and genetically transformable, mutagenized populations and inbred homozygous lines can be produced. A. alpina belongs to the tribe Arabideae, in which several perennial and annual species can be found.
The reference accession Pajares was collected in the Cordillera Cantábrica mountain system in Spain and was afterwards self-fertilized for six generations by single-seed descent. Its haploid genome was sequenced following a hybrid approach that combined 454 shotgun sequencing with 454 and Illumina paired-end sequencing of libraries with 12 kb, 3 kb and 500 bp insert sizes. In addition, 21 Mb of Sanger-sequenced BAC ends were produced. The genome size estimated by flow cytometry was 375 Mb. A 309 Mb assembly was produced with 38,819 scaffolds, N50 of 788 kb and L50 of 160 kb. Comparative chromosome painting (CCP) revealed a conserved order of large genomic regions with A. thaliana, enabling an additional, synteny-based scaffolding into eight pseudo-molecules containing 88% of the genes.
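As a rough illustration of the N50 statistic quoted above, the sketch below computes N50 under its common definition: the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the A. alpina assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths, for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

For a real assembly, the input would be the list of scaffold lengths read from the assembly FASTA or its index file.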
Gene annotation was performed using the evidence-based ab initio gene-prediction tools Augustus and EuGene. In total, 135 Gb of RNA-seq reads from various tissues and developmental stages were used. The final annotation consisted of 30,729 protein-coding genes, of which 514 were curated in a manual annotation jamboree held in 2012. Overall, 85% (26,109) of the genes had similarity to one or more genes in A. thaliana, as computed with BLASTP. For 92% (23,924) of the conserved genes there was expression evidence in seedlings, whereas 67% (3,096) of the remaining genes, without similarity to A. thaliana genes, were also detected as expressed.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 519,436 low-complexity (Dust) features, covering 42 Mb (13.6% of the genome);
- 209,470 RepeatMasker features (with the nrTEplants library), covering 144 Mb (46.8% of the genome);
- 158,266 RepeatMasker features (with the REdat library), covering 58 Mb (19.0% of the genome);
- 150,514 tandem repeat (TRF) features, covering 14 Mb (4.5% of the genome).
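The coverage percentages above are consistent with dividing the covered length by the Golden Path Length listed under Statistics (308,032,609 bp). That denominator is an assumption, since the page does not state which one the pipeline uses. Under that assumption, a minimal sketch of the calculation follows. The helper name `coverage_percent` is hypothetical, and because the covered lengths are only given rounded to whole megabases, the results approximate the published figures rather than reproduce them exactly.

```python
# Golden Path Length from the Statistics table; using it as the
# denominator is an assumption, not something the page states.
GENOME_LENGTH_BP = 308_032_609


def coverage_percent(covered_bp, genome_bp=GENOME_LENGTH_BP):
    """Return the percentage of the genome covered by a feature class."""
    return 100.0 * covered_bp / genome_bp


# Covered lengths in Mb as quoted on the page (rounded values).
repeat_mb = {
    "Dust": 42,
    "RepeatMasker (nrTEplants)": 144,
    "RepeatMasker (REdat)": 58,
    "TRF": 14,
}

for name, mb in repeat_mb.items():
    print(f"{name}: {coverage_percent(mb * 1_000_000):.1f}%")
```

Note that the feature classes overlap (for example, the two RepeatMasker libraries annotate many of the same regions), so the percentages should not be summed.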
References
- Genome expansion of Arabis alpina linked with retrotransposition and reduced symmetric DNA methylation.
Willing EM, Rawat V, Mandáková T, Maumus F, James GV, Nordström KJ, Becker C, Warthmann N, Chica C, Szarzynska B, Zytnicki M, Albani MC, Kiefer C, Bergonzi S, Castaings L, Mateos JL, Berns MC, Bujdoso N, Piofczyk T, de Lorenzo L, Barrero-Sicilia C, Mateos I, Piednoël M, Hagmann J, Chen-Min-Tao R, Iglesias-Fernández R, Schuster SC, Alonso-Blanco C, Roudier F, Carbonero P, Paz-Ares J, Davis SJ, Pecinka A, Quesneville H, Colot V, Lysak MA, Weigel D, Coupland G, Schneeberger K.
Picture credit: http://www.arabis-alpina.org
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | A_alpina_V4, INSDC Assembly GCA_000733195.1 |
Database version | 113.1 |
Golden Path Length | 308,032,609 |
Genebuild by | TRANSNET |
Genebuild method | Import |
Data source | TRANSNET |
Gene counts
Coding genes | 21,609 |
Non coding genes | 2,400 |
Small non coding genes | 2,397 |
Long non coding genes | 3 |
Gene transcripts | 25,686 |