Arabis alpina (A_alpina_V4)

Arabis alpina Assembly and Gene Annotation

About Arabis alpina

Arabis alpina, the Alpine rock-cress (2n=16), is a close relative of Arabidopsis thaliana belonging to the family Brassicaceae. It is a perennial model organism. Since it is self-compatible and genetically transformable, mutagenized populations and inbred homozygous lines can be produced. A. alpina belongs to the tribe Arabideae, in which several perennial and annual species can be found.

The reference accession Pajares was collected in the Cordillera Cantábrica mountain system in Spain and was afterwards self-fertilized for six generations by single-seed descent. Its haploid genome was sequenced following a hybrid approach using 454 shotgun sequencing together with 454 and Illumina paired-end sequencing of 12 kb, 3 kb and 500 bp insert-size libraries. In addition, 21 Mb of Sanger-sequenced BAC ends were produced. The genome size estimated by flow cytometry was 375 Mb. A 309 Mb assembly was produced with 38,819 scaffolds, N50 of 788 kb and L50 of 160 kb. Comparative chromosome painting (CCP) revealed a conserved order of large genomic regions with A. thaliana, enabling an additional, synteny-based scaffolding into eight pseudo-molecules containing 88% of the genes.
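As a rough illustration of how assembly contiguity statistics such as those quoted above are typically derived, the sketch below computes N50 and L50 from a list of scaffold lengths. It assumes one common convention (N50 as the length of the scaffold at which the running total, taken from longest to shortest, first reaches half the assembly size; L50 as the number of scaffolds needed to get there); other sources swap or redefine these terms. The function name `n50_l50` and the toy lengths are hypothetical and are not taken from the A. alpina assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    Assumed convention: N50 is a length, L50 is a scaffold count.
    """
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half = sum(ordered) / 2                  # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths given")


# Toy example with made-up scaffold lengths (not real data).
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

With real data, the lengths would come from the scaffold records of the assembly FASTA file rather than a hand-written list.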

Gene annotation was performed using the evidence-based ab initio gene-prediction tools Augustus and EuGene. In total, 135 Gb of RNA-seq reads from various tissues and developmental stages were used. The final annotation consisted of 30,729 protein-coding genes, of which 514 were curated in a manual annotation jamboree held in 2012. Overall, 85% (26,109) of the genes had similarity to one or more genes in A. thaliana, as computed with BLASTP. For 92% (23,924) of these conserved genes there was expression evidence in seedlings, and 67% (3,096) of the remaining genes, which lack similarity to A. thaliana genes, were also detected as expressed.
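The percentages in the paragraph above follow from the gene counts it quotes. The short check below reproduces them; the variable names and the `pct` helper are illustrative only, and the count of genes without A. thaliana similarity is derived here by subtraction rather than stated in the text.

```python
# Gene counts quoted in the annotation summary.
total_genes = 30_729          # protein-coding genes in the final annotation
conserved = 26_109            # genes with BLASTP similarity to A. thaliana
conserved_expressed = 23_924  # conserved genes with expression evidence
other_expressed = 3_096       # non-conserved genes detected as expressed

# Derived value (an assumption of this sketch): genes without similarity.
other = total_genes - conserved


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to a whole number."""
    return round(100 * part / whole)


print(pct(conserved, total_genes))          # share of genes that are conserved
print(pct(conserved_expressed, conserved))  # share of conserved genes expressed
print(pct(other_expressed, other))          # share of remaining genes expressed
```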

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 519,436 Low complexity (Dust) features, covering 42 Mb (13.6% of the genome); 209,470 RepeatMasker features (with the nrTEplants library), covering 144 Mb (46.8% of the genome); 158,266 RepeatMasker features (with the REdat library), covering 58 Mb (19.0% of the genome); 150,514 Tandem repeats (TRF) features, covering 14 Mb (4.5% of the genome).

References

  1. Genome expansion of Arabis alpina linked with retrotransposition and reduced symmetric DNA methylation.
    Willing EM, Rawat V, Mandáková T, Maumus F, James GV, Nordström KJ, Becker C, Warthmann N, Chica C, Szarzynska B, Zytnicki M, Albani MC, Kiefer C, Bergonzi S, Castaings L, Mateos JL, Berns MC, Bujdoso N, Piofczyk T, de Lorenzo L, Barrero-Sicilia C, Mateos I, Piednoël M, Hagmann J, Chen-Min-Tao R, Iglesias-Fernández R, Schuster SC, Alonso-Blanco C, Roudier F, Carbonero P, Paz-Ares J, Davis SJ, Pecinka A, Quesneville H, Colot V, Lysak MA, Weigel D, Coupland G, Schneeberger K.

Picture credit: http://www.arabis-alpina.org


More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: A_alpina_V4, INSDC Assembly GCA_000733195.1
Database version: 111.1
Golden Path Length: 308,032,609
Genebuild by: TRANSNET
Genebuild method: Import
Data source: TRANSNET

Gene counts

Coding genes: 21,609
Non coding genes: 2,400
Small non coding genes: 2,397
Long non coding genes: 3
Gene transcripts: 25,686