Brassica rapa Assembly and Gene Annotation
About Brassica rapa
Brassica rapa is a widely cultivated leaf and root vegetable, with different strains producing turnips (roots), bok choi, choi sum, Chinese cabbage, field mustard and napa cabbage.
Assembly
The genome was sequenced as a contribution to the Multinational Brassica Genome Sequencing Project and was published in August 2011. The genomic sequence within this version of Ensembl includes 193 large scaffolds assembled by CAAS-IVF, which have been orientated and assigned to pseudochromosomes using publicly available genetic markers.
Annotation
Gene prediction of the assembled genomic scaffolds has been conducted by CAAS-IVF using GLEAN and BLAT. Functional annotation for the gene models is provided through similarity to Arabidopsis thaliana genes (E=1e^-5^) and Gene Ontology terms are provided through significant similarity to UniProtKB proteins (E=1e^-5^).
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 655,980 Low complexity (Dust) features, covering 33 Mb (11.8% of the genome).
- 209,256 RepeatMasker features (with the nrplants library), covering 88 Mb (31.1% of the genome).
- 118,356 RepeatMasker features (with the RepBase library), covering 32 Mb (11.4% of the genome).
- 145,117 Tandem repeats (TRF) features, covering 14 Mb (4.8% of the genome).
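As a rough illustration of how the coverage percentages relate to genome size, the sketch below recomputes them from the rounded Mb figures quoted above and the Golden Path Length given in the Statistics section. The function and dictionary names are hypothetical. Because only rounded Mb values are available, the results are approximate and can differ from the quoted percentages by a few tenths of a point. The repeat classes may also overlap, so the figures should not be summed.

```python
# Genome length taken from the "Golden Path Length" statistic on this page.
GENOME_LENGTH_BP = 283_822_783

# Rounded coverage figures (in Mb) as quoted in the text; exact base counts
# are not given, so these are approximations.
REPEAT_COVERAGE_MB = {
    "Low complexity (Dust)": 33,
    "RepeatMasker (nrplants)": 88,
    "RepeatMasker (RepBase)": 32,
    "Tandem repeats (TRF)": 14,
}


def coverage_percent(covered_mb: float, genome_length_bp: int = GENOME_LENGTH_BP) -> float:
    """Return the percentage of the genome covered by `covered_mb` megabases."""
    return 100.0 * covered_mb * 1_000_000 / genome_length_bp


if __name__ == "__main__":
    for name, mb in REPEAT_COVERAGE_MB.items():
        print(f"{name}: {mb} Mb is roughly {coverage_percent(mb):.1f}% of the genome")
```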
Sequence alignments
Further annotations generated by RRes are displayed as additional tracks:
- Arabidopsis coding sequences aligned using BLAT:
- Alignment parameters: minmatch(2), minscore(30), min identity(80), maxGap(2), evalue threshold(1e^-5^).
- Dataset: 33,410 Arabidopsis TAIR v9 coding sequences.
- External links: AtEnsembl transcripts.
- A 95k Brassica UniGene set generated by JCVI aligned using BLAT:
- Alignment parameters: default BLAT (minmatch(2), minscore(30), min identity(90), maxGap(2), evalue threshold(1e^-20^)).
- Dataset: 94,558 Brassica UniGenes.
- A 135k Brassica UniGene set generated by RRes aligned using BLAT:
- Alignment parameters: default BLAT (minmatch(2), minscore(30), min identity(90), maxGap(2), evalue threshold(1e^-20^)).
- Dataset: 135,201 Brassica UniGenes.
- B. rapa BAC end sequences aligned using Decypher tera-blastn:
- Alignment parameters: match_score(1), mismatch_score(-3), open_penalty(-5), extend_penalty(-2), gapped_alignment(banded), query_filtered, max_score(10), max alignment number(10), evalue threshold(1e^-50^), word_size(9), query_increment(3), extension_threshold(20), percent identity(95).
- Dataset: 196,837 B. rapa BAC end sequences obtained from GenBank 5-Aug-2010.
- External links: GenBank.
- B. rapa ESTs aligned using Decypher tera-blastn:
- Alignment parameters: match_score(1), mismatch_score(-1), open_penalty(-1), extend_penalty(-2), gapped_alignment(banded), query_filtered, max_score(10), max alignment number(10), evalue threshold(1e^-20^), word_size(9), query_increment(3), extension_threshold(20), percent identity(90).
- Dataset: 902,700 Brassica ESTs obtained from GenBank 13-Aug-2010.
- External links: GenBank.
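The identity and E-value thresholds listed for each track can be read as a simple post-alignment filter. The sketch below is a minimal illustration, not the pipeline that RRes actually ran: the track keys, the `Hit` record and the function names are hypothetical, and it assumes both thresholds are inclusive. Only the numeric thresholds come from the track descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_identity: float  # minimum percent identity
    max_evalue: float    # maximum accepted E-value


# Threshold values copied from the track descriptions; the keys are
# hypothetical labels chosen for this example.
TRACK_THRESHOLDS = {
    "arabidopsis_cds": Thresholds(min_identity=80.0, max_evalue=1e-5),
    "unigene_95k": Thresholds(min_identity=90.0, max_evalue=1e-20),
    "unigene_135k": Thresholds(min_identity=90.0, max_evalue=1e-20),
    "bac_ends": Thresholds(min_identity=95.0, max_evalue=1e-50),
    "ests": Thresholds(min_identity=90.0, max_evalue=1e-20),
}


@dataclass(frozen=True)
class Hit:
    """A hypothetical alignment record with only the fields needed here."""
    query: str
    percent_identity: float
    evalue: float


def passes(hit: Hit, thresholds: Thresholds) -> bool:
    """Assume inclusive bounds: identity >= minimum and E-value <= maximum."""
    return (hit.percent_identity >= thresholds.min_identity
            and hit.evalue <= thresholds.max_evalue)


def filter_hits(hits: list[Hit], track: str) -> list[Hit]:
    """Keep only the hits that satisfy the thresholds of the named track."""
    thresholds = TRACK_THRESHOLDS[track]
    return [hit for hit in hits if passes(hit, thresholds)]


if __name__ == "__main__":
    # Invented example hits, not real alignment results.
    example_hits = [
        Hit("q1", 96.0, 1e-60),
        Hit("q2", 92.0, 1e-60),
        Hit("q3", 99.0, 1e-10),
    ]
    for track in TRACK_THRESHOLDS:
        kept = [hit.query for hit in filter_hits(example_hits, track)]
        print(track, kept)
```

Keeping the thresholds in one table makes the differences between tracks easy to compare, for example the stricter identity and E-value cut-offs used for the BAC end sequences.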
References
- The genome of the mesopolyploid crop species Brassica rapa.
Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun JH, Bancroft I, Cheng F et al. 2011. Nat. Genet. 43:1035-1039.
Picture credit: School Division, Houghton Mifflin Company
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | Brapa_1.0, INSDC Assembly GCA_000309985.1, Nov 2012 |
Database version | 113.1 |
Golden Path Length | 283,822,783 |
Genebuild by | IVFCAAS |
Genebuild method | Import |
Data source | Brassica rapa Genome Sequencing Project |
Gene counts
Coding genes | 41,018 |
Non coding genes | 1,168 |
Small non coding genes | 1,149 |
Long non coding genes | 19 |
Gene transcripts | 42,193 |
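The gene counts above can be cross-checked with a little arithmetic. The sketch below uses hypothetical constant names and assumes that the transcript total covers both coding and non-coding genes, which the page does not state explicitly.

```python
# Counts copied from the "Gene counts" table on this page.
CODING_GENES = 41_018
NON_CODING_GENES = 1_168
SMALL_NON_CODING_GENES = 1_149
LONG_NON_CODING_GENES = 19
GENE_TRANSCRIPTS = 42_193

# Small and long non-coding genes should add up to the non-coding total.
non_coding_sum = SMALL_NON_CODING_GENES + LONG_NON_CODING_GENES

# Assumption: the transcript count spans coding and non-coding genes alike.
total_genes = CODING_GENES + NON_CODING_GENES
transcripts_per_gene = GENE_TRANSCRIPTS / total_genes

if __name__ == "__main__":
    print(f"Non-coding subtotal matches: {non_coding_sum == NON_CODING_GENES}")
    print(f"Total genes: {total_genes:,}")
    print(f"Transcripts per gene: {transcripts_per_gene:.4f}")
```

Under that assumption the ratio is very close to one transcript per gene, which is consistent with a gene set imported as one model per locus.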