Brassica rapa (Brapa_1.0)

Brassica rapa Assembly and Gene Annotation

The Brassica rapa genome browser has been developed through a joint effort by the Ensembl Genomes group and Rothamsted Research. Since release 13 of Ensembl Genomes, the EBI has maintained the B. rapa genome browser as part of Ensembl Plants.

Rothamsted Research acknowledges:

  • BBSRC for funding (grant number BB/E017797/1)
  • Ian Bancroft and Martin Trick at the John Innes Centre for providing the gene annotation.
  • Nick James and Sean May (NASC) for their assistance in establishing BrassEnsembl.

About Brassica rapa

Brassica rapa is a widely cultivated leaf and root vegetable, with different strains producing turnips (roots), bok choi, choi sum, Chinese cabbage, field mustard and napa cabbage.

Assembly

The genome was sequenced as a contribution to the Multinational Brassica Genome Sequencing Project and was published in August 2011. The genomic sequence within this version of Ensembl includes 193 large scaffolds assembled by CAAS-IVF, which have been orientated and assigned to pseudochromosomes using publicly available genetic markers.
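The assembly description says only that scaffolds were oriented and assigned to pseudochromosomes using genetic markers. The sketch below illustrates one common way such a placement can work, assuming each marker has been aligned to a scaffold and carries a linkage group and a map position in cM. The scaffold goes to the chromosome supported by the most markers, and its orientation follows the sign of the covariance between physical and genetic positions. The `MarkerHit` type, the `place_scaffold` function and the marker data are hypothetical; this is not the procedure CAAS-IVF actually used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MarkerHit:
    scaffold: str       # scaffold the marker sequence aligned to
    scaffold_pos: int   # physical position of the marker on that scaffold (bp)
    chromosome: str     # linkage group assigned by the genetic map
    map_pos_cm: float   # genetic map position (cM)


def place_scaffold(hits):
    """Return (chromosome, orientation) for one scaffold, or None if it has no markers.

    Orientation is "+" if genetic position increases along the scaffold,
    "-" if it decreases, and "?" if it cannot be determined.
    """
    if not hits:
        return None
    # Majority vote: the chromosome supported by the most markers wins.
    chrom, _ = Counter(h.chromosome for h in hits).most_common(1)[0]
    on_chrom = [h for h in hits if h.chromosome == chrom]
    if len(on_chrom) < 2:
        return chrom, "?"  # one marker fixes the chromosome but not the orientation
    # Sign of the covariance between physical and genetic positions.
    n = len(on_chrom)
    mean_bp = sum(h.scaffold_pos for h in on_chrom) / n
    mean_cm = sum(h.map_pos_cm for h in on_chrom) / n
    cov = sum((h.scaffold_pos - mean_bp) * (h.map_pos_cm - mean_cm) for h in on_chrom)
    if cov > 0:
        return chrom, "+"
    if cov < 0:
        return chrom, "-"
    return chrom, "?"


# Hypothetical markers: three on A01 with decreasing cM, one stray hit to A05.
hits = [
    MarkerHit("scaffold_1", 1_000, "A01", 10.0),
    MarkerHit("scaffold_1", 50_000, "A01", 5.0),
    MarkerHit("scaffold_1", 90_000, "A01", 1.0),
    MarkerHit("scaffold_1", 40_000, "A05", 30.0),
]
print(place_scaffold(hits))  # ('A01', '-')
```

Here the stray A05 marker is outvoted, and because map positions fall as scaffold coordinates rise, the scaffold is placed on A01 in reverse orientation. A real pipeline would also need to order scaffolds along each chromosome and deal with chimeric scaffolds.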

Annotation

Gene prediction on the assembled genomic scaffolds was conducted by CAAS-IVF using GLEAN and BLAT. Functional annotation for the gene models is provided through similarity to Arabidopsis thaliana genes (E-value threshold 1e-5), and Gene Ontology terms are provided through significant similarity to UniProtKB proteins (E-value threshold 1e-5).
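Similarity-based functional annotation typically keeps, for each gene model, the best hit that passes the E-value cut-off. A minimal sketch, assuming the similarity search produced 12-column BLAST-style tabular output and that the highest bit score wins; the source names neither the search tool nor the selection rule, so both are assumptions, as are the function name `best_hits` and the example identifiers.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off quoted for the Arabidopsis and UniProtKB searches


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Map each query to its best subject among hits with E-value <= cutoff.

    Expects 12-column BLAST-style tabular rows (outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    "Best" is taken to be the highest bit score (an assumption).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # not significant enough to transfer a function
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits for two B. rapa gene models against Arabidopsis proteins.
hits = (
    "Bra000001\tAT1G01010.1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-30\t250\n"
    "Bra000001\tAT4G01010.1\t70.0\t200\t60\t0\t1\t200\t1\t200\t1e-10\t120\n"
    "Bra000002\tAT2G02020.1\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30\n"
)
print(best_hits(hits))  # {'Bra000001': 'AT1G01010.1'}
```

The second gene model gets no annotation because its only hit (E = 0.01) fails the cut-off.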

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:

  • 655,980 low-complexity (Dust) features, covering 33 Mb (11.8% of the genome)
  • 209,256 RepeatMasker features (with the nrplants library), covering 88 Mb (31.1% of the genome)
  • 118,356 RepeatMasker features (with the RepBase library), covering 32 Mb (11.4% of the genome)
  • 145,117 tandem repeat (TRF) features, covering 14 Mb (4.8% of the genome)
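Each coverage figure expresses the bases covered by one feature type as a share of the genome. Because features of the same type can overlap, a minimal sketch of that calculation merges overlapping intervals per sequence before summing, so no base is counted twice. The function names, the half-open `(start, end)` convention and the toy data are assumptions for illustration, not the Ensembl Genomes pipeline code; the genome length used as the denominator is supplied by the caller.

```python
def merged_coverage(intervals):
    """Bases covered by half-open (start, end) intervals on one sequence, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_of_genome(intervals_by_seq, genome_length):
    """Share of the genome (in %) covered by features, merged per sequence."""
    covered = sum(merged_coverage(ivs) for ivs in intervals_by_seq.values())
    return 100.0 * covered / genome_length


# Hypothetical toy data: two sequences with overlapping repeat features.
repeats = {"A01": [(0, 10), (5, 20), (30, 40)], "A02": [(0, 30)]}
print(percent_of_genome(repeats, 200))  # 30.0
```

Summing raw feature lengths instead would count bases 5 to 10 on A01 twice and overstate coverage.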

Sequence alignments

Further annotations generated by Rothamsted Research (RRes) are displayed as additional tracks:

  • Arabidopsis coding sequences aligned using BLAT:
    • Alignment parameters: minmatch(2), minscore(30), min identity(80), maxGap(2), evalue threshold(1e-5).
    • Dataset: 33,410 Arabidopsis TAIR v9 coding sequences.
    • External links: AtEnsembl transcripts.
  • A 95k Brassica UniGene set generated by JCVI aligned using BLAT:
    • Alignment parameters: default BLAT (minmatch(2), minscore(30), min identity(90), maxGap(2), evalue threshold(1e-20)).
    • Dataset: 94,558 Brassica UniGenes.
  • A 135k Brassica UniGene set generated by RRes aligned using BLAT:
    • Alignment parameters: default BLAT (minmatch(2), minscore(30), min identity(90), maxGap(2), evalue threshold(1e-20)).
    • Dataset: 135,201 Brassica UniGenes.
  • B. rapa BAC end sequences aligned using Decypher tera-blastn:
    • Alignment parameters: match_score(1), mismatch_score(-3), open_penalty(-5), extend_penalty(-2), gapped_alignment(banded), query_filtered, max_score(10), max alignment number(10), evalue threshold(1e-50), word_size(9), query_increment(3), extension_threshold(20), percent identity(95).
    • Dataset: 196,837 B. rapa BAC end sequences obtained from GenBank 5-Aug-2010.
    • External links: GenBank.
  • B. rapa ESTs aligned using Decypher tera-blastn:
    • Alignment parameters: match_score(1), mismatch_score(-1), open_penalty(-1), extend_penalty(-2), gapped_alignment(banded), query_filtered, max_score(10), max alignment number(10), evalue threshold(1e-20), word_size(9), query_increment(3), extension_threshold(20), percent identity(90).
    • Dataset: 902,700 Brassica ESTs obtained from GenBank 13-Aug-2010.
    • External links: GenBank.
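The tracks differ mainly in their identity and E-value cut-offs, plus, for the Decypher runs, a limit of 10 alignments. The sketch below expresses those per-track cut-offs as a small filter over already-computed hits; it does not reproduce the internal scoring of BLAT or tera-blastn. The track keys, the `Hit` fields and `filter_hits` are hypothetical names, and reading "max alignment number" as a per-query cap that keeps the lowest-E-value hits first is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_identity: float  # minimum percent identity
    max_evalue: float    # E-value threshold


# Cut-offs transcribed from the track list; the keys are hypothetical names.
TRACK_THRESHOLDS = {
    "arabidopsis_cds_blat": Thresholds(80.0, 1e-5),
    "jcvi_unigene_blat": Thresholds(90.0, 1e-20),
    "rres_unigene_blat": Thresholds(90.0, 1e-20),
    "bac_end_tera_blastn": Thresholds(95.0, 1e-50),
    "est_tera_blastn": Thresholds(90.0, 1e-20),
}


@dataclass
class Hit:
    query: str
    target: str
    identity: float  # percent identity of the alignment
    evalue: float


def passes(track, hit):
    """True if the hit meets both the identity and E-value cut-offs of the track."""
    t = TRACK_THRESHOLDS[track]
    return hit.identity >= t.min_identity and hit.evalue <= t.max_evalue


def filter_hits(track, hits, max_per_query=None):
    """Keep hits passing the track's cut-offs, at most max_per_query per query.

    Hits are considered in order of increasing E-value, so the cap keeps the
    most significant ones (assumed interpretation of "max alignment number").
    """
    kept, per_query = [], {}
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not passes(track, hit):
            continue
        n = per_query.get(hit.query, 0)
        if max_per_query is not None and n >= max_per_query:
            continue
        per_query[hit.query] = n + 1
        kept.append(hit)
    return kept


# Hypothetical EST hits: only the first passes both EST cut-offs.
ests = [
    Hit("est1", "A01", 92.0, 1e-30),
    Hit("est1", "A02", 89.0, 1e-40),   # identity below 90%
    Hit("est2", "A03", 99.0, 1e-10),   # E-value above 1e-20
]
print(filter_hits("est_tera_blastn", ests, max_per_query=10))
```

Keeping the thresholds in one table makes it easy to compare the much stricter BAC-end settings (95% identity, 1e-50) with the EST and UniGene ones.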

References

  1. The genome of the mesopolyploid crop species Brassica rapa.
    Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun JH, Bancroft I, Cheng F et al. 2011. Nat. Genet. 43:1035-1039.

Picture credit: School Division, Houghton Mifflin Company

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Brapa_1.0, INSDC Assembly GCA_000309985.1, Nov 2012
Database version: 111.1
Golden Path Length: 283,822,783
Genebuild by: IVFCAAS
Genebuild method: Import
Data source: Brassica rapa Genome Sequencing Project

Gene counts

Coding genes: 41,018
Non coding genes: 1,168
Small non coding genes: 1,149
Long non coding genes: 19
Gene transcripts: 42,193
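The gene-count rows can be cross-checked with simple arithmetic. A short sketch, assuming coding and non-coding genes are disjoint and together make up all genes (the page lists no other biotypes, such as pseudogenes) and that every gene has at least one transcript: small plus long non-coding genes (1,149 + 19) equal the 1,168 non-coding genes, the total gene count is 41,018 + 1,168 = 42,186, and 42,193 transcripts leave 7 transcripts beyond one per gene. The function name `summarise_gene_counts` is hypothetical.

```python
# Figures copied from the "Gene counts" table.
GENE_COUNTS = {
    "coding": 41_018,
    "non_coding": 1_168,
    "small_non_coding": 1_149,
    "long_non_coding": 19,
}
TRANSCRIPTS = 42_193


def summarise_gene_counts(counts, transcripts):
    """Cross-check the gene-count rows and derive simple totals.

    Assumes coding and non-coding genes are disjoint and cover all genes,
    and that every gene has at least one transcript.
    """
    if counts["small_non_coding"] + counts["long_non_coding"] != counts["non_coding"]:
        raise ValueError("small + long non-coding genes do not match the non-coding total")
    genes = counts["coding"] + counts["non_coding"]
    if transcripts < genes:
        raise ValueError("fewer transcripts than genes")
    return {
        "genes": genes,
        "extra_transcripts": transcripts - genes,
        "transcripts_per_gene": transcripts / genes,
    }


print(summarise_gene_counts(GENE_COUNTS, TRANSCRIPTS))
```

The ratio of almost exactly one transcript per gene is consistent with an imported gene set that carries essentially no alternative splicing.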