Brassica rapa R-o-18 Assembly and Gene Annotation
About Brassica rapa
Brassica rapa is widely cultivated as a leafy and root vegetable, an oilseed and a condiment. Its subspecies have distinct morphotypes, producing turnips (roots), bok and pak choi, choi sum, Chinese cabbage, broccoletto, oilseed rape, and field and yellow mustard. This genome corresponds to the diploid (AA, 2n=20) genotype R-o-18, an inbred line of the mustard crop yellow sarson (B. rapa subsp. trilocularis), which has been used to produce a TILLING population for reverse genetics studies [1].
Assembly
Scaffold assembly
DNA was extracted from leaves of the homozygous inbred line Brassica rapa subsp. trilocularis R-o-18 (Biosample SAMN16250067). Sequencing data were generated from the following: four paired-end (PE) libraries with insert sizes of 255, 255, 250 and 150 bp, sequenced on the Illumina MiSeq platform; one PE library and three mate-pair (MP) libraries with 3 kb, 5 kb and 10 kb insert sizes, sequenced on the Illumina HiSeq platform; and PacBio Sequel sequencing using Sequencing Plate v1.2.1 chemistry with Instrument Control Software version 4.0.0.189873. All sequencing reads are deposited in the NCBI SRA and listed under NCBI BioProject PRJNA649364. The reads were assembled using MaSuRCA v3.2.2. No separate read error correction was carried out, because MaSuRCA corrects errors internally. The initial scaffold-scale assembly was further improved using the SSPACE scaffolder with information from the MP libraries.
Chromosome-scale assembly
A high-density genetic linkage map, based on genotyping-by-sequencing (GBS) marker segregation in the BraRCRI recombinant inbred population derived from an F1 of R-o-18 x Chiifu-401, was used to assign the assembled scaffolds and contigs to specific chromosomes. Sequences of 12,500 markers on the linkage map were aligned against the assembled scaffolds and contigs using BLASTN. Finally, the ALLMAPS software was used to anchor the scaffolds and contigs to the 10 B. rapa A genome chromosomes, with identity and orientation confirmed by comparison to existing published genomes that include minor corrections as described in [2]. The final assembly contains 10 chromosomes and 295 scaffolds.
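The marker-to-scaffold step can be illustrated with a minimal Python sketch. It assumes the BLASTN results were written in the standard 12-column tabular format (-outfmt 6) and keeps one best hit per marker by bitscore. The identity threshold, the marker and scaffold names, and the best-hit rule are all illustrative assumptions, not the parameters used for this assembly.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, min_identity=95.0):
    """Return {marker_id: hit}, keeping the highest-bitscore hit per marker.

    min_identity is an illustrative threshold, not a value from the source.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) < min_identity:
            continue
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > current["bitscore"]:
            best[row["qseqid"]] = {
                "scaffold": row["sseqid"],
                # Minus-strand hits have sstart > send, so take the smaller.
                "position": min(int(row["sstart"]), int(row["send"])),
                "bitscore": score,
            }
    return best


# Hypothetical example rows (marker and scaffold names are made up).
example = io.StringIO(
    "marker_1\tscaffold_7\t100.0\t64\t0\t0\t1\t64\t1500\t1563\t1e-30\t119\n"
    "marker_1\tscaffold_9\t96.9\t64\t2\t0\t1\t64\t880\t817\t1e-24\t101\n"
    "marker_2\tscaffold_7\t90.0\t60\t6\t0\t1\t60\t300\t359\t1e-10\t60\n"
)
hits = best_hits(example)
```

Each retained hit pairs a marker's genetic map position with a physical scaffold coordinate, which is the kind of input an anchoring tool such as ALLMAPS works from.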
Annotation
Protein-coding genes were annotated from the chromosome-scale assembly using the MAKER gene annotation pipeline. Transcript evidence (117,894 transcripts) was generated with the Trinity pipeline by de novo assembly of RNA-Seq reads from five R-o-18 transcriptome libraries: two from seeds of one plant, obtained 35 days after pollination, and three from young leaves of three different plants. A reference protein set was generated by combining the annotated proteins of B. rapa Chiifu-401 (RefSeq GCF_000309985.2), B. napus (RefSeq GCF_000686985.2), and B. oleracea (RefSeq GCF_000695525.1). The MAKER-generated gene models were further filtered using NCBI Genome Workbench (v3.5.0).
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
* 200,408 Red features, covering 127 Mb (36.8% of the genome);
* 771,011 Low complexity (Dust) features, covering 30 Mb (8.6% of the genome);
* 235,992 RepeatMasker features (with the nrTEplants library), covering 146 Mb (42.2% of the genome);
* 243,643 Tandem repeats (TRF) features, covering 43 Mb (12.5% of the genome).
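The coverage percentages can be roughly cross-checked against the Golden Path Length given in the Summary table (346,506,900 bp). The sketch below is only a sanity check on the published figures: it assumes 1 Mb = 1,000,000 bp and uses the rounded megabase values, so small discrepancies are expected.

```python
GENOME_LENGTH_BP = 346_506_900  # Golden Path Length from the Summary table

# Covered lengths as reported, rounded to whole megabases.
repeat_coverage_mb = {
    "Red": 127,
    "Dust": 30,
    "RepeatMasker": 146,
    "TRF": 43,
}


def percent_of_genome(covered_mb, genome_length_bp=GENOME_LENGTH_BP):
    """Percentage of the genome covered, assuming 1 Mb = 1,000,000 bp."""
    return 100.0 * covered_mb * 1_000_000 / genome_length_bp


recomputed = {
    name: percent_of_genome(mb) for name, mb in repeat_coverage_mb.items()
}

for name, pct in recomputed.items():
    print(f"{name}: {pct:.1f}%")
```

The recomputed values fall within about 0.2 percentage points of the stated ones. The residual difference presumably comes from the megabase figures being rounded.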
References
[1] A rich TILLING resource for studying gene function in Brassica rapa. Stephenson P, Baker D, Girin T, Perez A, Amoah S, King GJ, Østergaard L.
[2] Genome structural evolution in Brassica crops. He Z, Ji R, Havlickova L, Wang L, Li Y, Lee HT, Song J, Koh C, Yang J, Zhang M, Parkin IAP, Wang X, Edwards D, King GJ, Zou J, Liu K, Snowdon RJ, Banga SS, Machackova I, Bancroft I.
Image credit: Graham King, Southern Cross University
Statistics
Summary
Assembly | SCU_BraROA_2.3, INSDC Assembly GCA_017639395.1, Mar 2021 |
Database version | 113.1 |
Golden Path Length | 346,506,900 |
Genebuild method | External annotation import |
Data source | Brassica rapa R-o-18 genome sequencing consortium |
Gene counts
Coding genes | 43,129 |
Gene transcripts | 43,129 |