Brassica juncea Assembly and Gene Annotation
About Brassica juncea
Brassica juncea (L.) Czern. & Coss. is a diverse and agriculturally important species. An allotetraploid (AABB, 2n = 36), B. juncea was derived from interspecific hybridization between the diploid progenitors Brassica rapa (AA, 2n = 20) and Brassica nigra (BB, 2n = 16)[2]. Four subspecies have been proposed based on crop use and morphology: juncea (seed mustard), integrifolia (leaf mustard), napiformis (root mustard) and tumida (stem mustard). B. juncea has a wide geographic range, occurring as native plants, adapted crops and introduced weeds across Asia, Europe, Africa, the Americas and Australia. It is an important oilseed crop in India, Bangladesh, China and Ukraine, and has recently gained importance in Canada and Australia. It is also grown as a condiment in Europe, North America, Argentina and China. Root mustard is distributed in Mongolia and northeastern China, whereas leaf mustards are most common in China and Southeast Asia. Brassica juncea is regarded as one of the earliest domesticated plants, with mustard mentioned as a condiment in Sanskrit and Sumerian texts from as early as 3,000 BC.
Assembly
For de novo assembly of the SY genome, four sequencing and mapping technologies were integrated: PacBio long-read sequencing, Illumina short-read sequencing, BioNano optical mapping and Hi-C. The SY genome size was estimated at 1,056.53 Mb by k-mer analysis, close to the 1,068 Mb estimated by flow cytometry. PacBio reads (~93×) were first assembled with FALCON, and the contigs were then corrected with Illumina reads (~130×) to generate assembly V.1. An optical consensus map built from 202-fold coverage of BioNano data was used to assemble 1,897 super-scaffolds with an N50 of 5.87 Mb (assembly V.2). These super-scaffolds were assigned and ordered into 18 chromosome-scale scaffolds using a high-density linkage map with 15,543 markers. Finally, Hi-C data were used to validate the pseudo-chromosomes, and 165 mis-joined contigs were manually corrected in Juicebox. The final SY assembly comprises 933.5 Mb of sequence, of which 867.3 Mb (~92.9%) is anchored to chromosomes. This assembly is superior to previous assemblies of stem and Indian mustard in genome size, contiguity and anchoring rate.
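The genome-size estimate, the N50 value and the anchoring rate quoted above follow standard definitions. The sketch below illustrates those three calculations in Python. It is not the pipeline used for SY: the k-mer tool and its settings are not stated here, the histogram is toy data, and `min_depth` (the depth below which k-mers are treated as sequencing errors) is an assumed parameter. The estimate also ignores heterozygosity and repeat structure, which dedicated k-mer profiling tools model explicitly.

```python
from collections.abc import Iterable, Mapping


def estimate_genome_size(kmer_histogram: Mapping[int, int], min_depth: int = 5) -> float:
    """Rough genome size: total k-mer count divided by the peak k-mer depth.

    kmer_histogram maps depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth (an assumed cutoff) are treated as sequencing errors.
    """
    usable = {depth: n for depth, n in kmer_histogram.items() if depth >= min_depth}
    total_kmers = sum(depth * n for depth, n in usable.items())
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct k-mers
    return total_kmers / peak_depth


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one sequence length")


def anchored_fraction(anchored_bp: float, total_bp: float) -> float:
    """Fraction of the assembled sequence placed on chromosomes."""
    return anchored_bp / total_bp


if __name__ == "__main__":
    toy_hist = {1: 1000, 10: 50, 20: 500, 21: 400, 40: 10}  # toy data, not SY reads
    print(estimate_genome_size(toy_hist))                    # 965.0
    print(n50([10, 8, 5, 3, 2]))                             # 8
    print(round(anchored_fraction(867.3, 933.5) * 100, 1))   # 92.9
```

Plugging in the assembly figures from the text, 867.3 Mb anchored out of 933.5 Mb gives the ~92.9% anchoring rate reported above.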
Annotation
Of the 92,878 predicted gene models, 95.5% were functionally annotated using public databases. Based on alignment to known proteins and expression in at least one tissue type, 82,723 gene models were classified as high-confidence (HC) genes, with an average coding sequence length of ~1.13 kb and an average of five exons per gene, similar to predictions in other Brassica genomes (Supplementary Table 13). A total of 5,756 genes (6.96% of the HC genes) were predicted to encode transcription factors belonging to 58 families.
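As a hedged illustration of the HC filtering and the transcription-factor percentage, the Python sketch below keeps a gene model only when it has a protein alignment and is expressed in at least one tissue. Whether the original annotation required both lines of evidence is an assumption, and the `GeneModel` class and its fields are hypothetical, not part of any published annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    # Hypothetical record; field names are illustrative only.
    gene_id: str
    has_protein_hit: bool          # aligned to at least one known protein
    expressed_tissues: int         # number of tissues with detectable expression
    is_transcription_factor: bool = False


def is_high_confidence(gene: GeneModel) -> bool:
    # Assumption: both protein support AND expression in >= 1 tissue are required.
    return gene.has_protein_hit and gene.expressed_tissues >= 1


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in '6.96% of the HC genes'."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    genes = [
        GeneModel("g1", True, 3),
        GeneModel("g2", True, 0),        # no expression evidence -> not HC
        GeneModel("g3", False, 2),       # no protein evidence -> not HC
        GeneModel("g4", True, 1, True),  # HC and a transcription factor
    ]
    hc = [g for g in genes if is_high_confidence(g)]
    print([g.gene_id for g in hc])                                      # ['g1', 'g4']
    print(percent(sum(g.is_transcription_factor for g in hc), len(hc)))  # 50.0
    print(percent(5756, 82723))                                          # 6.96
```

The last line reproduces the figure in the text: 5,756 transcription-factor genes out of 82,723 HC genes is 6.96%.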
- Genomic insights into the origin, domestication and diversification of Brassica juncea.
Kang L, Qian L, Zheng M, Chen L, Chen H, Yang L, You L, Yang B, Yan M, Gu Y, Wang T, Schiessl SV, An H, Blischak P, Liu X, Lu H, Zhang D, Rao Y, Jia D, Zhou D, Xiao H, Wang Y, Xiong X, Mason AS, Chris Pires J, Snowdon RJ, Hua W, Liu Z. Nat Genet 53(9).
Picture credit: Wikipedia
Statistics
Summary
Assembly | ASM1870372v1, INSDC Assembly GCA_018703725.1, Jun 2021 |
Database version | 113.1 |
Golden Path Length (bp) | 933,495,403 |
Genebuild method | External annotation import |
Data source | HUNAU |
Gene counts
Coding genes | 92,887 |
Gene transcripts | 92,887 |