Brassica juncea Assembly and Gene Annotation

About Brassica juncea

Brassica juncea (L.) Czern & Coss is a diverse and important agricultural species. An allotetraploid (AABB, 2n = 36), B. juncea is derived from interspecific hybridization between the diploid progenitors Brassica rapa (AA, 2n = 20) and Brassica nigra (BB, 2n = 16). Four subspecies have been proposed based on crop use and morphology: juncea (seed mustard), integrifolia (leaf mustard), napiformis (root mustard) and tumida (stem mustard). B. juncea has a wide geographic range as native plants, adapted crops and introduced weeds, spanning Asia, Europe, Africa, America and Australia. It is an important oilseed crop in India, Bangladesh, China and Ukraine, and has recently been gaining importance in Canada and Australia. It is also grown as a condiment in Europe, North America, Argentina and China. Root mustard is distributed in Mongolia and northeastern China, whereas leaf mustards are most common in China and Southeast Asia. Brassica juncea is regarded as one of the earliest domesticated plants, with mustard mentioned as a condiment in Sanskrit and Sumerian texts from as early as 3000 BC.

Assembly

For de novo assembly of the SY genome, four sequencing and assembly technologies were integrated: PacBio long-read sequencing, Illumina short-read sequencing, BioNano optical mapping and Hi-C data. The SY genome size was estimated to be 1,056.53 Mb by k-mer analysis, close to the 1,068 Mb estimated by flow cytometry. PacBio reads (~93×) were first assembled using FALCON, followed by contig correction using Illumina reads (~130×) to generate a V.1 assembly. Using 202-fold coverage of BioNano data, an optical consensus map was generated and used to assemble 1,897 super-scaffolds with an N50 of 5.87 Mb (assembly V.2). These contigs were categorized and ordered into 18 chromosome-scale scaffolds using a 15,543-marker high-density linkage map. Finally, Hi-C data were used to confirm the pseudo-chromosomes, and 165 mis-joined contigs were manually adjusted using Juicebox. The final SY assembly captured 933.5 Mb of genome sequence, with 867.3 Mb (~92.9%) anchored into chromosomes, which is superior to previous assemblies of stem and Indian mustard in terms of genome size, contiguity and anchorage.
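
The N50 and anchoring figures quoted above can be illustrated with a minimal Python sketch. The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the SY assembly; only the 933.5 Mb and 867.3 Mb values come from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in Mb (illustration only, not real data).
toy_lengths = [8.0, 6.0, 5.0, 3.0, 2.0, 1.0]
toy_n50 = n50(toy_lengths)

# Figures reported in the text for the final SY assembly.
assembled_mb = 933.5
anchored_mb = 867.3
anchored_pct = round(100 * anchored_mb / assembled_mb, 1)

print(f"toy N50: {toy_n50} Mb")
print(f"anchored fraction: {anchored_pct}%")
```

In the toy example, the two longest scaffolds already cover more than half of the 25 Mb total, so the N50 is the length of the second one. The anchored fraction reproduces the ~92.9% stated for the SY assembly.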

Annotation

Among 92,878 predicted gene models, 95.5% were functionally annotated in public databases. Based on alignment to known proteins and expression in at least one tissue type, 82,723 gene models were classified as high-confidence (HC) genes, with an average coding sequence length of ~1.13 kb and an average of five exons per gene, similar to predictions in other Brassica genomes (Supplementary Table 13). A total of 5,756 genes (6.96% of the HC genes) encoded putative transcription factors belonging to 58 different families.

Genomic insights into the origin, domestication and diversification of Brassica juncea.
Kang L, Qian L, Zheng M, Chen L, Chen H, Yang L, You L, Yang B, Yan M, Gu Y, Wang T, Schiessl SV, An H, Blischak P, Liu X, Lu H, Zhang D, Rao Y, Jia D, Zhou D, Xiao H, Wang Y, Xiong X, Mason AS, Chris Pires J, Snowdon RJ, Hua W, Liu Z. Nat Genet 53 (9)

Picture credit: Wikipedia

Statistics

Summary

Assembly	ASM1870372v1, INSDC Assembly GCA_018703725.1, Jun 2021
Database version	111.1
Golden Path Length	933,495,403
Genebuild method	External annotation import
Data source	HUNAU

Gene counts

Coding genes	92,887
Gene transcripts	92,887
