Vigna radiata Assembly and Gene Annotation

About Vigna radiata

Vigna radiata [L.] R. Wilczek (mungbean or green gram) is an ancient legume crop that was domesticated in India about 3,500 years ago. Mungbean is a versatile crop that takes only 60-65 days to harvest and is one of the major edible pulse crops of India, China and other countries in South and Southeast Asia. It is also cultivated and eaten in Southern Europe, the southern USA and semi-arid countries in Africa, e.g. Kenya. The mature seeds provide an invaluable source of digestible protein, fibre, B vitamins and minerals (particularly iron, potassium, magnesium and zinc) for humans, including as infant supplements, in places where meat is scarce or where the population is mostly vegetarian. Mungbean is grown not only for its seeds but also as forage (fodder for cattle). It is a self-pollinating diploid (2n = 2x = 22) plant with an estimated genome size of 494 to 579 Mbp, depending on the genotype analysed [1, 2].

Assembly

The pure line VC1973A of Vigna radiata var. radiata was used for genome sequencing. Paired-end 180-bp and 500-bp libraries and mate-pair 5-, 10- and 40-kbp libraries were prepared and sequenced on an Illumina HiSeq2000. These libraries provided 320-fold physical coverage of the estimated genome size. In addition, long reads providing ~5-fold genome coverage were produced on the GS FLX+ platform. The Illumina reads were assembled using ALLPATHS-LG, producing 2,800 scaffolds with an N50 length of 1,507 kbp and a total length of ~431 Mbp. The GS FLX+ reads were assembled using Newbler 2.5.3 into 180,372 contigs, of which 144,213 were consistent with the ALLPATHS-LG scaffolds. The non-matched GS FLX+ contigs were divided into 5-kbp pseudo-mate-pair reads and assembled using ALLPATHS-LG to improve the quality of the assembly, resulting in 2,748 scaffolds with an N50 length of 1.52 Mbp. The total length of these scaffolds was 431 Mbp, representing about 80% of the 543-Mbp genome size estimated from the 25-mer frequency distribution [1].
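The two summary figures used above, N50 and the assembled fraction of the genome, can be illustrated with a minimal Python sketch. The function name n50 and the toy scaffold lengths are hypothetical and chosen only for illustration; they are not the mungbean scaffolds. Only the 431 Mbp and 543 Mbp values come from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Toy scaffold lengths in bp (hypothetical, for illustration only).
toy_scaffolds = [900, 700, 400, 300, 200]
toy_n50 = n50(toy_scaffolds)

# Figures quoted in the text, in Mbp.
ASSEMBLED_MBP = 431
ESTIMATED_GENOME_MBP = 543
assembled_fraction = ASSEMBLED_MBP / ESTIMATED_GENOME_MBP

print(f"toy N50: {toy_n50} bp")
print(f"assembled fraction: {assembled_fraction:.1%}")
```

With the quoted figures the assembled fraction comes out at roughly 79%, which matches the rounded 80% given in the text.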

Annotation

Gene prediction for the Vigna radiata genome was carried out using the MAKER pipeline. Transcriptomes of leaf, flower, pod and root tissues were sequenced on an Illumina HiSeq2000 and assembled using Trinity. The de novo transcriptome assemblies were pooled, and redundant sequences were removed using CD-HIT. The gene prediction pipeline used the transcriptome assembly of V. radiata, the protein sequences of Glycine max (soybean), and the complete protein sequences of Arabidopsis thaliana from UniProt as evidence. Once MAKER had made an initial prediction, its output was used to train AUGUSTUS model parameters in order to improve the accuracy of gene predictions. Using the trained V. radiata model parameters, the prediction pipeline was re-run against the repeat-masked V. radiata genomic scaffolds. The resulting set of high-confidence genes was annotated using InterProScan 5. In total, 22,427 genes were identified with high confidence, and 18,378 of them were located on 11 whole-chromosome pseudomolecules [1].

Links

References

  1. Genome sequence of mungbean and insights into evolution within Vigna species.
    Kang YJ, Kim SK, Kim MY, Lestari P, Kim KH, Ha BK, Jun TH, Hwang WJ, Lee T, Lee J et al. 2014. Nature Communications. 5:5443.
  2. Genomic and transcriptomic comparison of nucleotide variations for insights into bruchid resistance of mungbean (Vigna radiata [L.] R. Wilczek).
    Liu MS, Kuo TC, Ko CY, Wu DC, Li KY, Lin WJ, Lin CP, Wang YW, Schafleitner R, Lo HF et al. 2016. BMC Plant Biology. 16:46.

Picture credit: CC0 Creative Commons, No attribution required

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Vradiata_ver6, INSDC Assembly GCA_000741045.2, Nov 2015
Database version: 94.2
Base Pairs: 463,085,359
Golden Path Length: 463,085,359
Genebuild by: 05
Genebuild method: Imported from ENA
Data source: Seoul National University

Gene counts

Coding genes: 26,973
Non coding genes: 2,645
Small non coding genes: 1,354
Long non coding genes: 1,291
Pseudogenes: 998
Gene transcripts: 49,916
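
As a quick consistency check on the gene counts listed above, the following Python sketch confirms that the small and long non-coding categories add up to the non-coding total and derives an average number of transcripts per gene. The dictionary keys are hypothetical names, and counting coding genes, non-coding genes and pseudogenes together as the denominator is an assumption, not something the page states.

```python
# Gene counts copied from the table above; key names are illustrative.
gene_counts = {
    "coding": 26_973,
    "non_coding": 2_645,
    "small_non_coding": 1_354,
    "long_non_coding": 1_291,
    "pseudogenes": 998,
    "transcripts": 49_916,
}

# The two non-coding subcategories should sum to the non-coding total.
non_coding_sum = (
    gene_counts["small_non_coding"] + gene_counts["long_non_coding"]
)
subtotals_match = non_coding_sum == gene_counts["non_coding"]

# Assumption: the transcript count covers coding genes, non-coding
# genes and pseudogenes alike.
total_genes = (
    gene_counts["coding"]
    + gene_counts["non_coding"]
    + gene_counts["pseudogenes"]
)
transcripts_per_gene = gene_counts["transcripts"] / total_genes

print(f"non-coding subtotals match: {subtotals_match}")
print(f"total genes: {total_genes}")
print(f"transcripts per gene: {transcripts_per_gene:.2f}")
```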

About this species