Cajanus cajan (pigeon pea) - GCA_000340665.1 Assembly and Gene Annotation

About Cajanus cajan

Pigeonpea (Cajanus cajan) is an important legume food crop grown primarily by smallholder farmers in many semi-arid tropical regions of the world. It is grown on ∼5 million hectares, making it the sixth most important legume food crop globally. Domesticated more than 3,500 years ago in India, it is the main protein source for more than a billion people in the developing world and a cash crop that supports the livelihoods of millions of resource-poor farmers in Asia, Africa, South America, Central America and the Caribbean.

Assembly

Illumina GA and HiSeq 2000 sequencing systems were used to sequence 11 small-insert (180–800 bp) and 11 large-insert (2–20 kb) libraries [1]. This generated a total of 237.2 Gb of paired-end reads, with read lengths ranging from 50 to 100 bp. Filtering and correcting the sequence data to remove very short and/or low-quality sequences yielded 130.7 Gb of high-quality sequence, ∼163.4× coverage of the pigeonpea genome. Analysis of the sequence data for GC content indicated a similar GC content distribution in the genomes of pigeonpea and soybean. Additionally, a set of 88,860 bacterial artificial chromosome (BAC) end sequences was generated by Sanger sequencing from two BAC libraries (69,120 clones in total) constructed with the HindIII (34,560 clones) and BamHI (34,560 clones) restriction enzymes.
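The coverage figure above follows from dividing the number of high-quality bases by a genome size. The following is a minimal Python sketch of that arithmetic, using only the numbers quoted in this section; the function names are illustrative and are not part of any Ensembl or assembler tooling. The text does not state which genome size was used for the ∼163.4× figure, so the sketch back-calculates the size that figure implies rather than assuming one.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return sequencing depth (fold coverage) for a given genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Return the genome size implied by a base count and a reported coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases / coverage


# Figures quoted in the text above.
RAW_BASES = 237.2e9           # total paired-end read data, in bases
HIGH_QUALITY_BASES = 130.7e9  # data retained after filtering and correction
REPORTED_COVERAGE = 163.4     # reported fold coverage

retained_fraction = HIGH_QUALITY_BASES / RAW_BASES
size = implied_genome_size(HIGH_QUALITY_BASES, REPORTED_COVERAGE)

print(f"Fraction of raw data retained: {retained_fraction:.1%}")
print(f"Genome size implied by the reported coverage: {size / 1e6:.1f} Mb")
```

Run as written, the sketch shows that roughly 55% of the raw data survived filtering and that the reported coverage corresponds to a genome size of about 800 Mb.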

SOAPdenovo was used to assemble 605.78 Mb of the pigeonpea genome de novo, generating a sequence with a contig N50 of 21.95 kb and a longest contig of 185.39 kb. The assembly was then improved by using both the paired BAC-end sequences (41,302) that passed filtering with RepeatMasker and a genetic map comprising 833 marker loci. This increased the N50 to 516.06 kb, with the longest chromosome-level scaffold reaching 48.97 Mb. The draft genome assembly has <5.69% (~34 Mb) unclosed gaps. These analyses showed that mapped genetic loci provide additional information for assembling superscaffolds, especially in regions where scaffolds were not large enough to span repeat-rich regions. The resulting chromosome-scale scaffolds can be considered 'pseudomolecules'. The estimated pigeonpea genome size, based on k-mer statistics, is 833.07 Mb, suggesting that the assembly captures 72.7% of the genome in the genome scaffolds. If only the 6,534 scaffolds >2 kb are considered, the assembly spans 578 Mb with an N50 of 0.58 Mb.
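For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name and the toy lengths are illustrative assumptions, not the pigeonpea data or the method used by SOAPdenovo. The last lines recompute the captured fraction from the two sizes given in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example with made-up lengths (not pigeonpea data).
toy_lengths = [80, 70, 50, 40, 30, 20]
print("Toy N50:", n50(toy_lengths))

# Captured fraction, using the sizes quoted in the text above (in Mb).
ASSEMBLED_MB = 605.78
ESTIMATED_GENOME_MB = 833.07
captured_percent = 100 * ASSEMBLED_MB / ESTIMATED_GENOME_MB
print(f"Assembly captures {captured_percent:.1f}% of the estimated genome")
```

In the toy example the total is 290, so the N50 is the length at which the running sum first reaches 145, which is the second-longest sequence (70).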

Annotation

A combination of de novo gene prediction programs and homology-based methods was used to predict gene models in the pigeonpea genome. The predictions were combined using the GLEAN algorithm, resulting in the identification of 48,680 genes with an average transcript length of 2,348.70 bp, an average coding sequence size of 959.35 bp and an average of 3.59 exons per gene. The majority of these predicted genes (99.6%) were supported by de novo gene prediction, expressed sequence tags (ESTs)/unigenes, homology-based searching, or a combination of these approaches. Of these genes, Ensembl Plants displays 2,212 genes, which were imported from ENA (European Nucleotide Archive).

References

[1] Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers.
Rajeev K Varshney, Wenbin Chen, Yupeng Li, Arvind K Bharti, Rachit K Saxena, Jessica A Schlueter, Mark T A Donoghue, Sarwar Azam, Guangyi Fan, Adam M Whaley et al. 2012. Nature Biotechnology. 30:83-89.

Statistics

Summary

Assembly	C.cajan_V1.0, INSDC Assembly GCA_000340665.1, Sep 2016
Database version	114.1
Golden Path Length	592,816,859
Genebuild by	IIPG
Genebuild method	External annotation import
Data source	BGI

Gene counts

Coding genes	48,654
Gene transcripts	48,654

Coding genes: genes or transcripts that contain an open reading frame (ORF).
Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons separated by introns. The exons and introns are transcribed and the introns are then spliced out. Transcripts may or may not encode a protein.
