Manihot esculenta Assembly and Gene Annotation

About Manihot esculenta

Cassava (Manihot esculenta Crantz) is grown throughout tropical Africa, Asia and the Americas for its starchy storage roots, and feeds an estimated 750 million people each day. Farmers choose it for its high productivity and its ability to withstand a variety of environmental conditions (including significant water stress) in which other crops fail. However, it has low protein content and is susceptible to a range of biotic stresses. Despite these problems, the production potential of cassava is enormous, and its capacity to grow in a variety of environmental conditions makes it a plant of the future for emerging tropical nations. Cassava is also an excellent energy source: its roots contain 20-40% starch that costs 15-30% less to produce per hectare than starch from corn, making it an attractive and strategic source of renewable energy [2].

Assembly

This assembly was created by the DOE-Joint Genome Institute and was obtained by a whole genome shotgun (WGS) strategy, using 454 Life Sciences technology. The cassava genome assembled into 12,977 scaffolds that span a total of 532.5 Mb.

Annotation

To produce the current Cassava V6.1 gene set, the homology-based gene prediction programs FgenesH and GenomeScan were used, along with the PASA program to integrate expression information from cassava ESTs and RNA-Seq.

Transcript data from three sources were integrated. First, RNA-seq reads from root and shoot tissues of the Albert and Namikonga varieties, with and without challenge by cassava brown streak virus (CBSV), were aligned to the genome and assembled with Pertran, Phytozome's in-house software. The data comprised 1x50 reads (1,055,722,008 initial reads; 895,271,180 reads after quality trimming) and 2x100 reads (340,899,946 initial reads; 282,586,400 reads after quality trimming). This yielded 51,588 and 62,488 transcript assemblies from PE and SE reads respectively. These were aligned to the genome with PASA (90% identity and 60% coverage cutoffs) to make 69,624 aligned assemblies. In addition, ESTs from previous 454 sequencing were assembled with Pertran, added to 80,459 ESTs from GenBank, and aligned to the genome with PASA (95% identity, 60% coverage) to generate 27,470 aligned assemblies.
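The read counts and alignment cutoffs quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the `passes_cutoffs` helper are hypothetical, and the filter merely mirrors the stated identity and coverage thresholds rather than reproducing PASA's own logic.

```python
# Read counts quoted in the text: (initial reads, reads after quality trimming).
READ_SETS = {
    "1x50": (1_055_722_008, 895_271_180),
    "2x100": (340_899_946, 282_586_400),
}


def retained_fraction(initial, trimmed):
    """Fraction of reads that survive quality trimming."""
    return trimmed / initial


def passes_cutoffs(identity, coverage, min_identity=0.90, min_coverage=0.60):
    """Hypothetical filter mirroring the stated cutoffs (not PASA's code).

    Defaults match the RNA-seq assemblies (90% identity, 60% coverage);
    pass min_identity=0.95 for the EST thresholds quoted in the text.
    """
    return identity >= min_identity and coverage >= min_coverage


for label, (initial, trimmed) in READ_SETS.items():
    fraction = retained_fraction(initial, trimmed)
    print(f"{label}: {fraction:.1%} of reads retained after trimming")
```

Under these assumptions, an alignment with 93% identity and 70% coverage would pass the RNA-seq cutoffs but fail the stricter 95% identity threshold applied to the ESTs.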

Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeats length: 245,898,519. Repeats content: 42.2%.

References

[1] The Cassava Genome: Current Progress, Future Directions.
Simon Prochnik, Pradeep Reddy Marri, Brian Desany, Pablo D. Rabinowicz, Chinnappa Kodira, Mohammed Mohiuddin, Fausto Rodriguez, Claude Fauquet, Joseph Tohme, Timothy Harkins et al. 2012. Tropical Plant Biology. 5:88-94.
[2] Phytozome: a comparative platform for green plant genomics.
David M. Goodstein, Shengqiang Shu, Russell Howson, Rochak Neupane, Richard D. Hayes, Joni Fazo, Therese Mitros, William Dirks, Uffe Hellsten, Nicholas Putnam and Daniel S. Rokhsar. 2012. Nucleic Acids Research. 40

Statistics

Summary

Assembly	M.esculenta_v8, INSDC Assembly GCA_001659605.2
Database version	115.2
Golden Path Length	639,586,700
Genebuild by	JGI
Genebuild method	Import
Data source	JGI

Gene counts

Coding genes (genes or transcripts that contain an open reading frame, ORF)	32,805
Gene transcripts	59,151

A transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons being separated by introns. The exons and introns are transcribed and the introns are then spliced out. Transcripts may or may not encode a protein.
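As a quick illustration of how the two counts relate, the average number of transcripts per coding gene can be derived from the figures above. This is a rough sketch with hypothetical variable names; it assumes every listed transcript belongs to one of the coding genes, which the page does not state.

```python
# Counts taken from the gene counts listed above.
CODING_GENES = 32_805
GENE_TRANSCRIPTS = 59_151


def transcripts_per_gene(transcripts, genes):
    """Mean number of transcripts per gene (simple ratio of the two counts)."""
    return transcripts / genes


ratio = transcripts_per_gene(GENE_TRANSCRIPTS, CODING_GENES)
print(f"Average transcripts per coding gene: {ratio:.2f}")
```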
