Manihot esculenta (M.esculenta_v8)

Manihot esculenta Assembly and Gene Annotation

About Manihot esculenta

Cassava (Manihot esculenta Crantz) is grown throughout tropical Africa, Asia and the Americas for its starchy storage roots, and feeds an estimated 750 million people each day. Farmers choose it for its high productivity and its ability to withstand a variety of environmental conditions (including significant water stress) in which other crops fail. However, it has low protein content and is susceptible to a range of biotic stresses. Despite these problems, the crop production potential for cassava is enormous, and its capacity to grow in a variety of environmental conditions makes it a plant of the future for emerging tropical nations. Cassava is also an excellent energy source: its roots contain 20-40% starch that costs 15-30% less to produce per hectare than starch from corn, making it an attractive and strategic source of renewable energy [2].

Assembly

This assembly was created by the DOE Joint Genome Institute and was obtained by a whole genome shotgun (WGS) strategy, using 454 Life Sciences technology. The cassava genome assembled into 12,977 scaffolds spanning a total of 532.5 Mb.

Annotation

To produce the current Cassava V6.1 gene set, the homology-based gene prediction programs FgenesH and GenomeScan were used, along with the PASA program to integrate expression information from cassava ESTs and RNA-Seq.

Transcript data from three sources were integrated. First, RNA-seq reads from root and shoot tissues of the Albert and Namikonga varieties, with and without challenge by cassava brown streak virus (CBSV), were aligned to the genome and assembled with Pertran, Phytozome's in-house software. The reads comprised a 1x50 set (1,055,722,008 initial reads; 895,271,180 after quality trimming) and a 2x100 set (340,899,946 initial reads; 282,586,400 after quality trimming). This yielded 51,588 and 62,488 transcript assemblies from paired-end (PE) and single-end (SE) reads, respectively. These were aligned to the genome with PASA (90% identity and 60% coverage cutoffs) to make 69,624 aligned assemblies. In addition, ESTs from previous 454 sequencing were assembled with Pertran, combined with 80,459 ESTs from GenBank, and aligned to the genome with PASA (95% identity, 60% coverage) to generate 27,470 aligned assemblies.
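The trimming figures and alignment cutoffs quoted above can be illustrated with a short Python sketch. The function names (`retention_pct`, `passes_cutoffs`), the percent-based inputs, and the inclusive comparison are assumptions made for illustration; this is not the interface of PASA or Pertran, only the arithmetic and thresholding described in the text.

```python
def retention_pct(initial_reads: int, trimmed_reads: int) -> float:
    """Percentage of reads that survive quality trimming."""
    return 100.0 * trimmed_reads / initial_reads


def passes_cutoffs(identity_pct: float, coverage_pct: float,
                   min_identity: float, min_coverage: float) -> bool:
    """Hypothetical filter: keep an alignment only if both thresholds are met."""
    return identity_pct >= min_identity and coverage_pct >= min_coverage


# Read counts quoted in the text.
retention_1x50 = retention_pct(1_055_722_008, 895_271_180)
retention_2x100 = retention_pct(340_899_946, 282_586_400)

# Cutoffs quoted in the text: 90%/60% for RNA-seq assemblies, 95%/60% for ESTs.
RNASEQ_CUTOFFS = {"min_identity": 90.0, "min_coverage": 60.0}
EST_CUTOFFS = {"min_identity": 95.0, "min_coverage": 60.0}

if __name__ == "__main__":
    print(f"1x50 retention:  {retention_1x50:.1f}%")
    print(f"2x100 retention: {retention_2x100:.1f}%")
    # An illustrative alignment at 92% identity and 75% coverage (made-up values).
    print(passes_cutoffs(92.0, 75.0, **RNASEQ_CUTOFFS))
    print(passes_cutoffs(92.0, 75.0, **EST_CUTOFFS))
```

Under these assumptions, the made-up alignment would be kept with the RNA-seq cutoffs but rejected with the stricter EST identity cutoff.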

Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Total repeat length: 245,898,519 bp; repeat content: 42.2%.

References

  1. The Cassava Genome: Current Progress, Future Directions.
    Simon Prochnik, Pradeep Reddy Marri, Brian Desany, Pablo D. Rabinowicz, Chinnappa Kodira, Mohammed Mohiuddin, Fausto Rodriguez, Claude Fauquet, Joseph Tohme, Timothy Harkins et al. 2012. Tropical Plant Biology. 5:88-94.
  2. Phytozome: a comparative platform for green plant genomics.
    David M. Goodstein, Shengqiang Shu, Russell Howson, Rochak Neupane, Richard D. Hayes, Joni Fazo, Therese Mitros, William Dirks, Uffe Hellsten, Nicholas Putnam and Daniel S. Rokhsar. 2012. Nucleic Acids Research. 40

Statistics

Summary

Assembly: M.esculenta_v8, INSDC Assembly GCA_001659605.2
Database version: 113.2
Golden Path Length: 639,586,700
Genebuild by: JGI
Genebuild method: Import
Data source: JGI

Gene counts

Coding genes: 32,805
Gene transcripts: 59,151