Manihot esculenta Assembly and Gene Annotation
About Manihot esculenta
Cassava (Manihot esculenta Crantz) is grown throughout tropical Africa, Asia and the Americas for its starchy storage roots, and feeds an estimated 750 million people each day. Farmers choose it for its high productivity and its ability to withstand a variety of environmental conditions (including significant water stress) in which other crops fail. However, it has low protein content, and is susceptible to a range of biotic stresses. Despite these problems, the crop production potential for cassava is enormous, and its capacity to grow in a variety of environmental conditions makes it a plant of the future for emerging tropical nations. Cassava is also an excellent energy source - its roots contain 20-40% starch that costs 15-30% less to produce per hectare than starch from corn, making it an attractive and strategic source of renewable energy [2].
Assembly
This assembly was created by the DOE-Joint Genome Institute and was obtained by a whole genome shotgun (WGS) strategy, using 454 Life Sciences technology. The cassava genome assembled into 12,977 scaffolds that span a total of 532.5 Mb.
Annotation
To produce the current Cassava V6.1 gene set, the homology-based gene prediction programs FgenesH and GenomeScan were used, along with the PASA program to integrate expression information from cassava ESTs and RNA-Seq.
Transcript data from three sources were integrated. First, RNA-Seq reads from root and shoot tissues of the Albert and Namikonga varieties, with and without challenge by CBSV, were aligned to the genome and assembled with Pertran, the Phytozome in-house software. The reads comprised 1x50 reads (1,055,722,008 initial reads; 895,271,180 reads after quality trimming) and 2x100 reads (340,899,946 initial reads; 282,586,400 reads after quality trimming). This yielded 51,588 and 62,488 transcript assemblies from the paired-end (PE) and single-end (SE) reads, respectively. These were aligned to the genome with PASA (90% identity and 60% coverage cutoffs) to make 69,624 aligned assemblies. In addition, ESTs from previous 454 sequencing were assembled with Pertran, added to 80,459 ESTs from GenBank, and aligned to the genome with PASA (95% identity and 60% coverage cutoffs) to generate 27,470 aligned assemblies.
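As a rough illustration of the figures above, the following Python sketch computes the fraction of reads retained after quality trimming for each read set and shows how an identity/coverage cutoff of the kind quoted for PASA could be applied to an alignment. The names `ReadSet` and `passes_cutoffs` are hypothetical, and treating the cutoffs as inclusive (`>=`) is an assumption; this is not PASA's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadSet:
    """Read counts for one sequencing layout, as quoted in the text."""
    label: str
    initial_reads: int
    trimmed_reads: int

    @property
    def retained_fraction(self) -> float:
        # Share of the initial reads that survive quality trimming.
        return self.trimmed_reads / self.initial_reads


READ_SETS = [
    ReadSet("1x50", 1_055_722_008, 895_271_180),
    ReadSet("2x100", 340_899_946, 282_586_400),
]


def passes_cutoffs(identity: float, coverage: float,
                   min_identity: float, min_coverage: float) -> bool:
    """Hypothetical filter: keep an alignment only if both thresholds are met.

    All values are fractions in [0, 1]. Inclusive comparison is an assumption.
    """
    return identity >= min_identity and coverage >= min_coverage


for read_set in READ_SETS:
    print(f"{read_set.label}: {read_set.retained_fraction:.1%} of reads retained")

# RNA-Seq assemblies used 90% identity / 60% coverage; ESTs used 95% / 60%.
example = {"identity": 0.93, "coverage": 0.75}  # made-up alignment for illustration
print("passes RNA-Seq cutoffs:", passes_cutoffs(**example, min_identity=0.90, min_coverage=0.60))
print("passes EST cutoffs:", passes_cutoffs(**example, min_identity=0.95, min_coverage=0.60))
```

With these cutoffs, the same made-up alignment would be kept under the RNA-Seq thresholds but rejected under the stricter EST identity threshold.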
Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeats length: 245,898,519; repeats content: 42.2%.
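Repeat content is the repeat length divided by a sequence length. Which length was used as the denominator for the 42.2% figure is not stated here, so the sketch below keeps it as a parameter and also derives the denominator implied by the two reported numbers. It assumes the repeat length is given in base pairs; the function names are hypothetical.

```python
def repeat_fraction(repeat_bp: int, sequence_bp: int) -> float:
    """Fraction of a sequence covered by repeats (both arguments in bp)."""
    if sequence_bp <= 0:
        raise ValueError("sequence_bp must be positive")
    return repeat_bp / sequence_bp


def implied_sequence_length(repeat_bp: int, fraction: float) -> float:
    """Sequence length (bp) that would give `fraction` for `repeat_bp`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return repeat_bp / fraction


REPEAT_BP = 245_898_519      # repeats length quoted in the text (assumed bp)
REPORTED_FRACTION = 0.422    # repeats content quoted in the text

implied_bp = implied_sequence_length(REPEAT_BP, REPORTED_FRACTION)
print(f"Implied denominator: {implied_bp / 1e6:.1f} Mb")
```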
References
- The Cassava Genome: Current Progress, Future Directions. Simon Prochnik, Pradeep Reddy Marri, Brian Desany, Pablo D. Rabinowicz, Chinnappa Kodira, Mohammed Mohiuddin, Fausto Rodriguez, Claude Fauquet, Joseph Tohme, Timothy Harkins et al. 2012. Tropical Plant Biology. 5:88-94.
- Phytozome: a comparative platform for green plant genomics. David M. Goodstein, Shengqiang Shu, Russell Howson, Rochak Neupane, Richard D. Hayes, Joni Fazo, Therese Mitros, William Dirks, Uffe Hellsten, Nicholas Putnam and Daniel S. Rokhsar. 2012. Nucleic Acids Research. 40
Statistics
Summary
Assembly | M.esculenta_v8, INSDC Assembly GCA_001659605.2 |
Database version | 113.2 |
Golden Path Length | 639,586,700 |
Genebuild by | JGI |
Genebuild method | Import |
Data source | JGI |
Gene counts
Coding genes | 32,805 |
Gene transcripts | 59,151 |
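The two gene counts can be combined into an average number of transcripts per coding gene. The short Python sketch below does that arithmetic; it assumes that the transcript total is meant to be compared against the coding-gene total, which the table does not state explicitly, and the variable names are illustrative only.

```python
CODING_GENES = 32_805       # "Coding genes" row
GENE_TRANSCRIPTS = 59_151   # "Gene transcripts" row

# Mean number of annotated transcripts per coding gene.
transcripts_per_gene = GENE_TRANSCRIPTS / CODING_GENES

print(f"Average transcripts per coding gene: {transcripts_per_gene:.2f}")
```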