Dioscorea rotundata Assembly and Gene Annotation
About Dioscorea rotundata
White Guinea yam (Dioscorea rotundata Poir.) is a common staple food that has contributed enormously to the subsistence and socio-cultural life of millions of people, principally in West and Central Africa. It belongs to the genus Dioscorea, a group of monocotyledons in the family Dioscoreaceae with about 450 known species distributed mainly in tropical and subtropical regions of the world. The entire genus is characterised by the occurrence of separate male and female plants (dioecy), a rare trait in angiosperms that has limited efficient breeding. D. rotundata can be either diploid or triploid, with a haploid chromosome number of 20. The reference for D. rotundata comes from a diploid individual of unknown sex, with an assembled genome size of 579 Mb.
Assembly
The genome assembly was created by the Iwate Biotechnology Research Center. The TDr96_F1 line used for whole genome sequencing was selected from F1 progeny obtained from an open-pollinated D. rotundata breeding line (TDr96/00629) grown under field conditions on the experimental fields of the International Institute of Tropical Agriculture (IITA) in Nigeria. Total DNA from fresh leaves was sequenced with a PromethION sequencer (Oxford Nanopore Technologies), representing 36.6x genome coverage of the 570 Mbp genome size estimated by flow cytometry.
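As a rough check on the sequencing depth quoted above, coverage is the total number of sequenced bases divided by the genome size. The sketch below back-calculates the read yield implied by those two figures; the function name and the derived yield are illustrative assumptions, not values reported by the assembly authors.

```python
def implied_yield_bp(coverage: float, genome_size_bp: float) -> float:
    """Total sequenced bases implied by a coverage depth and a genome size."""
    return coverage * genome_size_bp


GENOME_SIZE_BP = 570e6  # flow cytometry estimate quoted in the text
COVERAGE = 36.6         # PromethION coverage quoted in the text

# Derived value (assumption: coverage was computed against the 570 Mbp estimate).
yield_bp = implied_yield_bp(COVERAGE, GENOME_SIZE_BP)
print(f"Implied read yield: {yield_bp / 1e9:.1f} Gbp")
```

Under these assumptions the long-read data set would amount to roughly 21 Gbp.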
The generated long reads were assembled with Flye, and the assembled contigs were polished with Pilon using the Illumina short reads from the previous study (1). Duplicated contigs were then removed with Purge Haplotigs, and the retained contigs were polished with Pilon again. The contigs were anchored and ordered into 20 pseudo-chromosomes using a genetic map generated from the whole genome sequences of 156 F1 individuals obtained from a cross between the TDr04/219 (P1: female) and TDr97/777 (P2: male) breeding lines. The 20 pseudo-chromosomes (named 1 to 20) represent 84.9% of the total contig length, with a size of 492 Mbp. The remainder consists of unanchored linkage groups and contigs.
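The anchoring figures above can be cross-checked with simple arithmetic. The sketch below assumes that the 84.9% figure is a fraction of total contig length (rather than of contig count) and derives the implied total and unanchored lengths; the variable and function names are illustrative.

```python
def implied_total_mbp(anchored_mbp: float, anchored_fraction: float) -> float:
    """Total assembly length implied by an anchored length and its fraction."""
    return anchored_mbp / anchored_fraction


ANCHORED_MBP = 492.0       # pseudo-chromosome length quoted in the text
ANCHORED_FRACTION = 0.849  # 84.9%, assumed to be a fraction of total length

total_mbp = implied_total_mbp(ANCHORED_MBP, ANCHORED_FRACTION)
unanchored_mbp = total_mbp - ANCHORED_MBP

print(f"Implied total assembly length: {total_mbp:.1f} Mbp")
print(f"Implied unanchored length:     {unanchored_mbp:.1f} Mbp")
```

The implied total of roughly 579.5 Mbp is in line with the 579 Mb assembled genome size given in the species description, which supports reading the percentage as a length fraction.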
Annotation
The Iwate Biotechnology Research Center generated gene models using TACO, a software tool for RNA-seq-based gene prediction. TACO predicted 25,708 gene models using RNA-seq data from 15 different D. rotundata tissues. In addition, the gene models predicted in the previous study (1) were mapped to the new reference genome with Spaln2, and 8,884 gene models which did overlap with the new gene models were added to the new reference genome. Open reading frames were predicted by InterProScan. These were functionally annotated using the Pfam protein family database through InterProScan, and by BLAST+ searches against the UniProt Viridiplantae database.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 1,219,329 low-complexity (Dust) features, covering 50 Mb (8.6% of the genome); 261,748 RepeatMasker features (with the nrTEplants library), covering 77 Mb (13.2% of the genome); 401,558 tandem repeat (TRF) features, covering 35 Mb (6.0% of the genome); and Repeat Detector repeats with a total length of 257 Mb (44.1% of the genome).
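The percentages above can be approximately reproduced from the rounded Mb values. The sketch below assumes the denominator is the Golden Path Length from the summary statistics (584,153,202 bp); this is an assumption, since the page does not state which genome size was used. Small differences from the quoted percentages are expected because the Mb inputs are rounded.

```python
GOLDEN_PATH_BP = 584_153_202  # assumed denominator, from the summary statistics

# Repeat class -> covered length in Mb, as quoted in the text (rounded values).
repeat_cover_mb = {
    "Low complexity (Dust)": 50,
    "RepeatMasker (nrTEplants)": 77,
    "Tandem repeats (TRF)": 35,
    "Repeat Detector": 257,
}


def percent_of_genome(length_mb: float, genome_bp: int = GOLDEN_PATH_BP) -> float:
    """Percentage of the genome covered by a feature class."""
    return 100.0 * (length_mb * 1e6) / genome_bp


repeat_percent = {name: percent_of_genome(mb) for name, mb in repeat_cover_mb.items()}

for name, pct in repeat_percent.items():
    print(f"{name:<28}{pct:5.1f}% of the genome")
```

The feature classes may overlap one another, so the percentages should not be summed to obtain a total repeat fraction.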
References
1. Genome sequencing of the staple food crop white Guinea yam enables the development of a molecular marker for sex determination.
Tamiru M, Natsume S, Takagi H, White B, Yaegashi H, Shimizu M, Yoshida K, Uemura A, Oikawa K, Abe A et al. 2017. BMC Biology, 15:86.
2. Genome analyses reveal the hybrid origin of the staple crop white Guinea yam (Dioscorea rotundata).
Yu Sugihara, Hiroki Yaegashi, Satoshi Natsume, Motoki Shimizu, Akira Abe, Akiko Hirabuchi, Kazue Ito, Kaori Oikawa, Muluneh Tamiru-Oli, Atsushi Ohta, Ryo Matsumoto, Paterne Agre, David De Koeyer, Babil Pachakkil, Shinsuke Yamanaka, Satoru Muranaka, Hiroko Takagi, Ben White, Robert Asiedu, Hideki Innan, Asrat Asfaw, Patrick Adebola and Ryohei Terauchi. 2020. PNAS, 117(50):31987-31992.
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | TDr96_F1_v2_PseudoChromosome, INSDC Assembly GCA_009730915.1 |
Database version | 113.3 |
Golden Path Length | 584,153,202 |
Genebuild by | IBRC |
Genebuild method | External annotation import |
Data source | Genetics and Breeding, Iwate Biotechnology Research Center |
Gene counts
Coding genes | 35,464 |
Gene transcripts | 66,500 |