Sesamum indicum Assembly and Gene Annotation
About Sesamum indicum
Sesame, Sesamum indicum, is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. Compared to other edible oil crops such as soybean, rapeseed, peanut and olive, sesame has innately higher oil content and is thus an attractive potential model for studying lipid biosynthesis. Cultivated sesame is a self-pollinated diploid (2n=26) with an estimated genome size of 337-357 Mbp. It belongs to the family Pedaliaceae in the order Lamiales, one of the largest orders of flowering plants, with representatives found all over the world, including olive, lavender and mint. The sequenced material is the inbred genotype Zhongzhi No. 13.
A total of 54.5 Gb of high-quality data (153x coverage) were obtained using the Illumina HiSeq 2000 platform. The genome was assembled with SOAPdenovo, resulting in a draft genome of 274 Mb with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb. Using a genetic map consisting of 406 markers, 150 large scaffolds (117 of them oriented) were anchored into 16 pseudomolecules, which harbor 85.3% of the genome assembly and 91.7% of the predicted genes.
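The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer make up at least half of the total assembly. The following is a minimal sketch of that calculation, not the procedure used by the assembly authors. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the sesame assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In this toy set the total length is 300, and the two longest contigs (80 and 70) already reach 150, so the N50 is 70.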
The assembly covered 77.4-81.3% of the genome size. The integrity of gene space in the genome assembly was demonstrated by the successful mapping of 99.3% of 3,328 expressed sequence tags retrieved from GenBank, and 98.5% of 86,222 unigenes assembled de novo from previously reported RNA-Seq data. In addition, the large-scale assembly accuracy was assessed using five fosmid clones that were sequenced thoroughly using the Sanger sequencing technology, whereby 99.6% of the clone sequences, on average, were identical to the assembly.
The final assembly comprises 16 linkage groups for 13 chromosomes.
Over 25K protein-coding genes were predicted by ab initio and homology-based analyses, together with RNA-Seq reads-assisted annotation. Of those, 87.1% were supported by unigenes or protein similarity.
Repeats were annotated with Repeat Detector and the Ensembl Genomes repeat feature pipeline. There are: 564,818 Low complexity (Dust) features, covering 26 Mb (9.5% of the genome); 102,524 RepeatMasker features (with the nrTEplants library), covering 26 Mb (9.4% of the genome); 132,402 Tandem repeats (TRF) features, covering 18 Mb (6.4% of the genome).
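The genome percentages quoted for each repeat class can be related to the assembly length with simple arithmetic. The sketch below assumes the percentages are taken relative to the golden path length of 274,906,174 bp given in the assembly table. The helper name `percent_of_genome` is hypothetical, and because the inputs are the rounded Mb figures from the text, the results are only approximate and may differ slightly from the reported values.

```python
# Golden path length of S_indicum_v1.0 in base pairs (from the assembly table).
GOLDEN_PATH_LENGTH_BP = 274_906_174


def percent_of_genome(covered_bp, genome_bp=GOLDEN_PATH_LENGTH_BP):
    """Return covered_bp as a percentage of genome_bp."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * covered_bp / genome_bp


# Rounded spans as quoted in the text (Mb converted to bp); the exact
# per-feature spans are not given in this document.
rounded_spans_bp = {
    "Low complexity (Dust)": 26_000_000,
    "RepeatMasker (nrTEplants)": 26_000_000,
    "Tandem repeats (TRF)": 18_000_000,
}

for name, span_bp in rounded_spans_bp.items():
    print(f"{name}: {percent_of_genome(span_bp):.1f}% of the genome")
```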
- Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis.
Wang L, Yu S, Tong C, Zhao Y, Liu Y, Song C, Zhang Y, Zhang X, Wang Y, Hua W, Li D, Li D, Li F, Yu J, Xu C, Han X, Huang S, Tai S, Wang J, Xu X, Li Y, Liu S, Varshney RK, Wang J, Zhang X.
Picture credit: Köhler, F. E. (Franz Eugen)
General information about this species can be found in Wikipedia.
|Assembly||S_indicum_v1.0, INSDC Assembly GCA_000512975.1|
|Golden Path Length||274,906,174 bp|
|Genebuild method||External annotation import|
|Data source||Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences|