Sesamum indicum Assembly and Gene Annotation

About Sesamum indicum

Sesame (Sesamum indicum) is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. Compared to other edible oil crops such as soybean, rapeseed, peanut and olive, sesame has an innately higher oil content and is thus an attractive potential model for studying lipid biosynthesis. Cultivated sesame is a self-pollinated diploid (2n=26) with an estimated genome size of 337-357 Mbp. It belongs to the family Pedaliaceae in the order Lamiales, one of the largest orders of flowering plants, with representatives found all over the world, including olive, lavender and mint. The sequence presented here is from the inbred genotype Zhongzhi No. 13.

Assembly

A total of 54.5 Gb of high-quality data (153x coverage) were obtained using the Illumina HiSeq 2000 platform. The genome was assembled with SOAPdenovo, resulting in a draft genome of 274 Mb with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb. Using a genetic map consisting of 406 markers, 150 large scaffolds (117 of them oriented) were anchored into 16 pseudomolecules, which harbor 85.3% of the genome assembly and 91.7% of the predicted genes.
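For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; this is not the code used by the assembly authors.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes the running
        # total to at least half of the assembly size.
        if 2 * running >= total:
            return length


# Toy example (hypothetical contig lengths, not sesame data).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # 8
```

Applied to the real contig or scaffold lengths of an assembly, the same calculation yields figures such as the 52.2 kb and 2.1 Mb values reported here.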

The assembly covered 77.4-81.3% of the genome size. The integrity of gene space in the genome assembly was demonstrated by the successful mapping of 99.3% of 3,328 expressed sequence tags retrieved from GenBank, and 98.5% of 86,222 unigenes assembled de novo from previously reported RNA-Seq data. In addition, the large-scale assembly accuracy was assessed using five fosmid clones that were sequenced thoroughly using the Sanger sequencing technology, whereby 99.6% of the clone sequences, on average, were identical to the assembly.

The final assembly comprises 16 linkage groups for 13 chromosomes.

Annotation

Over 25K protein-coding genes were predicted by ab initio and homology-based analyses, together with RNA-Seq reads-assisted annotation. Of those, 87.1% were supported by unigenes or protein similarity.

Repeats were annotated with Repeat Detector and the Ensembl Genomes repeat feature pipeline. There are: 564,818 Low complexity (Dust) features, covering 26 Mb (9.5% of the genome); 102,524 RepeatMasker features (with the nrTEplants library), covering 26 Mb (9.4% of the genome); 132,402 Tandem repeats (TRF) features, covering 18 Mb (6.4% of the genome).
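The percentages above can be roughly sanity-checked against the Golden Path Length listed in the Statistics section (274,906,174 bp). The sketch below is only an approximation: the spans are the rounded megabase values quoted on this page, so the results can differ from the published percentages in the first decimal place. The names `GOLDEN_PATH_LENGTH`, `REPEAT_SPANS_MB` and `percent_of_assembly` are illustrative assumptions.

```python
# Assembly length in base pairs, as listed in the Statistics section.
GOLDEN_PATH_LENGTH = 274_906_174

# Rounded spans in megabases, as quoted in the text (assumed 1 Mb = 1,000,000 bp).
REPEAT_SPANS_MB = {"Dust": 26, "RepeatMasker": 26, "TRF": 18}


def percent_of_assembly(span_mb, assembly_bp=GOLDEN_PATH_LENGTH):
    """Return the share of the assembly covered by a span given in Mb."""
    return 100.0 * span_mb * 1_000_000 / assembly_bp


for name, span in REPEAT_SPANS_MB.items():
    print(f"{name}: {percent_of_assembly(span):.1f}%")
```

The three feature classes may overlap on the genome, so their percentages should not be summed to obtain a total repeat content.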

References

Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis.
Wang L, Yu S, Tong C, Zhao Y, Liu Y, Song C, Zhang Y, Zhang X, Wang Y, Hua W, Li D, Li D, Li F, Yu J, Xu C, Han X, Huang S, Tai S, Wang J, Xu X, Li Y, Liu S, Varshney RK, Wang J, Zhang X.

Picture credit: Köhler, F. E. (Franz Eugen)

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	S_indicum_v1.0, INSDC Assembly GCA_000512975.1
Database version	114.1
Golden Path Length	274,906,174
Genebuild by	SinBase
Genebuild method	External annotation import
Data source	Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences

Gene counts

Coding genes	25,173
Gene transcripts	25,173

Coding genes: genes or transcripts that contain an open reading frame (ORF).
Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons separated by introns. The exons and introns are transcribed and the introns are then spliced out. Transcripts may or may not encode a protein.
