Corchorus capsularis Assembly and Gene Annotation

About Corchorus capsularis

Corchorus capsularis (white jute) is one of the most important sources of natural fibre; together with Corchorus olitorius, it accounts for ∼80% of global bast fibre production. The haploid genome (n = 7) is approximately 400 Mbp in size. The assembly presented here is at the scaffold level, comprising 6,125 scaffolds with a scaffold N50 of 4.1 Mbp.

Assembly

Whole-genome shotgun sequencing was performed on the Roche/454 GS FLX platform, and the reads were assembled with CABOG (version 7). A total of 13.69 Gbp of sequence data was generated (~30x), consisting of 7.87 Gbp of shotgun sequences (~20x), 2.04 Gbp of 3-kbp paired-end sequences (14x physical coverage), 2.26 Gbp of 8-kbp paired-end sequences (15x physical coverage), and 1.51 Gbp of 20-kbp paired-end sequences (11x physical coverage). The resulting assembly was 338 Mbp in 6,125 scaffolds with a scaffold N50 of 4.1 Mbp. Eighty per cent of the assembly is contained in 231 scaffolds (minimum length 120 kbp), and the assembly as a whole is estimated to cover about 82% of the genome [1].
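For readers unfamiliar with the scaffold statistics quoted above, the following is a minimal sketch of how an N50 value and a "number of scaffolds covering X% of the assembly" figure are conventionally computed from a list of scaffold lengths. The function names and the toy lengths are hypothetical and chosen for illustration only; this is not the pipeline used in [1], and the real scaffold lengths are not reproduced here.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def scaffolds_to_cover(lengths, percent):
    """Return (count, shortest_length) for the fewest largest scaffolds
    whose combined length reaches `percent` per cent of the assembly."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return count, length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly, NOT the C. capsularis scaffold lengths.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
print(scaffolds_to_cover(toy_scaffolds, 80))
```

Applied to the real scaffold lengths, the first function would correspond to the reported scaffold N50, and the second, called with 80, to the reported scaffold count and minimum length.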

More than 97% of the isotigs generated from transcriptome sequencing of seedlings aligned to the genome, indicating comprehensive coverage of the gene-rich regions. In addition, more than 97% of the conserved core eukaryotic genes [2] were present in the genome.

Annotation

A total of 30,096 protein-coding genes were predicted using a combination of de novo, homology-based and transcriptome-based approaches [1].

References

  1. Comparative genomics of two jute species and insight into fibre biogenesis.
    Islam MS, Saito JA, Emdad EM, Ahmed B, Islam MM, Halim A, Hossen QM, Hossain MZ, Ahmed R, Hossain MS et al. 2017. Nature Plants. 3:16223.
  2. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.
    Parra G, Bradnam K, Korf I. 2007. Bioinformatics. 23:1061.

Picture credit: By Luigi Chiesa (Scan by Luigi Chiesa) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or CC BY-SA 2.0 it (http://creativecommons.org/licenses/by-sa/2.0/it/deed.en)], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: CCACVL1_1.0, INSDC Assembly GCA_001974805.1, Feb 2017
Database version: 90.1
Base Pairs: 317,178,409
Golden Path Length: 317,178,409
Genebuild by: EnsemblPlants
Genebuild method: Generated from ENA annotation
Data source: European Nucleotide Archive

Gene counts

Coding genes: 29,356
Non coding genes: 2,185
Small non coding genes: 1,987
Long non coding genes: 198
Pseudogenes: 8
Gene transcripts: 31,549
