Corchorus capsularis Assembly and Gene Annotation
About Corchorus capsularis
Corchorus capsularis (white jute) is one of the most important sources of natural fibre, used for items such as rugs, bags, string and baskets. Together with Corchorus olitorius, it accounts for ∼80% of global bast fibre production. The genome size is approximately 400 Mb (n = 7).
Assembly
Whole-genome shotgun sequencing was performed on the Roche/454 GS FLX platform, and the reads were assembled with CABOG (version 7) by the Bangladesh Jute Research Institute. A total of 13.69 Gb of sequence data was generated (~30x), consisting of 7.87 Gb of shotgun sequences (~20x), 2.04 Gb of 3-kb paired-end sequences (14x physical coverage), 2.26 Gb of 8-kb paired-end sequences (15x physical coverage), and 1.51 Gb of 20-kb paired-end sequences (11x physical coverage). The resulting assembly was 338 Mb in 6,125 scaffolds, with a scaffold N50 of 4.1 Mb. The 231 largest scaffolds (minimum length 120 kb) cover 80% of the assembly.
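The scaffold N50 and the "80% of the assembly in 231 scaffolds" figure are both instances of the same statistic: sort the scaffolds from longest to shortest and walk down the list until a given fraction of the total length is reached. The following is a minimal sketch of that calculation, not the pipeline used for this assembly. The function name `nxx` and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def nxx(lengths, fraction=0.5):
    """Return (Nxx length, number of scaffolds needed).

    Scaffolds are taken from longest to shortest until their summed
    length reaches `fraction` of the total assembly length.
    `nxx` is a hypothetical helper, not part of any assembler.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count


# Toy scaffold lengths (arbitrary units), NOT real jute data.
toy_scaffolds = [60, 40, 30, 20, 10]

n50, l50 = nxx(toy_scaffolds, 0.5)  # half of the total length
n80, l80 = nxx(toy_scaffolds, 0.8)  # 80% of the total length
print(n50, l50)
print(n80, l80)
```

Applied to the real scaffold lengths, `nxx(lengths, 0.5)` would be expected to return a length close to the reported 4.1 Mb, and `nxx(lengths, 0.8)` the pair corresponding to 120 kb and 231 scaffolds.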
Annotation
A total of 30,096 protein-coding genes were predicted by a combination of de novo, homology and transcriptome-based approaches by the Bangladesh Jute Research Institute.
More than 97% of the isotigs generated from transcriptome sequencing of seedlings aligned to the genome, indicating comprehensive coverage of the gene-rich regions. In addition, more than 97% of the conserved core eukaryotic genes were present in the genome.
Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Total repeat length: 124,357,288 bp. Repeat content: 39.2%.
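The repeat content can be cross-checked against the other figures on this page. The sketch below assumes that the percentage is the total repeat length divided by the Golden Path Length listed under Statistics (317,178,409 bp); that choice of denominator is an assumption, not something the page states. The helper name `percent_of_assembly` is hypothetical.

```python
def percent_of_assembly(feature_bp, assembly_bp):
    """Percentage of the assembly covered by a feature class.

    Hypothetical helper; assumes both values are in base pairs.
    """
    if assembly_bp <= 0:
        raise ValueError("assembly length must be positive")
    return 100.0 * feature_bp / assembly_bp


repeat_bp = 124_357_288        # total repeat length reported on this page
golden_path_bp = 317_178_409   # Golden Path Length (assumed denominator)

repeat_pct = percent_of_assembly(repeat_bp, golden_path_bp)
print(f"{repeat_pct:.1f}%")
```

Under this assumption the result rounds to the 39.2% quoted above, which suggests the percentage refers to the Golden Path Length rather than to the 338 Mb assembly size reported in the Assembly section.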
References
- Comparative genomics of two jute species and insight into fibre biogenesis.
Islam MS, Saito JA, Emdad EM, Ahmed B, Islam MM, Halim A, Hossen QM, Hossain MZ, Ahmed R, Hossain MS et al. 2017. Nature Plants. 3:16223.
- CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.
Parra G, Bradnam K, Korf I. 2007. Bioinformatics. 23:1061.
Picture credit: By Luigi Chiesa (Scan by Luigi Chiesa) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or CC BY-SA 2.0 it (http://creativecommons.org/licenses/by-sa/2.0/it/deed.en)], via Wikimedia Commons
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | CCACVL1_1.0, INSDC Assembly GCA_001974805.1, Jan 2017 |
Database version | 113.1 |
Golden Path Length | 317,178,409 |
Genebuild by | BJRI |
Genebuild method | Import |
Data source | Bangladesh Jute Research Institute |
Gene counts
Coding genes | 29,356 |
Non coding genes | 2,014 |
Small non coding genes | 1,816 |
Long non coding genes | 198 |
Pseudogenes | 8 |
Gene transcripts | 31,378 |