Cannabis sativa female Assembly and Gene Annotation
About Cannabis sativa
Cannabis sativa (hemp) has been cultivated for millennia, with distinct cultivars grown for fiber, for oil and confectionery seed, or for tetrahydrocannabinol. It is diploid (2n=20), and its native range extends from SE European Russia to NW China and Pakistan. The species is dioecious, with sexual dimorphism appearing at a late stage of plant development. Sex is determined by heteromorphic chromosomes: males are the heterogametic sex (XY) and females the homogametic sex (XX). Sex is considered an important trait for hemp genetic improvement.
Assembly
A female of cultivar CBDRx was sequenced with ultra-long Nanopore reads (34x), and its chromosomes were resolved using markers from an ultra-high-density genetic map of 96 recombinant individuals resulting from a cross between near-isogenic lines Skunk#1 and Carmen. The genome was assembled using a correction-less pipeline consisting of overlap (minimap2), layout (miniasm2) and consensus (racon) steps, followed by a polishing step (pilon) using 64x Illumina 2x100 bp paired-end reads. The resulting assembly was 746 Mbp in 1,986 contigs, with an N50 length of 742 kb and a longest contig of 4.5 Mbp.
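As a reminder of what the N50 figure means, the following is a minimal sketch of the usual definition: the length of the contig at which the running total of lengths, sorted longest first, reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the cs10 assembly or from the tools named above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50 contig
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not from the cs10 assembly)
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

With real data, the lengths would typically be read from the assembly FASTA or its index before calling the function.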
Annotation
Full-length cDNAs, a StringTie assembly of 142 RNA-seq libraries, and Trinity transcripts were assembled into gene models with PASA. Genes were also predicted ab initio using Augustus. Non-redundant RefSeq Viridiplantae proteins were clustered at 90% identity with CD-HIT, and representative sequences were aligned to the reference. Pairwise hits were locally realigned with AAT and Exonerate protein2genome, and the PASA gene models were updated.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 1,532,461 low-complexity (Dust) features, covering 87 Mb (9.9% of the genome)
- 175,269 RepeatMasker features (with the REdat library), covering 97 Mb (11.1% of the genome)
- 9,445 RepeatMasker features (with the RepBase library), covering 1 Mb (0.2% of the genome)
- 461,225 tandem repeat (TRF) features, covering 33 Mb (3.8% of the genome)
- Repeat Detector repeats, covering 439 Mb (50.2% of the genome)
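The percentages quoted for each repeat class can be approximated from the coverage in Mb and the genome size. The sketch below assumes that the denominator is the Golden Path Length reported in the Summary table (875,732,045 bp) and that 1 Mb is 1,000,000 bp; neither assumption is stated explicitly on this page, and the helper name `percent_of_genome` is illustrative.

```python
# Assumed denominator: Golden Path Length from the Summary table, in bp
GOLDEN_PATH_BP = 875_732_045

# Coverage in Mb as quoted in the text (already rounded to whole Mb)
repeat_mb = {
    "Dust": 87,
    "RepeatMasker (REdat)": 97,
    "RepeatMasker (RepBase)": 1,
    "TRF": 33,
    "Repeat Detector": 439,
}


def percent_of_genome(megabases, genome_bp=GOLDEN_PATH_BP):
    """Convert a coverage in Mb to a percentage of the genome length."""
    return 100 * megabases * 1_000_000 / genome_bp


for name, mb in repeat_mb.items():
    print(f"{name}: {percent_of_genome(mb):.1f}%")
```

Because the Mb values are themselves rounded, the recomputed percentages can differ from the quoted ones in the last digit, most visibly for the smallest and largest classes.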
References
- PPR/PPR60520
- AGR/IND43733837
Image credit: Thayne Tuason CC BY-SA 4.0
Links
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | cs10, INSDC Assembly GCA_900626175.1 |
Database version | 113.1 |
Golden Path Length | 875,732,045 |
Genebuild by | Cannabis Genome |
Genebuild method | External annotation import |
Data source | HARVARD OEB |
Gene counts
Coding genes | 27,249 |
Gene transcripts | 36,254 |
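The two gene counts can be combined into a rough transcripts-per-gene figure. The sketch below parses rows in the pipe-delimited form used on this page and divides one count by the other. The helper name `parse_counts` is an illustrative assumption, and the ratio is only indicative, since the transcript total may include transcripts that do not belong to coding genes.

```python
# Rows copied from the "Gene counts" table above
rows = """\
Coding genes | 27,249 |
Gene transcripts | 36,254 |
"""


def parse_counts(text):
    """Parse 'label | 1,234 |' rows into a {label: int} dictionary."""
    counts = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) >= 2 and cells[0] and cells[1]:
            # Drop thousands separators before converting to int
            counts[cells[0]] = int(cells[1].replace(",", ""))
    return counts


counts = parse_counts(rows)
ratio = counts["Gene transcripts"] / counts["Coding genes"]
print(f"{ratio:.2f} transcripts per coding gene")  # prints 1.33 transcripts per coding gene
```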