Cynara cardunculus (CcrdV1)

Cynara cardunculus Assembly and Gene Annotation

About Cynara cardunculus

Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial crop species native to the Mediterranean basin and grown worldwide. It belongs to the Compositae (a.k.a. Asteraceae), one of the most successful Angiosperm families. Cynara cardunculus is a diploid species (2n=2x=34) with a medium-sized genome estimated by flow cytometry to be 1.07 Gb.

Assembly

Clone 2C, obtained by three cycles of selfing (S3) and retaining a residual heterozygosity of approximately 10%, was used to prepare one paired-end (PE) and two mate-pair (MP) libraries meeting the requirements of ALLPATHS-LG. The PE library (100 + 100 bp reads, average insert size of 170 bp) produced 67 Gb of raw data, equivalent to an estimated genome coverage of 62x. The two MP libraries, with average effective insert sizes of 2.5 Kb and 5.5 Kb, produced 56 Gb and 7 Gb of raw data, respectively (corresponding to 52x and 6.5x genome coverage). The resulting assembly comprised 13,588 scaffolds covering 725 Mb of the 1,084 Mb genome (N50=1,408 scaffolds and L50=125.9 Kb; note that N50 is given here as a scaffold count and L50 as a scaffold length).

Re-sequencing (30x) of the globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes, together with low-coverage (0.5 to 1x) genotyping-by-sequencing of 163 F1 individuals, allowed 73% of the assembled genome to be anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using the SOILoCo pipeline (Scaffold Ordering by Imputation with Low Coverage).
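The coverage figures quoted above follow from dividing the raw data volume of each library by the estimated genome size. The short Python sketch below reproduces that arithmetic. The variable and function names are illustrative only and do not come from the original pipeline, and the calculation assumes the 1,084 Mb genome size quoted in the text.

```python
# Estimated genome size from the text (1,084 Mb), expressed in Gb.
GENOME_SIZE_GB = 1.084

# Raw data per library in Gb, as reported in the text.
# The dictionary keys are illustrative labels, not official library names.
RAW_DATA_GB = {
    "PE, 170 bp insert": 67.0,
    "MP, 2.5 Kb insert": 56.0,
    "MP, 5.5 Kb insert": 7.0,
}


def coverage(raw_gb, genome_gb=GENOME_SIZE_GB):
    """Return fold coverage: sequenced bases divided by genome size."""
    return raw_gb / genome_gb


for library, raw_gb in RAW_DATA_GB.items():
    print(f"{library}: {coverage(raw_gb):.1f}x")
```

To one decimal place the three libraries come out at roughly 61.8x, 51.7x and 6.5x, in line with the rounded 62x, 52x and 6.5x reported above.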

Annotation

Gene prediction used iterative runs of the MAKER suite, with both EST sequences and RNA-seq data guiding the annotation. The pipeline yielded 26,889 gene models (27,121 predicted transcripts in total), of which 23,895 were located on the chromosomal pseudomolecules.

Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeat length: 354,774,964 bp. Repeat content: 48.9%.
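As a rough check, the repeat content can be recomputed from the repeat length and the assembly length. The sketch below assumes that the percentage is taken relative to the Golden Path Length listed in the Statistics section (724,667,265). That denominator is an assumption, since the page does not state it, and the names used are illustrative.

```python
# Repeat length reported on this page, in base pairs.
REPEAT_LENGTH_BP = 354_774_964

# Golden Path Length from the Statistics section.
# Using it as the denominator is an assumption, not stated by the source.
GOLDEN_PATH_LENGTH_BP = 724_667_265


def repeat_fraction(repeat_bp, assembly_bp):
    """Return the fraction of the assembly annotated as repeats."""
    return repeat_bp / assembly_bp


fraction = repeat_fraction(REPEAT_LENGTH_BP, GOLDEN_PATH_LENGTH_BP)
print(f"Repeat content: {fraction:.2%}")
```

Under that assumption the ratio is about 48.96%, which matches the reported 48.9% if the published value was truncated rather than rounded.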

References

  1. The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny.
    Scaglione D, Reyes-Chin-Wo S, Acquadro A et al. 2016. Scientific Reports. 6:19427.

Picture credit: Lusitana (License: CC BY 2.5)

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: CcrdV1, INSDC Assembly GCA_001531365.1
Database version: 111.1
Golden Path Length: 724,667,265
Genebuild by: CGP
Genebuild method: Import
Data source: Compositae Genome Project

Gene counts

Coding genes: 26,505
Gene transcripts: 26,505