Daucus carota Assembly and Gene Annotation
About Daucus carota subsp. sativus
Carrot is a globally important root crop. Its production quadrupled between 1976 and 2013, outpacing both the overall increase in vegetable production and world population growth. This growth was driven by the development of high-value products for fresh consumption, juices, and natural pigments, and of cultivars adapted to warmer production regions.
Assembly
An orange, doubled-haploid, Nantes-type carrot (DH1) was used for genome sequencing by the United States Department of Agriculture. BAC end sequences and a newly developed linkage map with 2,075 markers were used to correct 135 scaffolds that contained one or more chimeric regions.
The resulting v2.0 assembly spans 421.5 Mb and contains 4,907 scaffolds (N50 of 12.7 Mb), accounting for approximately 90% of the estimated genome size. The scaftig N50 of 31.2 kb is similar to those of other high-quality genome assemblies such as potato. About 86% (362 Mb) of the assembled genome is contained in only 60 superscaffolds anchored to the nine pseudomolecules. The longest superscaffold spans 30.2 Mb, which corresponds to 85% of chromosome 4.
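Because the N50 figures carry much of the quality claim above, a minimal sketch of how N50 is computed may help: sort the sequence lengths in descending order and report the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions, not carrot data; the last lines only re-derive the anchored percentage from the 362 Mb and 421.5 Mb figures quoted above.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (NOT the carrot assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
print("toy N50:", toy_n50)

# Re-derive the anchored share from the figures quoted in the text.
anchored_mb = 362.0
assembly_mb = 421.5
anchored_pct = round(100 * anchored_mb / assembly_mb)
print("anchored share of assembly (%):", anchored_pct)
```

Applying `n50` to the real scaffold lengths would require the assembly FASTA or its index, which this page does not include.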
Annotation
In the gene annotation carried out by the United States Department of Agriculture, 32,113 genes were predicted, of which 79% had substantial homology with known genes. The majority (98.7%) of gene predictions had supporting cDNA and/or EST evidence, indicating the high accuracy of gene prediction. Relative to five other closely related genomes, carrot was enriched for genes involved in a wide range of molecular functions. The annotation also identified 564 tRNAs, 31 rRNA fragments, 532 small nuclear RNA (snRNA) genes, and 248 microRNAs (miRNAs) distributed among 46 families.
Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Total repeat length: 144,037,532 bp. Repeat content: 34.2%.
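The repeat content can be reproduced from the repeat length and the Golden Path Length reported in the Statistics summary. The sketch below assumes that the percentage is taken relative to the Golden Path Length; the page does not state the denominator explicitly, and the variable names are illustrative.

```python
# Values quoted on this page.
repeat_length_bp = 144_037_532
golden_path_length_bp = 421_502_825  # assumed denominator

# Share of the assembly annotated as repeats, as a percentage.
repeat_pct = round(100 * repeat_length_bp / golden_path_length_bp, 1)
print(f"repeat content: {repeat_pct}%")
```

Under that assumption the computed value agrees with the 34.2% reported above.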
References
- A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution.
Massimo Iorizzo, Shelby Ellison, Douglas Senalik, Peng Zeng, Pimchanok Satapoomin, Jiaying Huang, Megan Bowman, Marina Iovene, Walter Sanseverino, Pablo Cavagnaro et al. 2016. Nature Genetics. 48:657-666.
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | ASM162521v1, INSDC Assembly GCA_001625215.1, May 2016 |
Database version | 113.1 |
Golden Path Length | 421,502,825 bp |
Genebuild by | USDA |
Genebuild method | Import |
Data source | United States Department of Agriculture |
Gene counts
Coding genes | 32,109 |
Non coding genes | 2,154 |
Small non coding genes | 2,089 |
Long non coding genes | 65 |
Gene transcripts | 34,263 |
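As a quick consistency check on the gene counts above, the short sketch below confirms that the small and long non-coding genes sum to the non-coding total, and that coding plus non-coding genes equal the transcript count. The variable names are illustrative. Reading the second equality as one transcript per gene is an inference from the numbers, not something the page states.

```python
# Counts as listed in the table above.
coding_genes = 32_109
non_coding_genes = 2_154
small_non_coding_genes = 2_089
long_non_coding_genes = 65
gene_transcripts = 34_263

# Small + long non-coding genes should give the non-coding total.
non_coding_sum = small_non_coding_genes + long_non_coding_genes
non_coding_consistent = non_coding_sum == non_coding_genes

# Coding + non-coding genes compared with the transcript count.
total_genes = coding_genes + non_coding_genes
transcripts_match_genes = total_genes == gene_transcripts

print("non-coding breakdown consistent:", non_coding_consistent)
print("total genes:", total_genes)
print("transcripts equal total genes:", transcripts_match_genes)
```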