Prunus persica Assembly and Gene Annotation
About Prunus persica
Prunus persica (peach) is an economically important deciduous tree in the Rosaceae family that produces 20 million tons of fruit per year. The Rosaceae family contains herbs, shrubs and trees with a wide variety of fruit types and habits, and includes several species grown for their fruits (peaches, apples and strawberries), lumber (black cherry) and ornamental value (roses).
Peach was first domesticated and cultivated in northwest China and has a compact diploid genome (265 Mb, 2n = 16).
Assembly
JGI performed the initial assembly with Arachne, using Sanger sequence reads representing 8.5-fold coverage of a doubled haploid genotype of cv. Lovell. The resulting contigs and scaffolds were filtered to give 234 scaffolds covering 224.6 Mb of the peach genome (Peach v1.0), with scaffold N50/L50 values of 4/26.8 Mb and contig N50/L50 values of 294/214.2 kb (N50 here being the number of sequences and L50 the corresponding sequence length), and good QC statistics.
Five DNA libraries were end-sequenced, giving a total of 8.47-fold sequence coverage: 536,032 reads from the 2.8 kb sized library, 606,680 reads from the 4.4 kb sized library, 2,106,103 reads from the 7.8 kb sized library, 419,424 reads from the 35.3 kb fosmid library, and 61,440 reads from the 69.5 kb BAC library.
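The N50/L50 statistics quoted above can be illustrated with a minimal sketch. It assumes only a list of sequence lengths; the function name `half_assembly_stats` and the toy lengths are hypothetical and are not the peach scaffold lengths. Tools differ in which of the two values they call N50 and which L50, so the function returns both with neutral names.

```python
from typing import Iterable, Tuple


def half_assembly_stats(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (count, length) at the point where the longest sequences
    first cover at least half of the total assembly length.

    count  - number of sequences needed to reach half the total
    length - length of the shortest sequence in that set
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return count, length
    # Unreachable for non-negative lengths, kept for completeness.
    return len(ordered), ordered[-1]


# Toy example (hypothetical lengths, not real scaffolds).
toy_lengths = [50, 40, 30, 20, 10]
count, length = half_assembly_stats(toy_lengths)
print(f"sequences needed: {count}, length at that point: {length}")
```

In the toy example the total is 150, so the two longest sequences (50 + 40) are the first to reach half of it.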
Annotation
A total of 27,852 protein-coding genes and 28,689 protein-coding transcripts were predicted by JGI.
Predictions began with PASA transcript assemblies built from ESTs of peach and related species. The transcript assemblies and a collection of plant peptide sequences were searched against the assembly with BLAST, and gene models were predicted using the homology-based predictors FGENESH+ and GenomeScan. The predicted gene models were then improved and refined with PASA.
Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeats length: 74,986,019 bp. Repeats content: 32.9%.
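The repeat content can be checked against the repeat length with a short calculation. This is a sketch under one assumption: that the denominator is the Golden Path Length reported in the statistics for this assembly (227,411,381 bp). The page does not state which denominator was used, and the function name `repeat_percentage` is hypothetical.

```python
def repeat_percentage(repeat_bp: int, assembly_bp: int) -> float:
    """Percentage of the assembly covered by annotated repeats."""
    if assembly_bp <= 0:
        raise ValueError("assembly length must be positive")
    return 100.0 * repeat_bp / assembly_bp


REPEAT_BP = 74_986_019        # repeats length quoted on this page
GOLDEN_PATH_BP = 227_411_381  # assumed denominator: Golden Path Length

pct = repeat_percentage(REPEAT_BP, GOLDEN_PATH_BP)
print(f"repeat content: {pct:.2f}%")
```

Under this assumption the result falls just under 33%, close to the quoted figure. Any small difference may come from rounding or from a different denominator.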
Sequence alignment
Approximately 80,000 EST sequences have been aligned to the genome with STAR.
References
- The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution.
Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F et al. 2013. Nat. Genet. 45:487-494.
Picture credit: Image created by skyseeker and released under a Creative Commons Attribution License.
Links
- Prunus persica ESTs at ENA
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | Prunus_persica_NCBIv2, INSDC Assembly GCA_000346465.2, Feb 2017 |
Database version | 113.2 |
Golden Path Length | 227,411,381 bp |
Genebuild by | JGI |
Genebuild method | Import |
Data source | Joint Genome Institute |
Gene counts
Coding genes | 26,873 |
Non-coding genes | 976 |
Small non-coding genes | 968 |
Long non-coding genes | 8 |
Gene transcripts | 48,065 |