Prunus persica Assembly and Gene Annotation

About Prunus persica

Prunus persica (peach) is an economically important deciduous tree in the Rosaceae family, producing 20 million tons of fruit per year. The Rosaceae family contains herbs, shrubs and trees with a wide variety of fruit types and habits, and includes several species grown for their fruits (peaches, apples and strawberries), lumber (black cherry) and ornamental value (roses).

Peach was first domesticated and cultivated in North-West China and has a compact diploid genome (265 Mb, 2n = 16).

Assembly

The initial assembly was performed with Arachne, using Sanger sequence reads representing 8.5-fold coverage of a doubled haploid genotype of cv. Lovell. The resulting contigs and scaffolds were filtered to give 234 scaffolds covering 224.6 Mb of the peach genome (Peach v1.0), with a scaffold N50/L50 of 4/26.8 Mb and a contig N50/L50 of 294/214.2 kb, and with good QC statistics [1]. In the convention used for these figures, N50 is a number of sequences and L50 is a sequence length.
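The N50/L50 pair quoted above can be illustrated with a minimal sketch. It assumes the convention in which one value is the number of sequences needed to reach half of the total assembly length and the other is the length of the sequence at which that point is reached. The function name and the toy lengths are hypothetical and are not the peach scaffold data.

```python
def count_and_length_at_half(lengths):
    """Walk the sequences from longest to shortest and return
    (number of sequences, length of the last one) at the point where
    the running total first reaches half of the summed length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length


# Hypothetical toy scaffold lengths in Mb (NOT the peach assembly).
toy_lengths = [10, 8, 6, 4, 2]
count, length = count_and_length_at_half(toy_lengths)
print(f"sequences needed: {count}, length at that point: {length} Mb")
```

Note that many tools swap the two labels (N50 as the length, L50 as the count), so it is worth checking which convention a given report uses.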

Five DNA libraries were end-sequenced, giving a total of 8.47-fold sequence coverage: 536,032 reads from the 2.8 kb library, 606,680 reads from the 4.4 kb library, 2,106,103 reads from the 7.8 kb library, 419,424 reads from the 35.3 kb fosmid library, and 61,440 reads from the 69.5 kb BAC library.
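As a quick check of the library figures above, the following sketch totals the reads and shows each library's share. The read counts are taken from the text; the dictionary keys and variable names are illustrative only.

```python
# Read counts per end-sequenced library, as listed in the text.
libraries = {
    "2.8 kb": 536_032,
    "4.4 kb": 606_680,
    "7.8 kb": 2_106_103,
    "35.3 kb fosmid": 419_424,
    "69.5 kb BAC": 61_440,
}

total_reads = sum(libraries.values())

# Share of the total contributed by each library.
shares = {name: reads / total_reads for name, reads in libraries.items()}

print(f"total reads: {total_reads:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

More than half of the reads come from the 7.8 kb library, while the fosmid and BAC libraries contribute few reads but span much longer inserts.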

Annotation

A total of 27,852 protein-coding genes and 28,689 protein-coding transcripts were predicted [1].

Predictions began with PASA transcript assemblies based on ESTs from peach and related species. Transcript assemblies and a collection of plant peptide sequences were aligned against the assembly with BLAST, and gene models were predicted using the homology-based predictors FGENESH+ and GenomeScan. Predicted gene models were then improved and refined by PASA.

Sequence alignment

Approximately 80,000 EST sequences have been aligned to the genome with STAR.

Links

  • Prunus persica ESTs at ENA

References

  1. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution.
    Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F et al. 2013. Nat. Genet. 45:487-494.

Picture credit: Image created by skyseeker and released under a Creative Commons Attribution License.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Prunus_persica_NCBIv2, INSDC Assembly GCA_000346465.2, Sep 2017
Database version: 93.2
Base Pairs: 224,638,928
Golden Path Length: 227,411,381
Genebuild by: EnsemblPlants
Genebuild method: Generated from ENA annotation
Data source: European Nucleotide Archive

Gene counts

Coding genes: 26,873
Non coding genes: 234
Small non coding genes: 234
Gene transcripts: 47,323
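The gene counts above can be combined into a rough transcripts-per-gene figure. This is a minimal sketch that assumes the small non coding genes are a subset of the non coding genes (both are 234) and that the transcript count covers coding and non coding genes together; the variable names are illustrative.

```python
# Counts from the gene counts table.
coding_genes = 26_873
non_coding_genes = 234  # assumed to include the 234 small non coding genes
gene_transcripts = 47_323

total_genes = coding_genes + non_coding_genes
transcripts_per_gene = gene_transcripts / total_genes

print(f"total genes: {total_genes:,}")
print(f"transcripts per gene: {transcripts_per_gene:.2f}")
```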
