Prunus dulcis (ALMONDv2)

Prunus dulcis Assembly and Gene Annotation

About Prunus dulcis

Almond (Prunus dulcis) is a rosaceous tree species cultivated for its seeds; it has a compact diploid genome (2n = 2x = 16) of about 300 Mbp. The origin of the almond tree is not well established; its closest wild relatives are found in central and western Asia, from the Himalayas to the eastern Mediterranean Basin. The genus Prunus comprises approximately 200 species, some of which, such as peach, apricot, cherry, plum and almond, have high economic value. The high degree of genomic similarity and synteny among species of this genus makes it possible to produce interspecific hybrids, some of which are fertile.


A total of 138.6 Gb of Illumina (>500x) and 10.2 Gb (37x) of Oxford Nanopore Technologies sequence data were produced for the highly heterozygous Prunus dulcis cv. Texas. K-mer frequency analysis gave a lower-bound genome size estimate of 238 Mb. The assembly was collapsed into a haploid representation and anchored to eight pseudomolecules. The v.2.0 assembly (also known as pdulcis26) totals 227.6 Mb, 91.5% of which is anchored to the eight pseudomolecules, and has contig and scaffold N50s of 103.9 kb and 381.5 kb, respectively. The BUSCO completeness of the assembly is 96.4%.
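The depth, N50 and k-mer figures in the paragraph above rest on a few standard calculations. The following is a minimal Python sketch of them, not the pipeline used for ALMONDv2; the function names and the input format (a plain depth-to-count k-mer histogram) are hypothetical. The k-mer estimate is deliberately simplified: it takes the tallest peak as the depth of one genome copy, whereas a highly heterozygous cultivar such as Texas needs a model that separates the heterozygous and homozygous peaks.

```python
from typing import Dict, Iterable, Tuple


def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one sequence length")


def kmer_genome_size(histogram: Dict[int, int], min_depth: int) -> Tuple[int, float]:
    """Simplified k-mer estimate of genome size (hypothetical helper).

    histogram maps depth (how many times a k-mer was seen) to the number of
    distinct k-mers seen that many times. Depths below min_depth are treated
    as sequencing errors and dropped. The tallest remaining peak is taken as
    the depth of one genome copy, which ignores heterozygosity.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(kept, key=lambda depth: kept[depth])
    total_kmers = sum(depth * n for depth, n in kept.items())
    return peak, total_kmers / peak
```

For example, `sequencing_depth(138.6e9, 238e6)` gives about 582, in line with the ">500x" Illumina depth quoted above; because 238 Mb is a lower bound on genome size, that depth is correspondingly an upper bound.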


Gene annotation was performed by combining transcript alignments, protein alignments and ab initio gene predictions. The annotation contains 27,969 protein-coding genes, which give rise to 34,039 transcripts (1.22 transcripts per gene) encoding 32,559 unique protein products.
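The per-gene and unique-protein figures above follow from simple counting over the annotation. A minimal Python sketch, assuming each transcript is available as a hypothetical (transcript ID, parent gene ID, protein sequence) tuple, for instance after parsing a GFF3 file and translating the coding sequences; it is not the CNAG annotation pipeline. Two transcripts of one gene that differ only outside the coding sequence translate to the same protein, which is one way the unique-protein count can fall below the transcript count.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (transcript_id, parent_gene_id, protein_sequence).
Transcript = Tuple[str, str, str]


def annotation_summary(transcripts: List[Transcript]) -> Dict[str, float]:
    """Summarise a protein-coding annotation from transcript records."""
    if not transcripts:
        raise ValueError("annotation_summary() needs at least one transcript")
    genes = {gene_id for _, gene_id, _ in transcripts}
    # Identical protein sequences collapse into one entry, so this count
    # can be smaller than the number of transcripts.
    proteins = {protein for _, _, protein in transcripts}
    return {
        "genes": len(genes),
        "transcripts": len(transcripts),
        "unique_proteins": len(proteins),
        "transcripts_per_gene": len(transcripts) / len(genes),
    }
```

With the published totals, 34,039 / 27,969 is approximately 1.22 transcripts per gene.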

Repeats were annotated with the Ensembl Genomes repeat feature pipeline, which identified 429,457 low-complexity (Dust) features, covering 17 Mb (7.3% of the genome); 159,501 RepeatMasker features (with the nrTEplants library), covering 90 Mb (39.4% of the genome); 70,178 RepeatMasker features (with the REdat library), covering 21 Mb (9.2% of the genome); and 147,273 tandem repeat (TRF) features, covering 13 Mb (5.6% of the genome).
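A "covering" figure like those above can be read as a count of annotated bases rather than a sum of feature lengths, assuming that overlapping features within one category are counted only once. A minimal Python sketch of that calculation under this assumption; the function names and the half-open interval representation are hypothetical and not taken from the Ensembl pipeline. Because the four categories come from different tools and libraries, the same bases can be annotated by more than one of them, so their percentages are best not summed into a single repeat total.

```python
from typing import Dict, Iterable, List, Tuple

# Half-open interval [start, end) on a single sequence (0-based coordinates).
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(features: Dict[str, List[Interval]], genome_length: int) -> float:
    """Fraction of the genome covered by at least one feature.

    features maps a sequence name (e.g. a pseudomolecule) to its feature intervals.
    """
    covered = sum(
        end - start
        for intervals in features.values()
        for start, end in merge_intervals(intervals)
    )
    return covered / genome_length
```

Merging first matters whenever features overlap: two features spanning bases 0-10 and 5-15 cover 15 bases, not 20.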


  1. Transposons played a major role in the diversification between the closely related almond and peach genomes: results from the almond genome sequence.
    Alioto T, Alexiou KG, Bardil A, Barteri F, Castanera R et al. 2019. Plant Journal.

Picture credit: Franz Eugen Köhler, Köhler's Medizinal-Pflanzen, Public domain

More information

General information about this species can be found in Wikipedia.



Assembly: ALMONDv2, INSDC Assembly GCA_902201215.1
Database version: 111.1
Golden Path Length: 227,498,357
Genebuild by: CNAG
Genebuild method: Import
Data source: CNAG

Gene counts

Coding genes: 27,966
Non-coding genes: 6,455
Small non-coding genes: 6,455
Gene transcripts: 39,011