Prunus_dulcis (ALMONDv2)

Prunus dulcis Assembly and Gene Annotation

About Prunus dulcis

Almond (Prunus dulcis) is a rosaceous tree species cultivated for its seeds; it has a diploid (2n = 2x = 16), compact genome (about 300 Mbp). The origin of the almond tree is not well established; its closest wild relatives grow in central and western Asia, in a range stretching from the Himalayas to the eastern Mediterranean Basin. The genus Prunus comprises approximately 200 species, some of which, such as peach, apricot, cherry, plum and almond, have high economic value. The high level of genomic resemblance and synteny among the species of this genus enables the production of hybrids, which are sometimes fertile.

Assembly

A total of 138.6 Gb of Illumina (>500x) and 10.2 Gb (37x) of Oxford Nanopore Technologies sequence data were produced for the highly heterozygous Prunus dulcis cv. Texas. K-mer frequency analysis gave a lower-bound genome size estimate of 238 Mb. The assembly was collapsed into a haploid representation and anchored to eight pseudomolecules. The v.2.0 assembly (also known as pdulcis26) totals 227.6 Mb (91.5% of which is anchored to the eight pseudomolecules) and has contig and scaffold N50s of 103.9 kb and 381.5 kb, respectively. The BUSCO completeness of the assembly is 96.4%.
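For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the almond assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

Applied to the real contig and scaffold lengths, the same calculation would yield the contig N50 and scaffold N50 reported for the assembly.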

Annotation

Gene annotation was performed by combining transcript alignments, protein alignments and ab initio gene predictions. A total of 27,969 protein-coding genes were annotated, giving rise to 34,039 transcripts (1.22 transcripts per gene) that encode 32,559 unique protein products.
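The transcripts-per-gene figure follows directly from the two counts in the paragraph above. The short sketch below reproduces it; the variable names are illustrative, and the `surplus` value simply subtracts the number of unique proteins from the number of transcripts, on the assumption that every transcript is protein-coding.

```python
# Counts quoted in the annotation paragraph.
genes = 27_969
transcripts = 34_039
unique_proteins = 32_559

# Average number of transcripts annotated per protein-coding gene.
transcripts_per_gene = transcripts / genes

# Transcripts in excess of the unique protein products
# (assumes every transcript encodes a protein).
surplus = transcripts - unique_proteins

print(f"{transcripts_per_gene:.2f} transcripts per gene")
print(f"{surplus} more transcripts than unique protein products")
```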

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:

  429,457 Low complexity (Dust) features, covering 17 Mb (7.3% of the genome)
  74,964 Repeats (ENA) features, covering 107 Mb (47.2% of the genome)
  70,178 RepeatMasker features (with the REdat library), covering 21 Mb (9.2% of the genome)
  2,198 RepeatMasker features (with the RepBase library), covering less than 1 Mb (0.1% of the genome)
  147,273 Tandem repeats (TRF) features, covering 13 Mb (5.6% of the genome)
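The megabase figures above are rounded, which is why the RepBase track's 0.1% corresponds to well under 1 Mb. As a rough sketch, the percentages can be converted back to approximate coverage using the Golden Path Length from the Statistics summary (227,498,357 bp). That the percentages are expressed relative to this length is an assumption, and the helper name `pct_to_mb` is illustrative.

```python
# Golden Path Length from the Statistics summary, in base pairs.
GENOME_BP = 227_498_357

# Percent of the genome covered by each repeat track, as quoted above.
repeat_pct = {
    "Low complexity (Dust)": 7.3,
    "Repeats (ENA)": 47.2,
    "RepeatMasker (REdat)": 9.2,
    "RepeatMasker (RepBase)": 0.1,
    "Tandem repeats (TRF)": 5.6,
}


def pct_to_mb(pct, genome_bp=GENOME_BP):
    """Convert a percentage of the genome into megabases."""
    return pct / 100 * genome_bp / 1e6


for name, pct in repeat_pct.items():
    print(f"{name}: about {pct_to_mb(pct):.1f} Mb")
```

Because different repeat tracks may annotate overlapping regions, the per-track values should not be summed to obtain a total repeat fraction.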

References

  1. Transposons played a major role in the diversification between the closely related almond and peach genomes: results from the almond genome sequence.
    Alioto T, Alexiou KG, Bardil A, Barteri F, Castanera R et al. 2019. Plant Journal.

Picture credit: Franz Eugen Köhler, Köhler's Medizinal-Pflanzen, Public domain

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: ALMONDv2, INSDC Assembly GCA_902201215.1
Database version: 100.1
Base Pairs: 227,498,357
Golden Path Length: 227,498,357
Genebuild by: CNAG
Genebuild method: Import
Data source: CNAG

Gene counts

Coding genes: 27,966
Non coding genes: 6,455
Small non coding genes: 6,455
Gene transcripts: 39,011
