Prunus dulcis Assembly and Gene Annotation
About Prunus dulcis
Almond (Prunus dulcis) is a rosaceous tree species cultivated for its seeds; it has a diploid (2n = 2x = 16) and compact genome (about 300 Mbp). The origin of the almond tree is not well established; its closest wild relatives grow in central and western Asia, in a range stretching from the Himalayas to the eastern Mediterranean Basin. The genus Prunus comprises approximately 200 species, some of which, such as peach, apricot, cherry, plum and almond, have high economic value. The high level of genomic resemblance and synteny among the species of this genus enables the production of hybrids that are sometimes fertile.
Assembly
A total of 138.6 Gb of Illumina (>500x) and 10.2 Gb (37x) of Oxford Nanopore Technologies sequence of the highly heterozygous Prunus dulcis cv. Texas were produced. By analyzing k-mer frequency, the lower bound for genome size was estimated to be 238 Mb. The assembly was collapsed into a haploid representation and anchored to eight pseudomolecules. The v.2.0 assembly (also known as pdulcis26) totals 227.6 Mb (91.5% of which is anchored to the eight pseudomolecules) and has contig and scaffold N50s of 103.9 and 381.5 kb, respectively. The BUSCO completeness of the assembly is 96.4%.
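The contig and scaffold N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that definition only; the function name `n50` and the toy lengths are assumptions made for illustration and are not taken from the almond assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembly length;
    the length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not real almond data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

Applied to the real contig or scaffold lengths of an assembly, the same function would be expected to give figures comparable to those reported here, although assembly tools may differ in how they treat gaps and unplaced sequences.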
Annotation
Gene annotation was performed by combining transcript alignments, protein alignments and ab initio gene predictions. A total of 27,969 protein-coding genes were annotated, producing 34,039 transcripts (1.22 transcripts per gene) and encoding 32,559 unique protein products.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 429,457 Low complexity (Dust) features, covering 17 Mb (7.3% of the genome); 159,501 RepeatMasker features (with the nrTEplants library), covering 90 Mb (39.4% of the genome); 70,178 RepeatMasker features (with the REdat library), covering 21 Mb (9.2% of the genome); 147,273 Tandem repeats (TRF) features, covering 13 Mb (5.6% of the genome).
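The figures in this section can be cross-checked with a little arithmetic. The sketch below recomputes the transcripts-per-gene ratio and converts each repeat percentage back into megabases. It assumes that "of the genome" refers to the 227.6 Mb v.2.0 assembly total; that denominator is an assumption, and the variable and function names are illustrative only.

```python
# Assumed denominator: the 227.6 Mb assembly total reported in the Assembly section.
ASSEMBLY_MB = 227.6

# Annotation counts as stated in the text.
protein_coding_genes = 27_969
transcripts = 34_039

# Repeat classes and the percentage of the genome each is reported to cover.
repeat_percent = {
    "Low complexity (Dust)": 7.3,
    "RepeatMasker (nrTEplants)": 39.4,
    "RepeatMasker (REdat)": 9.2,
    "Tandem repeats (TRF)": 5.6,
}


def coverage_mb(percent, assembly_mb=ASSEMBLY_MB):
    """Convert a percentage of the assembly into megabases."""
    return assembly_mb * percent / 100


transcripts_per_gene = transcripts / protein_coding_genes
print(f"Transcripts per gene: {transcripts_per_gene:.2f}")  # 1.22

for name, percent in repeat_percent.items():
    # Rounded to whole Mb, these give 17, 90, 21 and 13, matching the text.
    print(f"{name}: {coverage_mb(percent):.1f} Mb")
```

The repeat classes are reported separately and may overlap one another, so the percentages should not be summed to give a total repeat content.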
References
- Transposons played a major role in the diversification between the closely related almond and peach genomes: results from the almond genome sequence. Alioto T, Alexiou KG, Bardil A, Barteri F, Castanera R et al. 2019. Plant Journal.
Picture credit: Franz Eugen Köhler, Köhler's Medizinal-Pflanzen, Public domain
Links
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | ALMONDv2, INSDC Assembly GCA_902201215.1 |
Database version | 113.1 |
Golden Path Length | 227,498,357 |
Genebuild by | CNAG |
Genebuild method | Import |
Data source | CNAG |
Gene counts
Coding genes | 27,966 |
Non coding genes | 6,455 |
Small non coding genes | 6,455 |
Gene transcripts | 39,011 |