Pistacia vera (PisVer_v2)

Pistacia vera Assembly and Gene Annotation

The Pistachio Genome Project is a collaboration among Shahid Bahonar University of Kerman, the Pistachio Research Center at the Horticultural Sciences Research Institute (AREEO, Rafsanjan, Iran) and the Chinese Academy of Sciences. It is funded by the Animal Branch of the Germplasm Bank of Wild Species, Chinese Academy of Sciences (the Large Research Infrastructure Funding) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13020600), with contributions in effort from pistachio breeders in Iran.

About Pistacia vera

Pistachio (Pistacia vera, 2n = 30) is one of the most important commercial nut crops worldwide and originated in Central Asia and the Middle East. The pistachio tree is a deciduous, long-lived desert plant that tolerates high levels of salinity and drought stress. Pistachio is a member of the Anacardiaceae family and was domesticated about 8000 years ago.

Assembly

An individual of the cultivar Batoury was chosen for genome sequencing and assembly. The genome was sequenced on the Illumina HiSeq 2500 platform from multiple paired-end libraries, including two small-insert libraries (270 bp and 500 bp) and six long-insert mate-pair libraries (3 kb, 4 kb, 8 kb, 10 kb, 15 kb, and 17 kb), achieving 270.47X coverage. An initial draft genome of 569.12 Mb was assembled, with contig and scaffold N50 sizes of 20.69 kb and 768.39 kb, respectively. To improve continuity, a total of 4,038,150 filtered long reads with an average length of 14,568 bp were generated from 59 Gb of sequencing data on the PacBio Sequel System. The final draft genome of 671 Mb has contig and scaffold N50 sizes of 75.7 kb and 949.2 kb, respectively, and the combined sequencing data provided a total of 373.84X coverage. The completeness of the genome assembly was confirmed with the CEGMA and BUSCO software.
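The contig and scaffold N50 values quoted above follow the standard definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. The following is a minimal sketch of that calculation, not the tool used by the project; the function name `n50` and the toy contig lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical toy assembly (lengths in kb), not real PisVer_v2 data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs cover 150 of 300 kb
```

Applied to the real contig or scaffold lengths of an assembly, the same function would give values comparable to those reported above, although assembly tools can differ in how they handle gaps and minimum-length filters.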

Annotation

Protein-coding genes were predicted using de novo and protein homology-based approaches. Genscan v1.0, Augustus v2.5.5, GlimmerHMM v3.0.1, GeneID v1.3, and SNAP were used for de novo gene prediction, while homologous peptides from the Arabidopsis thaliana (TAIR 10), Oryza sativa (Nipponbare, IRGSP-1.0), Theobroma cacao (Phytozome v12.1), and Citrus sinensis (Phytozome v12.1) genomes were aligned to the assembly with GeMoMa v1.4.2 to identify homologous genes. RNA-Seq reads were assembled using Trinity, the resulting unigenes were aligned to the repeat-masked assemblies using BLAT, and gene structures were then modeled from the BLAT alignments using PASA. Protein-coding regions were identified with TransDecoder v3.0.1 and GeneMarkS-T. Consensus gene models were generated by integrating the de novo predictions and protein alignments using EVidenceModeler.

Repeat sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeats length: 332,023,120 bp. Repeats content: 49.5%.
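The repeat content can be cross-checked against the repeat length with a short calculation. The sketch below assumes that the percentage is taken relative to the Golden Path Length listed in the assembly statistics (671,152,441 bp); the page does not state the denominator explicitly, so treat that as an assumption.

```python
# Figures taken from this page.
repeat_bp = 332_023_120      # total length annotated as repeats
assembly_bp = 671_152_441    # Golden Path Length (assumed denominator)

repeat_fraction = repeat_bp / assembly_bp
print(f"Repeat content: {repeat_fraction:.1%}")  # Repeat content: 49.5%
```

Under that assumption the two figures agree with the reported 49.5%.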

References

  1. Whole genomes and transcriptomes reveal adaptation and domestication of pistachio.
    Zeng L, Tu XL, Dai H, Han FM, Lu BS, Wang MS, Nanaei HA, Tajabadipour A, Mansouri M, Li XL et al. 2019. Genome Biology. 20:79.

Picture credit: Professor Ali Esmailizadeh, Shahid Bahonar University of Kerman

More information

General information about this species can be found in Wikipedia.



Assembly: PisVer_v2, INSDC Assembly GCA_008641045.1
Database version: 109.1
Golden Path Length: 671,152,441
Genebuild by: EVM
Genebuild method: External annotation import
Data source: Chinese Academy of Sciences

Gene counts

Coding genes: 31,784
Gene transcripts: 31,784