Malus domestica Golden (ASM211411v1)

Malus domestica Golden Assembly and Gene Annotation

The apple genome browser has been developed as part of the participation of Ensembl Plants in ELIXIR’s commissioned service “Apple as a Model for Genomic Information Exchange”.

About Malus domestica

Apple (Malus domestica) is usually a diploid species (2n = 2x = 34) that belongs to the Rosaceae family. It is one of the most famous fruits globally and occupies a central position in folklore, culture, and art. Apple varieties have retained high genetic and phenotypic diversity, evidenced by the high number of apple varieties cultivated today. Like most other fruit tree crops, apple is propagated by grafting onto rootstocks, which over time can allow the acquisition and propagation of epimutations, via variation in DNA methylation states that can influence various phenotypes, such as fruit color.

Assembly

This genome sequence corresponds to a homozygous, doubled haploid of 'Golden Delicious', cultivar GDDH13, also coded X9273. This is assembly GDDH13 version 1.0. Three different technologies (paired-end, mate-pair and optical mapping) were combined for the assembly of DNA from leaves. In total, 120-fold coverage of Illumina paired-end reads (72 Gb), 80-fold coverage of Illumina Nextera mate-pair reads (58 Gb, insert sizes 2, 5 and 10 kb) and 35-fold coverage of PacBio sequencing data (24 Gb) were obtained. Paired-end reads were first assembled using SOAPdenovo, and the resulting contigs were combined with the PacBio reads using DBG2OLC. The mate-pair reads were used for scaffolding with BESST. Finally, a 600-fold coverage BioNano optical map was integrated to generate a consensus map that resulted in an assembly of 649.7 Mb. This consensus map was then used for the hybrid assembly with the corrected scaffolds, which, together with SNPs derived from a high-density genetic linkage map, allowed the construction of the 17 pseudo-chromosomes. The estimated genome size of 651 Mb is very close to the 649.7-Mb size of the consensus map.

Annotation

A high-throughput RNA-seq analysis was carried out on poly(A)-enriched RNAs from nine libraries that originated from different genotypes and tissues. Reads were assembled, and the resulting contigs were mapped to the scaffolds and integrated in the EuGene combiner pipeline. In total, 42,140 protein-coding genes were identified. Evidence of transcription was found for 93% of the annotated genes, with 94.9% complete BUSCOs (v2). Genes in unplaced contigs (Chr0) have not been imported.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 982,556 Low complexity (Dust) features, covering 109 Mb (15.5% of the genome); 341,667 RepeatMasker features (with the nrTEplants library), covering 326 Mb (46.4% of the genome); 218,553 RepeatMasker features (with the RepBase library), covering 87 Mb (12.4% of the genome); 421,645 Tandem repeats (TRF) features, covering 46 Mb (6.5% of the genome).
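The percentages above can be reproduced from the coverage figures. The following is a minimal sketch, assuming the denominator is the Golden Path Length listed under Statistics (702,961,352 bp) and that 1 Mb means 1,000,000 bp; the function and variable names are illustrative and are not part of any Ensembl tool.

```python
# Assumption: percentages are relative to the Golden Path Length
# reported in the Statistics summary of this page.
GOLDEN_PATH_LENGTH_BP = 702_961_352

# Coverage per repeat feature type, in Mb, as quoted in the text.
REPEAT_COVERAGE_MB = {
    "Low complexity (Dust)": 109,
    "RepeatMasker (nrTEplants)": 326,
    "RepeatMasker (RepBase)": 87,
    "Tandem repeats (TRF)": 46,
}


def percent_of_genome(covered_mb, genome_bp=GOLDEN_PATH_LENGTH_BP):
    """Return the share of the genome covered, in percent, to one decimal."""
    return round(100 * covered_mb * 1_000_000 / genome_bp, 1)


for feature, covered_mb in REPEAT_COVERAGE_MB.items():
    print(f"{feature}: {percent_of_genome(covered_mb)}% of the genome")
```

Note that the feature types overlap (for example, the two RepeatMasker libraries annotate many of the same regions), so the percentages should not be summed to obtain a total repeat content.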

Variation

Over 10M SNP variants with MAF ≥ 0.05 were called on 70 apple varieties and lines. These SNPs do not include the markers of the Axiom Apple 480K genotyping array.

References

  1. Development and validation of the Axiom® Apple480K SNP genotyping array.
    Bianco L, Cestaro A, Linsmith G et al. 2016. Plant Journal. 86(1):62-74.
  2. High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development.
    Daccord N, Celton JM, Linsmith G et al. 2017. Nature Genetics. 49:1099-1106.

Picture credit: Assianir, license CC BY-SA 3.0

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: ASM211411v1, INSDC Assembly GCA_002114115.1
Database version: 111.1
Golden Path Length: 702,961,352
Genebuild by: IRHS
Genebuild method: External annotation import
Data source: IRHS

Gene counts

Coding genes: 40,624
Non coding genes: 1,872
Small non coding genes: 1,872
Gene transcripts: 42,496

Other

Short Variants: 10,627,081