Malus domestica Golden Assembly and Gene Annotation
About Malus domestica
Apple (Malus domestica) is usually a diploid species (2n = 2x = 34) that belongs to the Rosaceae family. It is one of the most famous fruits globally and occupies a central position in folklore, culture, and art. Apple varieties have retained high genetic and phenotypic diversity, evidenced by the high number of apple varieties cultivated today. Like most other fruit tree crops, apple is propagated by grafting onto rootstocks, which over time can allow the acquisition and propagation of epimutations, via variation in DNA methylation states that can influence various phenotypes, such as fruit color.
This genome sequence corresponds to a homozygous, doubled haploid of 'Golden Delicious', cultivar GDDH13, also coded X9273. This is assembly GDDH13 version 1.0. Three different technologies (paired-end, mate-pair and optical mapping) were combined for the assembly of DNA from leaves. In total, 120-fold coverage of Illumina paired-end reads (72 Gb), 80-fold coverage of Illumina Nextera mate-pair reads (58 Gb, insert sizes 2, 5 and 10 kb) and 35-fold coverage of PacBio sequencing data (24 Gb) were obtained. Paired-end reads were first assembled using SOAPdenovo, and the resulting contigs were combined with the PacBio reads using DBG2OLC. The mate-pair reads were used for scaffolding with BESST. Finally, a 600-fold coverage BioNano optical map was integrated to generate a consensus map that resulted in an assembly of 649.7 Mb. This consensus map was then used for the hybrid assembly with the corrected scaffolds, which, together with SNPs derived from a high-density genetic linkage map, allowed the construction of the 17 pseudo-chromosomes. The estimated genome size of 651 Mb is very close to the 649.7-Mb size in the consensus map.
A high-throughput RNA-seq analysis was carried out on poly(A)-enriched RNAs from nine libraries originating from different genotypes and tissues. Reads were assembled, and the resulting contigs were mapped to the scaffolds and integrated in the EuGene combiner pipeline. In total, 42,140 protein-coding genes were identified. Evidence of transcription was found for 93% of the annotated genes, with 94.9% complete BUSCOs v2. Genes in unplaced contigs (Chr0) have not been imported.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 982556 Low complexity (Dust) features, covering 109 Mb (15.5% of the genome); 218553 RepeatMasker features (with the REdat library), covering 87 Mb (12.4% of the genome); 458059 RepeatMasker features (with the RepBase library), covering 305 Mb (43.4% of the genome); 421645 Tandem repeats (TRF) features, covering 46 Mb (6.5% of the genome).
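The percentages quoted above can be cross-checked against the Golden Path Length listed in the assembly statistics (702,961,352 bp). The following is a minimal sketch of that arithmetic, not part of the Ensembl pipeline. It assumes that 1 Mb = 1,000,000 bp and that the percentages are taken relative to the Golden Path Length; the names `REPEAT_COVERAGE_MB` and `percent_of_genome` are hypothetical and used only for illustration.

```python
# Golden Path Length in bp, as listed in the assembly statistics on this page.
GOLDEN_PATH_LENGTH_BP = 702_961_352

# Covered length in Mb for each repeat class, as reported in the text above.
REPEAT_COVERAGE_MB = {
    "Low complexity (Dust)": 109,
    "RepeatMasker (REdat)": 87,
    "RepeatMasker (RepBase)": 305,
    "Tandem repeats (TRF)": 46,
}


def percent_of_genome(covered_mb, genome_bp=GOLDEN_PATH_LENGTH_BP):
    """Return the covered length as a percentage of the genome.

    Assumption: 1 Mb = 1,000,000 bp.
    """
    return 100.0 * covered_mb * 1_000_000 / genome_bp


# Percentage of the genome covered by each repeat class, rounded to one decimal.
REPEAT_PERCENT = {
    name: round(percent_of_genome(mb), 1)
    for name, mb in REPEAT_COVERAGE_MB.items()
}

if __name__ == "__main__":
    for name, pct in REPEAT_PERCENT.items():
        print(f"{name}: {pct}% of the genome")
```

Because features from the different repeat libraries can overlap on the same sequence, the per-class percentages are best read individually rather than summed.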
Over 10 million SNP variants with MAF ≥ 0.05 were called on 70 apple varieties and lines. These SNPs do not include the markers of the Axiom Apple 480K genotyping array.
- Development and validation of the Axiom® Apple480K SNP genotyping array
Bianco L, Cestaro A, Linsmith G et al. 2016. Plant Journal. 86(1):62-74.
- High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development
Daccord N, Celton JM, Linsmith G et al. 2017. Nature Genetics. 49:1099-1106.
Picture credit: Assianir, license CC BY-SA 3.0
General information about this species can be found in Wikipedia.
|Assembly||ASM211411v1, INSDC Assembly GCA_002114115.1,|
|Golden Path Length||702,961,352|
|Genebuild method||External annotation import|
|Non coding genes||1,872|
|Small non coding genes||1,872|