Corylus avellana Assembly and Gene Annotation
About Corylus avellana
The genus Corylus comprises the hazels, deciduous trees and large shrubs that are widespread throughout the northern hemisphere and are grown for their edible nuts, for wood and for ornamental purposes. The most economically significant species is the diploid (2n=22) European hazel (Corylus avellana L.), the nuts of which are known as hazelnuts, filberts or cobnuts, and are consumed worldwide both directly and as an ingredient in many food and confectionery products. Hazelnuts prefer a mild, damp climate; historically, production has been concentrated in Turkey, Italy, Azerbaijan and the USA. In recent years several other countries, such as China, Georgia, Iran and Chile, have begun actively developing their hazelnut industries.
As with many tree species, C. avellana has a long generation time (up to 8 years to reach full productivity) and also displays sporophytic self-incompatibility, with genetically similar individuals unable to pollinate each other. These factors make selecting for many important traits by classical breeding approaches extremely difficult. Therefore, genomic data, which allows the identification and selection of many genetic loci simultaneously, has huge potential to accelerate the research and breeding of hazel.
The genome sequence assembly of the commercially important Turkish cultivar Tombul was generated using a hybrid approach. Illumina whole genome shotgun sequences and Oxford Nanopore Technologies MinION long reads were assembled de novo using MaSuRCA. Contigs from this assembly were ordered into chromosome-sized scaffolds through chromosome conformation capture sequencing (Dovetail Genomics Chicago and Hi-C). A high-quality 370-Mb assembly covering 97.8% of the estimated genome size was obtained, in close agreement with cytogenetic studies and genetic maps.
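As a quick check on these figures, the short sketch below back-calculates the estimated genome size implied by a 370-Mb assembly that covers 97.8% of it. The variable names are illustrative only, and the result is an approximation derived from the rounded values quoted above, not a figure reported by the assembly authors.

```python
# Values quoted in the text above (both are rounded).
assembly_mb = 370.0          # assembly size in Mb
coverage_fraction = 0.978    # assembly covers 97.8% of the estimated genome

# Implied estimated genome size and the portion not represented in the assembly.
estimated_genome_mb = assembly_mb / coverage_fraction
unassembled_mb = estimated_genome_mb - assembly_mb

print(f"Implied estimated genome size: {estimated_genome_mb:.1f} Mb")
print(f"Implied unassembled sequence:  {unassembled_mb:.1f} Mb")
```

Under these assumptions the implied genome size is roughly 378 Mb, leaving about 8 Mb unassembled.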
Augustus was used to generate ab initio gene predictions from the masked Tombul genome based on hidden Markov models optimized for Arabidopsis thaliana. These predictions were tested by aligning the gene models with the Tombul transcriptome assembly to provide evidence for gene expression. Gene models were also assigned to the conserved Viridiplantae orthologous groups (OGs) recorded in OrthoMCL-DB. Low-quality matches were eliminated from the annotation and gene models in some OGs. Where the intron–exon structure of C. avellana gene models was inconsistent with orthologues from other plants, fresh predictions were generated using fgenesh with training parameters from Betula nana, and the most consistent gene model was retained.
Repeated sequences called with the Repeat Detector, which is part of the [Ensembl Genomes repeat feature pipelines](http://plants.ensembl.org/info/genome/annotation/repeat_features.html), cover 151 Mb (40.8% of the genome). Low complexity (Dust) features cover 31.7 Mb, RepeatMasker features (with the nrTEplants library) cover 40 Mb, and tandem repeat (TRF) features cover 15.5 Mb.
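To put the individual repeat classes on the same scale as the 40.8% total, the sketch below expresses each figure as a percentage of the assembly. It assumes the 370-Mb assembly size is the denominator (which reproduces the quoted 40.8%); the dictionary labels and function name are illustrative. Different feature types can overlap, so the per-class values are not expected to sum to the total.

```python
ASSEMBLY_MB = 370.0  # assumed denominator: the assembly size quoted above

# Repeat coverage figures from the text, in Mb.
repeat_mb = {
    "All repeats (Repeat Detector)": 151.0,
    "Low complexity (Dust)": 31.7,
    "RepeatMasker (nrTEplants)": 40.0,
    "Tandem repeats (TRF)": 15.5,
}


def percent_of_assembly(mb, assembly_mb=ASSEMBLY_MB):
    """Return the coverage in Mb as a percentage of the assembly size."""
    return 100.0 * mb / assembly_mb


# Percentages rounded to one decimal place, matching the precision in the text.
repeat_pct = {name: round(percent_of_assembly(mb), 1) for name, mb in repeat_mb.items()}

for name, pct in repeat_pct.items():
    print(f"{name}: {repeat_mb[name]} Mb = {pct}% of the assembly")
```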
- A chromosome-scale genome assembly of European hazel (Corylus avellana L.) reveals targets for crop improvement.
Lucas SJ, Kahraman K, Avşar B, Buggs RJA, Bilge I.
- High density SNP mapping and QTL analysis for time of leaf budburst in Corylus avellana L.
Torello Marinoni D, Valentini N, Portis E, Acquadro A, Beltramo C, Mehlenbacher SA, Mockler TC, Rowley ER, Botta R.
Picture credit: https://en.wikipedia.org/wiki/Corylus_avellana