Juglans regia Assembly and Gene Annotation
About Juglans regia
The Persian walnut (Juglans regia L.), a diploid species (2n=32) native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. It belongs to the Juglandaceae family and has a genome size of 620-667 Mbp. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds. This genome sequence was obtained from the cultivar Chandler.
Assembly
A total of 3.7M Illumina super-reads were produced with the MaSuRCA assembler and then combined with 7M (35x) Oxford Nanopore Technologies long reads to generate mega-reads. The resulting mega-reads were then assembled into a hybrid assembly, which comprised 1,498 scaffolds, 258 contigs, and 25,007 old scaffolds from Chandler v1.0 [2]. To improve the assembly further and build chromosome-scale scaffolds, Hi-C sequencing was applied: the HiRise scaffolding pipeline processed 356M paired-end Illumina reads to generate the final assembly.
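Sequencing depth relates read count, mean read length and genome size as depth = reads × mean length / genome size. The sketch below solves this for the mean read length implied by "7M (35x)". It assumes that the 35x figure was computed against the 620-667 Mbp genome-size estimate, which the text does not state. The result, roughly 3.1 to 3.3 kb, is a back-of-the-envelope figure and not a reported value. The function name is hypothetical.

```python
def implied_mean_read_length(n_reads: int, depth: float, genome_size_bp: float) -> float:
    """Solve depth = n_reads * mean_length / genome_size for mean_length."""
    return depth * genome_size_bp / n_reads


# 7M ONT reads at 35x (from the text); bounds from the 620-667 Mbp size estimate.
# ASSUMPTION: the 35x depth was computed against this genome-size estimate.
for genome_size in (620e6, 667e6):
    length = implied_mean_read_length(7_000_000, 35, genome_size)
    print(f"{genome_size / 1e6:.0f} Mbp -> ~{length:,.0f} bp mean read length")
```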
To assess the quality of the HiRise assembly, two genetic maps were used [3,4]. Near-perfect collinearity was observed between the HiRise assembly and both Chandler maps, so the HiRise scaffolds were oriented, ordered, and named accordingly, generating the final 16 chromosomal pseudomolecules.
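The following hypothetical illustration shows how marker collinearity can drive scaffold orientation and ordering. It is not necessarily the procedure used for Walnut_2.0. The sketch computes Spearman's rank correlation between the physical (bp) and genetic (cM) positions of markers on a scaffold. A scaffold with a strongly negative correlation is flipped, and scaffolds are ordered by the median genetic position of their markers. The 0.9 threshold, the function names and the marker values are all assumptions.

```python
from statistics import mean, median


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values (common for markers at the same cM).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # no variation, so the correlation is undefined
    return cov / (var_x * var_y) ** 0.5


def orient(physical_bp, genetic_cm, threshold=0.9):
    """'+' if the scaffold runs with the map, '-' if reversed, '?' if unclear."""
    rho = spearman(physical_bp, genetic_cm)
    if rho >= threshold:
        return "+", rho
    if rho <= -threshold:
        return "-", rho
    return "?", rho  # also covers rho = nan


def order_scaffolds(markers_cm):
    """Order scaffolds along a linkage group by median marker position (cM)."""
    return sorted(markers_cm, key=lambda name: median(markers_cm[name]))


# Hypothetical markers on one scaffold: cM decreases as bp increases.
print(orient([100_000, 500_000, 900_000, 1_400_000, 2_000_000],
             [40.0, 31.5, 22.0, 10.2, 3.1]))
print(order_scaffolds({"scafA": [50.0, 55.0], "scafB": [2.0, 8.0], "scafC": [20.0]}))
```

Rank correlation is used here rather than Pearson correlation on raw positions. This choice means that local differences in recombination rate along a chromosome do not affect the orientation call.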
Annotation
Full-length transcripts from single-molecule real-time (SMRT) sequencing were used to predict 37,554 gene models, whose mean gene length is greater than that of the previous Chandler v1.0 annotation. Most of the new protein-coding genes (90%) have both start and stop codons, a significant improvement over Chandler v1.0. Of the 40,884 transcripts identified, 84% were multi-exonic, with an average of 5.9 exons each. In addition, 2,801 gene models had 2 to 4 transcript isoforms each, with a mean length of 9,389 bp.
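Statistics such as the multi-exonic fraction and the number of genes with isoforms can be derived from exon-level records. The sketch below is a hypothetical illustration and not the pipeline used for this annotation. It takes (gene_id, transcript_id) pairs, one per exon, such as might be parsed from a GFF3 or GTF file. The text does not say whether the 5.9-exon average covers all transcripts or only multi-exonic ones; this sketch assumes the latter.

```python
from collections import Counter, defaultdict


def transcript_stats(exons):
    """Summarise exon structure from (gene_id, transcript_id) pairs, one pair per exon."""
    exons_per_tx = Counter(tx for _, tx in exons)
    tx_per_gene = defaultdict(set)
    for gene, tx in exons:
        tx_per_gene[gene].add(tx)
    multi = [n for n in exons_per_tx.values() if n > 1]
    return {
        "transcripts": len(exons_per_tx),
        "multi_exonic_fraction": len(multi) / len(exons_per_tx) if exons_per_tx else 0.0,
        # ASSUMPTION: mean exon count taken over multi-exonic transcripts only.
        "mean_exons_multi_exonic": sum(multi) / len(multi) if multi else 0.0,
        "genes_with_isoforms": sum(1 for txs in tx_per_gene.values() if len(txs) > 1),
    }


# Hypothetical toy annotation: 3 genes, 4 transcripts (g1 has two isoforms).
exons = [("g1", "t1")] * 3 + [("g1", "t2")] * 2 + [("g2", "t3")] + [("g3", "t4")] * 4
print(transcript_stats(exons))
```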
Repeated sequences were annotated with the Repeat Detector and the Ensembl Genomes repeat feature pipeline. There are:
- 1,215,333 low-complexity (Dust) features, covering 52 Mb (9.0% of the genome);
- 213,484 RepeatMasker features (with the nrTEplants library), covering 79 Mb (13.8% of the genome);
- 473,692 tandem repeat (TRF) features, covering 37 Mb (6.5% of the genome).
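This sketch assumes that each coverage figure is the length of the union of that category's feature intervals, so overlapping features are counted once. It also uses half-open [start, end) coordinates; Ensembl's 1-based closed coordinates would need start - 1. The genome size is taken as the Golden Path Length of 572,786,133 given in the statistics. The function names are hypothetical. The Mb figures are rounded, so recomputing from 52 Mb gives 9.1% rather than the reported 9.0%. Features from different categories may overlap one another, so the three percentages should not simply be summed.

```python
GOLDEN_PATH_BP = 572_786_133  # Golden Path Length for Walnut_2.0


def union_length(intervals):
    """Total bp covered by half-open [start, end) intervals; overlaps count once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_of_genome(covered_bp, genome_bp=GOLDEN_PATH_BP):
    return 100 * covered_bp / genome_bp


# Hypothetical features: the first two overlap by 50 bp.
print(union_length([(0, 100), (50, 150), (200, 250)]))  # 200
for label, mb in [("RepeatMasker", 79), ("TRF", 37)]:
    print(label, f"{percent_of_genome(mb * 1_000_000):.1f}%")
```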
References
[1] Marrano A, Britton M, Zaini PA, Zimin AV, Workman RE, Puiu D, Bianco L, Pierro EAD, Allen BJ, Chakraborty S, Troggio M, Leslie CA, Timp W, Dandekar A, Salzberg SL, Neale DB. High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome.
[2] Martínez-García PJ, Crepeau MW, Puiu D, Gonzalez-Ibeas D, Whalen J, Stevens KA, Paul R, Butterfield TS, Britton MT, Reagan RL, Chakraborty S, Walawage SL, Vasquez-Gross HA, Cardeno C, Famula RA, Pratt K, Kuruganti S, Aradhya MK, Leslie CA, Dandekar AM, Salzberg SL, Wegrzyn JL, Langley CH, Neale DB. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols.
[3] Marrano A, Sideli GM, Leslie CA, Cheng H, Neale DB. Deciphering of the Genetic Control of Phenology, Yield, and Pellicle Color in Persian Walnut (Juglans regia L.).
[4] Luo MC, You FM, Li P, Wang JR, Zhu T, Dandekar AM, Leslie CA, Aradhya M, McGuire PE, Dvorak J. Synteny analysis in Rosids with a walnut physical map reveals slow genome evolution in long-lived woody perennials.
Picture credit: Thecupermat [CC BY-SA 3.0]
Statistics
Summary
Assembly | Walnut_2.0, INSDC Assembly GCA_001411555.2, Jul 2020 |
Database version | 113.1 |
Golden Path Length | 572,786,133 bp |
Genebuild by | UC Davis |
Genebuild method | External annotation import |
Data source | UCDavis |
Gene counts
Coding genes | 40,487 |
Gene transcripts | 41,077 |