Chenopodium quinoa Assembly and Gene Annotation
About Chenopodium quinoa
Chenopodium quinoa (Quinoa) is a highly nutritious crop that is adapted to thrive in a wide range of agroecosystems. It was presumably first domesticated more than 7,000 years ago by pre-Columbian cultures and was known as the Inca ‘mother grain’. It is an allotetraploid (2n=4x=36). Quinoa has adapted to the high plains of the Andean Altiplano (>3,500 m above sea level), where it has developed tolerance to several abiotic stresses. Quinoa has gained international attention because of the nutritional value of its seeds, which are gluten-free, have a low glycaemic index, and contain an excellent balance of essential amino acids, fibre, lipids, carbohydrates, vitamins, and minerals. It has the potential to provide a highly nutritious food source that can be grown on marginal lands not currently suitable for other major crops. This genome corresponds to coastal Chilean quinoa accession PI 614886, also known as NSL 106399 and QQ74.
Assembly
DNA extracted from leaf and flower tissue of a single plant was sequenced using single-molecule real-time technology from Pacific Biosciences, and the assembly was scaffolded with optical maps from BioNano Genomics and chromosome-contact maps from Dovetail Genomics. The assembly contains 3,486 scaffolds, with a scaffold N50 of 3.84 Mb and 90% of the assembled genome contained in 439 scaffolds. The total assembly size of 1.39 Gb is similar to the reported size estimates of the quinoa genome (1.45–1.50 Gb). To combine scaffolds into pseudomolecules, an existing quinoa linkage map was integrated with two new linkage maps. The resulting map of 6,403 unique markers spans a total length of 2,034 cM and consists of 18 linkage groups, corresponding to the haploid chromosome number of quinoa.
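The scaffold statistics quoted above (N50, and the number of scaffolds holding 90% of the assembly) follow from a simple cumulative sum over scaffold lengths sorted from longest to shortest. The sketch below is illustrative only: the function name `nx_stats` and the toy lengths are assumptions, not part of the Ensembl or assembly pipeline, and real input would be the scaffold lengths read from the assembly FASTA.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Nx is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches `fraction` of the assembly.
    Lx is the number of scaffolds needed to get there.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count


# Toy scaffold lengths (hypothetical values, not quinoa data).
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = nx_stats(toy_lengths, 0.5)
n90, l90 = nx_stats(toy_lengths, 0.9)
print(f"N50={n50} (L50={l50}), N90={n90} (L90={l90})")
```

With the real scaffold lengths as input, `nx_stats(lengths, 0.5)` would correspond to the reported scaffold N50, and the second element of `nx_stats(lengths, 0.9)` to the count of scaffolds containing 90% of the assembly.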
Annotation
Protein-coding genes were annotated using a combination of ab initio prediction and transcript evidence gathered from RNA sequenced from multiple tissues using both RNA-seq and PacBio isoform sequencing approaches. The number of gene models obtained is in line with that of other sequenced tetraploid species. Most (97.3%) of the 956 genes in the Plantae BUSCO dataset were identified in the annotation, which suggests that the assembly and annotation are largely complete.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 2,286,474 Low complexity (Dust) features, covering 99 Mb (7.4% of the genome); 331,989 RepeatMasker features (with the nrTEplants library), covering 142 Mb (10.7% of the genome); 988,319 Tandem repeats (TRF) features, covering 314 Mb (23.5% of the genome); and Repeat Detector repeats, covering 820 Mb (61.5% of the genome).
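The repeat coverage percentages can be approximately reproduced by dividing each covered length by the genome length. The sketch below assumes that the denominator is the Golden Path Length listed in the statistics table (1,333,398,936 bp); the page does not state this explicitly. Because the covered lengths are given only to the nearest Mb, recomputed values may differ from the reported ones in the last decimal place.

```python
# Assumption: percentages are relative to the Golden Path Length.
GOLDEN_PATH_BP = 1_333_398_936

# Covered lengths in Mb, as reported on this page (rounded values).
REPEAT_MB = {
    "Low complexity (Dust)": 99,
    "RepeatMasker (nrTEplants)": 142,
    "Tandem repeats (TRF)": 314,
    "Repeat Detector": 820,
}


def coverage_percent(covered_mb, genome_bp=GOLDEN_PATH_BP):
    """Percentage of the genome covered by `covered_mb` megabases."""
    return 100 * covered_mb * 1_000_000 / genome_bp


for name, mb in REPEAT_MB.items():
    print(f"{name}: {coverage_percent(mb):.1f}% of the genome")
```

Features from different repeat classes can overlap on the genome, so the individual percentages are best read separately rather than summed.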
References
- The genome of Chenopodium quinoa.
Jarvis DE, Ho YS, Lightfoot DJ, Schmöckel SM, Li B, Borm TJ, Ohyanagi H, Mineta K, Michell CT, Saber N, Kharbatia NM, Rupper RR, Sharp AR, Dally N, Boughton BA, Woo YH, Gao G, Schijlen EG, Guo X, Momin AA, Negrão S, Al-Babili S, Gehring C, Roessner U, Jung C, Murphy K, Arold ST, Gojobori T, Linden CG, van Loo EN, Jellen EN, Maughan PJ, Tester M.
Picture credit: Michael Hermann CC BY-SA 4.0
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | ASM168347v1, INSDC Assembly GCA_001683475.1 |
Database version | 113.1 |
Golden Path Length | 1,333,398,936 |
Genebuild by | ChenopodiumDB |
Genebuild method | External annotation import |
Data source | King Abdullah University of Science and Technology |
Gene counts
Coding genes | 43,952 |
Gene transcripts | 43,952 |