Oryza sativa Japonica Group Assembly and Gene Annotation
About Oryza sativa Japonica
Oryza sativa Japonica (rice) is the staple food for 2.5 billion people. It is the grain with the second highest worldwide production after Zea mays. In addition to its agronomic importance, rice is an important model species for monocot plants and cereals such as maize, wheat, barley and sorghum. O. sativa has a compact diploid genome of approximately 500 Mb (n=12) compared with the multi-gigabase genomes of maize, wheat and barley.
Assembly
Scientists from the MSU Rice Genome Annotation Project (MSU), the International Rice Genome Sequencing Project (IRGSP) and the Rice Annotation Project Database (RAP-DB) generated a unified assembly of the 12 rice pseudomolecules of Oryza sativa Japonica Group cv. Nipponbare.
The pseudomolecule for each chromosome was constructed by joining the nucleotide sequences of the PAC/BAC clones in the order of the clones on the physical map. Overlapping sequences were removed and physical gaps were replaced with Ns. Updated pseudomolecules were constructed from the original IRGSP sequence data in combination with a BAC-optical map, with error correction using 44-fold coverage next-generation sequencing reads. The nucleotide sequences of seven new clones mapped to the euchromatin/telomere junctions were added to the new genome assembly. In addition, several clones in the centromere region of chromosome 5 were improved and one gap on chromosome 11 was closed.
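The joining step described above can be illustrated with a minimal Python sketch. It is not the IRGSP pipeline: the function name `build_pseudomolecule`, the clone representation (a sequence plus the number of leading bases that overlap the previous clone, or `None` for a physical gap) and the gap length are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical clone record: (sequence, overlap_with_previous).
# overlap_with_previous is the number of leading bases already present at the
# end of the previous clone, or None when a physical gap separates the clones.
Clone = Tuple[str, Optional[int]]


def build_pseudomolecule(clones: List[Clone], gap_length: int = 10) -> str:
    """Join clones given in physical-map order into one sequence.

    Overlapping bases are dropped from the later clone, and each physical
    gap is represented by a run of N characters (gap_length is an assumed,
    arbitrary value).
    """
    parts = []
    for i, (seq, overlap) in enumerate(clones):
        if i == 0:
            # The first clone has no predecessor, so it is kept whole.
            parts.append(seq)
        elif overlap is None:
            # Physical gap: pad with Ns, then append the full clone.
            parts.append("N" * gap_length)
            parts.append(seq)
        else:
            # Sequence overlap: skip the bases shared with the previous clone.
            parts.append(seq[overlap:])
    return "".join(parts)


# Toy example: the second clone overlaps the first by 3 bases ("TAC"),
# and a physical gap precedes the third clone.
toy_clones = [("ACGTAC", None), ("TACGGA", 3), ("TTTT", None)]
toy_pseudomolecule = build_pseudomolecule(toy_clones, gap_length=5)
```

With these toy inputs, `toy_pseudomolecule` is `ACGTACGGANNNNNTTTT`: the shared `TAC` appears once and the gap is five Ns.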
Kawahara et al. (2013) describe the integrated Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules, also known as MSU7. Gene loci, gene models and associated annotations were created independently by each group, but can be compared easily using the common reference.
Annotation
International Rice Genome Sequencing Project (IRGSP) gene models were imported from the Rice Annotation Project Database (RAP-DB). The most recent update was taken from the RAP-DB release of 26 November 2018. This version corrected gene models through manual curation and deprecated some bad models. In total, 35,666 protein-coding genes were included. Feature annotation and comparative analysis pipelines have been run, and variations have been projected from the old annotation to the new one.
MSU-7 gene models were also loaded for visual comparison with the IRGSP set. Cross references between the two gene sets, provided by RAP-DB, allow searching and querying using either identifier space, but only the IRGSP/RAP-DB models are used in our gene trees.
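As a rough illustration of querying in either identifier space, the sketch below indexes cross-reference pairs in both directions. It assumes the cross references are available as simple (RAP-DB ID, MSU ID) pairs; the helper names are hypothetical, and the identifiers are placeholders that only mimic the two naming styles, not real mappings.

```python
from collections import defaultdict


def build_xref_index(pairs):
    """Index (rap_id, msu_id) pairs in both directions.

    Sets are used because one locus in one gene set may correspond to
    several loci in the other.
    """
    index = defaultdict(set)
    for rap_id, msu_id in pairs:
        index[rap_id].add(msu_id)
        index[msu_id].add(rap_id)
    return index


def lookup(index, identifier):
    """Return the sorted cross-referenced IDs, or an empty list if unknown."""
    return sorted(index.get(identifier, set()))


# Placeholder pairs for illustration only; these are not real gene mappings.
EXAMPLE_PAIRS = [
    ("Os01g0000001", "LOC_Os01g00001"),
    ("Os01g0000001", "LOC_Os01g00002"),
    ("Os01g0000002", "LOC_Os01g00003"),
]

xref_index = build_xref_index(EXAMPLE_PAIRS)
msu_hits = lookup(xref_index, "Os01g0000001")
rap_hits = lookup(xref_index, "LOC_Os01g00003")
```

The same `lookup` call works whichever style of identifier is supplied, which is the behaviour the cross references are meant to provide.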
Regulation
Probes from the Rice Genome Array for two rice cultivars were aligned to the genome.
Variation
Variation data from six different large scale studies are available:
- The 3000 Rice Genome Project (2015), an international effort to sequence the genomes of 3,024 rice varieties from 89 countries providing 365,710 variant loci (SNPs and InDels).
- Whole genome sequencing of 104 elite rice cultivars (Duitama et al. 2015), described as "a comprehensive information resource for marker assisted selection", providing 25,769,548 variant loci.
- Chip based analysis of 1,310 SNPs across 395 samples (Zhao et al. 2010), described as "revealing the impact of domestication and breeding on the rice genome".
- Chip based analysis of approximately 160k SNPs across 20 diversity rice accessions (OryzaSNP, McNally et al. 2009), described as "revealing relationships among landraces and modern varieties of rice".
- The Oryza Map Alignment Project (OMAP 2007): approximately 1.6M variant loci detected by comparing BAC End Sequences from four rice varieties to Japonica. [dbSNP]
- Adaptive loss-of-function in domesticated rice (BGI 2004): A collection of approximately 3M variant loci from the comparison of the Indica (93-11) and Japonica (Nipponbare) genomes. [dbSNP]
The following genetic markers were remapped to the IRGSP-1.0 assembly by industry collaborator KeyGene:
- 20,483 Quantitative Trait Loci (QTLs): 19,435 from Gramene's legacy QTLs database and 1,048 from the Q-Taro database
- 1,278 genetic markers (990 RFLPs and 288 SSRs) from Gramene's legacy markers database
References
- The map-based sequence of the rice genome.
  Nature. 436:793-800.
- Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data.
  Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, Schwartz DC, Tanaka T, Wu J, Zhou S et al. 2013. Rice (N Y). 6:4.
- Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.
  Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, Wakimoto H, Yang CC, Iwamoto M, Abe T et al. 2013. Plant Cell Physiol. 54:e6.
- The TIGR Rice Genome Annotation Resource: improvements and new features.
  Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L et al. 2007. Nucleic Acids Res. 35:D883-7.
- Global analysis of gene expression using GeneChip microarrays.
  Zhu T. 2003. Curr. Opin. Plant Biol. 6:418-425.
- The 3,000 rice genomes project.
  Gigascience. 3:7.
- Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection.
  Duitama J, Silva A, Sanabria Y, Cruz DF, Quintero C, Ballen C, Lorieux M, Scheffler B, Farmer A, Torres E et al. 2015. PLoS ONE. 10:e0124617.
- Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome.
  Zhao K, Wright M, Kimball J, Eizenga G, McClung A, Kovach M, Tyagi W, Ali ML, Tung CW, Reynolds A et al. 2010. PLoS ONE. 5:e10780.
- Genomewide SNP variation reveals relationships among landraces and modern varieties of rice.
  McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR, Bureau TE et al. 2009. Proc. Natl. Acad. Sci. U.S.A. 106:12273-12278.
- The Genomes of Oryza sativa: a history of duplications.
  Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C et al. 2005. PLoS Biol. 3:e38.
Links
- Gramene species page for Oryza
- International Rice Genome Sequencing Project (IRGSP)
- The Rice Annotation Project Database (RAP-DB)
- MSU Rice Genome Annotation Project
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | IRGSP-1.0, INSDC Assembly GCA_001433935.1, Oct 2015 |
Database version | 113.7 |
Golden Path Length | 375,049,285 |
Genebuild by | RAP-DB |
Genebuild method | Import |
Data source | IRGSP |
Gene counts
Coding genes | 35,806 |
Non coding genes | 3,180 |
Small non coding genes | 3,085 |
Long non coding genes | 95 |
Pseudogenes | 7 |
Gene transcripts | 45,973 |
Other
FGENESH gene prediction | 46,238 |
TE-related Gene (MSU) | 17,272 |
Short Variants | 28,179,246 |
Structural variants | 1,278 |
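
The pipe-delimited statistics above can be read programmatically. The Python sketch below is one way to do so, assuming each row has the form "label | value |"; the function name `parse_stats` is hypothetical. It also checks that the small and long non-coding gene counts add up to the non-coding total, which holds for the numbers listed here but is an observation from this table rather than a documented definition.

```python
def parse_stats(lines):
    """Parse 'label | value |' rows into a dict.

    Values made only of digits and thousands separators become ints;
    anything else (for example an assembly name) is kept as a string.
    """
    stats = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2 or not fields[1]:
            continue  # skip headings and blank lines
        label, value = fields[0], fields[1]
        digits = value.replace(",", "")
        stats[label] = int(digits) if digits.isdigit() else value
    return stats


# Rows copied from the "Gene counts" table above.
GENE_COUNT_LINES = [
    "Coding genes | 35,806 |",
    "Non coding genes | 3,180 |",
    "Small non coding genes | 3,085 |",
    "Long non coding genes | 95 |",
    "Pseudogenes | 7 |",
    "Gene transcripts | 45,973 |",
]

counts = parse_stats(GENE_COUNT_LINES)
small_plus_long = (
    counts["Small non coding genes"] + counts["Long non coding genes"]
)
totals_agree = small_plus_long == counts["Non coding genes"]
```

Here `small_plus_long` is 3,180, matching the "Non coding genes" row.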