Oryza sativa Japonica Group (IRGSP-1.0)

Oryza sativa Japonica Group Assembly and Gene Annotation

About Oryza sativa Japonica

Oryza sativa Japonica (rice) is the staple food for 2.5 billion people. It is the grain with the second highest worldwide production after Zea mays. In addition to its agronomic importance, rice is an important model species for monocot plants and cereals such as maize, wheat, barley and sorghum. O. sativa has a compact diploid genome of approximately 500 Mb (n=12) compared with the multi-gigabase genomes of maize, wheat and barley.


Scientists from the MSU Rice Genome Annotation Project (MSU), the International Rice Genome Sequencing Project (IRGSP) and the Rice Annotation Project Database (RAP-DB) generated a unified assembly of the 12 rice pseudomolecules of Oryza sativa Japonica Group cv. Nipponbare.

The pseudomolecule for each chromosome was constructed by joining the nucleotide sequences of the PAC/BAC clones in the order given by the physical map. Overlapping sequences were removed and physical gaps were replaced with Ns. Updated pseudomolecules were constructed from the original IRGSP sequence data in combination with a BAC-optical map, with error correction using 44-fold coverage next generation sequencing reads. The nucleotide sequences of seven new clones mapped to the euchromatin/telomere junctions were added to the new genome assembly. In addition, several clones in the centromere region of chromosome 5 were improved and one gap on chromosome 11 was closed.
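The joining procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the IRGSP pipeline: the `Clone` record, its `overlap_with_previous` and `gap_before` fields, the `build_pseudomolecule` function and the toy sequences are all assumptions made for the example. In a real assembly the overlaps and gap sizes would come from the physical map.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    """Hypothetical record for one PAC/BAC clone, already in physical-map order."""
    name: str
    sequence: str
    overlap_with_previous: int = 0  # bases shared with the preceding clone
    gap_before: int = 0             # size of the physical gap before this clone


def build_pseudomolecule(clones):
    """Join clone sequences in order, trimming overlaps and filling gaps with Ns."""
    parts = []
    for clone in clones:
        if clone.gap_before > 0:
            # A physical gap is represented by a run of Ns.
            parts.append("N" * clone.gap_before)
        # Drop the leading bases that duplicate the end of the previous clone.
        parts.append(clone.sequence[clone.overlap_with_previous:])
    return "".join(parts)


# Toy data: cloneB overlaps cloneA by 4 bases; a 5-base gap precedes cloneC.
clones = [
    Clone("cloneA", "ACGTACGT"),
    Clone("cloneB", "ACGTTTGA", overlap_with_previous=4),
    Clone("cloneC", "GGCC", gap_before=5),
]
pseudomolecule = build_pseudomolecule(clones)
print(pseudomolecule)
```

The sketch keeps each overlapping region once (from the earlier clone) and makes gaps explicit, which is why unresolved regions appear as runs of N in the published pseudomolecules.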

Kawahara et al. (2013) describe the integrated Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules, also known as MSU7. Gene loci, gene models and associated annotations were created independently by each group, but can be compared easily using the common reference.


International Rice Genome Sequencing Project (IRGSP) gene models were imported from the Rice Annotation Project Database (RAP-DB). The most recent update was from the 26 November 2018 release, which corrected gene models through manual curation and deprecated some unreliable models. In total, 35,666 protein-coding genes were included. Feature annotation and comparative analysis pipelines have been run, and variations have been projected from the old annotation to the new one.

MSU-7 gene models were also loaded for visual comparison with the IRGSP set. Cross references between the two gene sets, provided by RAP-DB, allow searching and querying using either identifier space, but only the IRGSP/RAP-DB models are used in our gene trees.
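To illustrate how cross references let either identifier space be used for lookups, here is a minimal sketch of a two-way index. The function names and the identifier pairs are hypothetical placeholders, not entries from the RAP-DB cross-reference data, and the sketch assumes the mapping can be many-to-many (one locus in one set may correspond to several in the other).

```python
from collections import defaultdict


def build_xref_index(pairs):
    """Build lookup tables in both directions from (rap_id, msu_id) pairs."""
    rap_to_msu = defaultdict(set)
    msu_to_rap = defaultdict(set)
    for rap_id, msu_id in pairs:
        rap_to_msu[rap_id].add(msu_id)
        msu_to_rap[msu_id].add(rap_id)
    return rap_to_msu, msu_to_rap


def lookup(identifier, rap_to_msu, msu_to_rap):
    """Return the counterpart identifiers, whichever space the query is in."""
    if identifier in rap_to_msu:
        return sorted(rap_to_msu[identifier])
    if identifier in msu_to_rap:
        return sorted(msu_to_rap[identifier])
    return []


# Placeholder identifier pairs, invented for this example only.
pairs = [
    ("Os_example_0001", "LOC_example_0001"),
    ("Os_example_0002", "LOC_example_0002"),
    ("Os_example_0002", "LOC_example_0003"),  # one-to-many case
]
rap_to_msu, msu_to_rap = build_xref_index(pairs)
print(lookup("Os_example_0002", rap_to_msu, msu_to_rap))
print(lookup("LOC_example_0001", rap_to_msu, msu_to_rap))
```

Storing sets in both directions keeps the lookup symmetric, so a query does not need to know in advance which gene set an identifier belongs to.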


Probes from the Rice Genome Array for two rice cultivars were aligned to the genome.


Variation data from six different large scale studies are available:

  1. The 3000 Rice Genome Project (2015), an international effort to sequence the genomes of 3,024 rice varieties from 89 countries providing 365,710 variant loci (SNPs and InDels).
  2. Whole genome sequencing of 104 elite rice cultivars (Duitama et al. 2015), described as "a comprehensive information resource for marker assisted selection", providing 25,769,548 variant loci.
  3. Chip based analysis of 1,310 SNPs across 395 samples (Zhao et al. 2010), described as "revealing the impact of domestication and breeding on the rice genome".
  4. Chip based analysis of approximately 160k SNPs across 20 diverse rice accessions (OryzaSNP, McNally et al. 2009), described as "revealing relationships among landraces and modern varieties of rice".
  5. The Oryza Map Alignment Project (OMAP 2007): approximately 1.6M variant loci detected by comparing BAC End Sequences from four rice varieties to Japonica. [dbSNP]
  6. Adaptive loss-of-function in domesticated rice (BGI 2004): A collection of approximately 3M variant loci from the comparison of the Indica (93-11) and Japonica (Nipponbare) genomes. [dbSNP]

The following genetic markers were remapped to the IRGSP-1.0 assembly by industry collaborator KeyGene:


References

  1. The map-based sequence of the rice genome.
    International Rice Genome Sequencing Project. 2005. Nature. 436:793-800.
  2. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data.
    Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, Schwartz DC, Tanaka T, Wu J, Zhou S et al. 2013. Rice (N Y). 6:4.
  3. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.
    Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, Wakimoto H, Yang CC, Iwamoto M, Abe T et al. 2013. Plant Cell Physiol. 54:e6.
  4. The TIGR Rice Genome Annotation Resource: improvements and new features.
    Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L et al. 2007. Nucleic Acids Res. 35:D883-7.
  5. Global analysis of gene expression using GeneChip microarrays.
    Zhu T. 2003. Curr. Opin. Plant Biol. 6:418-425.
  6. The 3,000 rice genomes project.
    The 3,000 rice genomes project. 2014. Gigascience. 3:7.
  7. Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection.
    Duitama J, Silva A, Sanabria Y, Cruz DF, Quintero C, Ballen C, Lorieux M, Scheffler B, Farmer A, Torres E et al. 2015. PLoS ONE. 10:e0124617.
  8. Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome.
    Zhao K, Wright M, Kimball J, Eizenga G, McClung A, Kovach M, Tyagi W, Ali ML, Tung CW, Reynolds A et al. 2010. PLoS ONE. 5:e10780.
  9. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice.
    McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR, Bureau TE et al. 2009. Proc. Natl. Acad. Sci. U.S.A. 106:12273-12278.
  10. The Genomes of Oryza sativa: a history of duplications.
    Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C et al. 2005. PLoS Biol. 3:e38.

More information

General information about this species can be found in Wikipedia.



Assembly: IRGSP-1.0, INSDC Assembly GCA_001433935.1, Oct 2015
Database version: 111.7
Golden Path Length: 375,049,285
Genebuild by: RAP-DB
Genebuild method: Import
Data source: IRGSP

Gene counts

Coding genes: 35,806
Non coding genes: 3,180
Small non coding genes: 3,085
Long non coding genes: 95
Gene transcripts: 45,973
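
As a quick sanity check on the gene counts listed above, the short script below confirms that the small and long non-coding categories sum to the non-coding total and derives an average number of transcripts per gene. The variable names are illustrative, and the ratio assumes that the transcript count covers both coding and non-coding genes, which the table does not state explicitly.

```python
# Counts copied from the gene count table above.
coding_genes = 35_806
noncoding_genes = 3_180
small_noncoding = 3_085
long_noncoding = 95
transcripts = 45_973

# The two non-coding subcategories should account for the non-coding total.
assert small_noncoding + long_noncoding == noncoding_genes

total_genes = coding_genes + noncoding_genes
# Assumption: transcripts are counted across all gene types.
transcripts_per_gene = transcripts / total_genes

print(f"total genes: {total_genes}")                        # total genes: 38986
print(f"transcripts per gene: {transcripts_per_gene:.2f}")  # transcripts per gene: 1.18
```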


FGENESH gene prediction: 46,238
TE-related Gene (MSU): 17,272
Short Variants: 28,179,246
Structural variants: 1,278