Oryza sativa Japonica Group Assembly and Gene Annotation

About Oryza sativa Japonica

Oryza sativa Japonica (rice) is the staple food for 2.5 billion people. It is the grain with the second highest worldwide production after Zea mays. In addition to its agronomic importance, rice is an important model species for monocot plants and cereals such as maize, wheat, barley and sorghum. O. sativa has a compact diploid genome of approximately 500 Mb (n=12) compared with the multi-gigabase genomes of maize, wheat and barley.

Assembly

Scientists from the MSU Rice Genome Annotation Project (MSU), the International Rice Genome Sequencing Project (IRGSP) and the Rice Annotation Project Database (RAP-DB) generated a unified assembly of the 12 rice pseudomolecules of Oryza sativa Japonica Group cv. Nipponbare.

The pseudomolecule for each chromosome was constructed by joining the nucleotide sequences of the PAC/BAC clones in the order given by the physical map. Overlapping sequences were removed and physical gaps were replaced with Ns. The updated pseudomolecules were built from the original IRGSP sequence data combined with a BAC-optical map, and errors were corrected using 44-fold coverage of next-generation sequencing reads. The sequences of seven new clones mapped to euchromatin/telomere junctions were added to the new assembly. In addition, several clones in the centromere region of chromosome 5 were improved and one gap on chromosome 11 was closed.
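The joining step described above (ordered clones, overlaps removed, gaps filled with Ns) can be illustrated with a minimal sketch. This is not the IRGSP pipeline. The `Clone` and `Gap` records and the `build_pseudomolecule` function are hypothetical, overlap lengths are assumed to be known in advance from the physical map, and gap lengths are placeholders rather than the values IRGSP actually used.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Clone:
    """A sequenced PAC/BAC clone, listed in physical-map order (hypothetical record)."""
    name: str
    seq: str
    overlap_with_previous: int = 0  # assumed: bases shared with the preceding clone


@dataclass
class Gap:
    """A physical gap between clones; the length is an assumed placeholder."""
    length: int


def build_pseudomolecule(tiling_path: List[Union[Clone, Gap]]) -> str:
    """Join clones in map order, drop duplicated overlaps, and pad gaps with Ns."""
    parts: List[str] = []
    previous: Optional[Union[Clone, Gap]] = None
    for item in tiling_path:
        if isinstance(item, Gap):
            parts.append("N" * item.length)
        else:
            k = item.overlap_with_previous
            if k:
                # An overlap is only meaningful against an adjacent clone whose
                # 3' end matches this clone's 5' end.
                if not isinstance(previous, Clone) or not previous.seq.endswith(item.seq[:k]):
                    raise ValueError(f"{item.name}: declared overlap does not match the previous clone")
            # Keep only the part of the clone not already present in the output.
            parts.append(item.seq[k:])
        previous = item
    return "".join(parts)


# Toy tiling path: B overlaps A by 4 bases, then a 5-base gap, then C.
path = [Clone("A", "ACGTACGT"), Clone("B", "ACGTTTGG", 4), Gap(5), Clone("C", "GGCC")]
print(build_pseudomolecule(path))  # ACGTACGTTTGGNNNNNGGCC
```

Validating each declared overlap against the previous clone means a mis-ordered clone raises an error instead of silently producing a duplicated or truncated sequence.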

Kawahara et al. (2013) describe the integrated Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules, also known as MSU7. Each group created its gene loci, gene models and associated annotations independently, but these can easily be compared on the common reference.

Annotation

International Rice Genome Sequencing Project (IRGSP) gene models were imported from the Rice Annotation Project Database (RAP-DB). The most recent update is from the RAP-DB release of 26 November 2018, which corrected gene models through manual curation and deprecated some poorly supported models. In total, 35,666 protein-coding genes were included. Feature annotation and comparative analysis pipelines have been run, and variation data have been projected from the old annotation to the new one.

MSU7 gene models were also loaded for visual comparison with the IRGSP set. Cross-references between the two gene sets, provided by RAP-DB, allow searching and querying in either identifier space, but only the IRGSP/RAP-DB models are used in our gene trees.
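Because the gene trees use only IRGSP/RAP-DB models, a query made with an MSU identifier has to be resolved through the cross-references first. A minimal sketch of such a bidirectional lookup follows. The function names and the gene identifiers are hypothetical, and the sketch does not assume any particular format for the RAP-DB cross-reference file; it only assumes the mapping arrives as (RAP-DB ID, MSU ID) pairs, possibly many-to-many.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Index = Dict[str, Set[str]]


def build_xref_index(pairs: Iterable[Tuple[str, str]]) -> Tuple[Index, Index]:
    """Index (rap_id, msu_id) pairs in both directions.

    The relationship can be many-to-many, so each ID maps to a set.
    """
    rap_to_msu: Dict[str, Set[str]] = defaultdict(set)
    msu_to_rap: Dict[str, Set[str]] = defaultdict(set)
    for rap_id, msu_id in pairs:
        rap_to_msu[rap_id].add(msu_id)
        msu_to_rap[msu_id].add(rap_id)
    # Convert to plain dicts so lookups of unknown IDs don't insert empty sets.
    return dict(rap_to_msu), dict(msu_to_rap)


def resolve_to_rap(query: str, rap_to_msu: Index, msu_to_rap: Index) -> Set[str]:
    """Return RAP-DB gene IDs for a query from either identifier space.

    IDs absent from the cross-reference pairs resolve to an empty set.
    """
    if query in rap_to_msu:
        return {query}
    return set(msu_to_rap.get(query, set()))


# Hypothetical identifiers, for illustration only.
pairs = [("RAP_GENE_1", "MSU_GENE_1"), ("RAP_GENE_1", "MSU_GENE_2"), ("RAP_GENE_2", "MSU_GENE_3")]
rap_to_msu, msu_to_rap = build_xref_index(pairs)
print(resolve_to_rap("MSU_GENE_2", rap_to_msu, msu_to_rap))  # {'RAP_GENE_1'}
```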

Regulation

Probes from the Rice Genome Array for two rice cultivars were aligned to the genome.

Variation

Variation data from six large-scale studies are available:

~62 million variants were loaded from the 3,000 Rice Genomes Project (2015), an international effort to sequence the genomes of 3,024 rice varieties from 89 countries.
Whole genome sequencing of 104 elite rice cultivars (Duitama et al. 2015), described as "a comprehensive information resource for marker assisted selection", providing 25,769,548 variant loci.
Chip-based analysis of 1,310 SNPs across 395 samples (Zhao et al. 2010), described as "revealing the impact of domestication and breeding on the rice genome".
Chip-based analysis of approximately 160k SNPs across 20 diverse rice accessions (OryzaSNP, McNally et al. 2009), described as "revealing relationships among landraces and modern varieties of rice".
The Oryza Map Alignment Project (OMAP 2007): approximately 1.6M variant loci detected by comparing BAC end sequences from four rice varieties to Japonica. [dbSNP]
Adaptive loss-of-function in domesticated rice (BGI 2004): a collection of approximately 3M variant loci from the comparison of the Indica (93-11) and Japonica (Nipponbare) genomes. [dbSNP]

The following genetic markers were remapped to the IRGSP-1.0 assembly by industry collaborator KeyGene:

20,483 quantitative trait loci (QTLs): 19,435 from Gramene's legacy QTL database and 1,048 from the Q-Taro database
1,278 genetic markers (990 RFLPs and 288 SSRs) from Gramene's legacy markers database
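The remapped totals above are the sums of their per-source and per-type counts. A short arithmetic check, using only the numbers stated in the text (the dictionary names are illustrative):

```python
# Counts as reported in the text above; dictionary names are illustrative.
qtl_by_source = {"Gramene legacy QTL database": 19_435, "Q-Taro database": 1_048}
markers_by_type = {"RFLP": 990, "SSR": 288}

total_qtls = sum(qtl_by_source.values())
total_markers = sum(markers_by_type.values())
print(f"{total_qtls:,} QTLs, {total_markers:,} markers")  # 20,483 QTLs, 1,278 markers
```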

References

The map-based sequence of the rice genome.
International Rice Genome Sequencing Project. 2005. Nature. 436:793-800.
Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data.
Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, Schwartz DC, Tanaka T, Wu J, Zhou S et al. 2013. Rice (N Y). 6:4.
Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.
Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, Wakimoto H, Yang CC, Iwamoto M, Abe T et al. 2013. Plant Cell Physiol. 54:e6.
The TIGR Rice Genome Annotation Resource: improvements and new features.
Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L et al. 2007. Nucleic Acids Res. 35:D883-7.
Global analysis of gene expression using GeneChip microarrays.
Zhu T. 2003. Curr. Opin. Plant Biol. 6:418-425.
The 3,000 rice genomes project.
3,000 Rice Genomes Project. 2014. Gigascience. 3:7.
Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection.
Duitama J, Silva A, Sanabria Y, Cruz DF, Quintero C, Ballen C, Lorieux M, Scheffler B, Farmer A, Torres E et al. 2015. PLoS ONE. 10:e0124617.
Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome.
Zhao K, Wright M, Kimball J, Eizenga G, McClung A, Kovach M, Tyagi W, Ali ML, Tung CW, Reynolds A et al. 2010. PLoS ONE. 5:e10780.
Genomewide SNP variation reveals relationships among landraces and modern varieties of rice.
McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR, Bureau TE et al. 2009. Proc. Natl. Acad. Sci. U.S.A. 106:12273-12278.
The Genomes of Oryza sativa: a history of duplications.
Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C et al. 2005. PLoS Biol. 3:e38.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	IRGSP-1.0, INSDC Assembly GCA_001433935.1, Oct 2015
Database version	115.7
Golden Path Length	375,049,285
Genebuild by	RAP-DB
Genebuild method	Import
Data source	IRGSP

Gene counts

Coding genes	35,806
Non coding genes	3,180
Small non coding genes	3,085
Long non coding genes	95
Pseudogenes	7
Gene transcripts	45,973
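The gene count table is a set of label/count rows, and the non-coding total should equal its small and long subcategories. A minimal sketch that parses the rows and checks that subtotal follows. The `parse_counts` helper is hypothetical, and treating coding genes, non-coding genes and pseudogenes as disjoint categories when summing a gene total is an assumption, not something the table states.

```python
from typing import Dict

# Gene counts transcribed from the table above, one "label<TAB>count" per line.
GENE_COUNTS_TSV = """\
Coding genes\t35,806
Non coding genes\t3,180
Small non coding genes\t3,085
Long non coding genes\t95
Pseudogenes\t7
Gene transcripts\t45,973
"""


def parse_counts(tsv: str) -> Dict[str, int]:
    """Parse label/count rows, stripping thousands separators."""
    counts: Dict[str, int] = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue
        label, value = line.split("\t")
        counts[label] = int(value.replace(",", ""))
    return counts


counts = parse_counts(GENE_COUNTS_TSV)
# The non-coding total is the sum of its small and long subcategories.
assert counts["Non coding genes"] == counts["Small non coding genes"] + counts["Long non coding genes"]
# Assumption: the three top-level categories are disjoint.
total_genes = counts["Coding genes"] + counts["Non coding genes"] + counts["Pseudogenes"]
print(total_genes)  # 38993
```

Since every gene has at least one transcript, the transcript count (45,973) being larger than this gene total is consistent with some genes carrying alternative transcripts.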

Other

FGENESH gene prediction	46,238
TE-related Gene (MSU)	17,272
Short Variants	28,179,246
Structural variants	1,278
