Oryza glaberrima Assembly and Gene Annotation

Project funding: National Science Foundation Plant Genome Research Program (#082224) awarded to R. Wing, S. Rounsley and Y. Yu. This genome falls within the scope of the International Oryza Map Alignment Project (I-OMAP) consortium, an internationally coordinated effort to create high-quality reference assemblies representing the diversity of wild and crop-progenitor species in the genus Oryza (Jacquemin et al., 2012).

About Oryza glaberrima

Oryza glaberrima (African rice) is a cultivated grain distinct from its better-known cousin Oryza sativa (Asian rice). African rice was independently domesticated ~3000 years ago in the Niger River Delta from its still-extant progenitor, Oryza barthii. While lacking many of the agronomic and quality traits found in Asian rice, O. glaberrima is significant for its resistance to many pests and diseases and for its tolerance of drought and infertile soils. Interspecific crosses between African and Asian rice have produced cultivars with improved yield and quality traits that have been adopted by many African countries to meet the growing need for rice as a staple food. From a scientific perspective, comparing the genome of O. glaberrima with that of O. sativa reveals commonalities and differences that give insight into the genetic basis of domestication and other traits. Like Asian rice, African rice has a diploid A-type genome, with 12 chromosomes and an estimated size of ~358 Mbp.


The genome sequence was generated and assembled by the Arizona Genomics Institute (AGI) using strain IRGC:96717. The current assembly is "Oryza_glaberrima_AGI1.1". It incorporates the previously assembled chromosome 3 short arm (Chr3s) sequence and consists of 12 chromosome pseudomolecules and 1,939 unplaced scaffolds. Chr3s was sequenced and assembled using a heavily manually edited physical map: BAC clones were shotgun sequenced with Sanger technology to 8x coverage and finished to phase II, and the tile sequence was assembled manually. The rest of the genome was sequenced with a hybrid BAC pooling and whole-genome shotgun approach, at 30x coverage, using Roche GS FLX 454 Titanium sequencing technology. These sequences were assembled and combined with a subset of previously sequenced BAC clones to produce the whole-genome assembly. The underlying scaffolds have been deposited in GenBank under accession number ADWL01000000.
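For readers who want to check the composition of an assembly file themselves, the following is a minimal Python sketch that tallies chromosome pseudomolecules, unplaced scaffolds and total bases from FASTA-formatted text. The header naming is an assumption made for illustration (chromosomes named "1" through "12", everything else treated as an unplaced scaffold); the real file may use different names, and the toy records below are made up rather than taken from the assembly.

```python
from typing import Dict, Iterable, Tuple

# Assumption: chromosome pseudomolecules are named "1".."12" in the FASTA
# headers; every other record is treated as an unplaced scaffold.
CHROMOSOME_NAMES = {str(i) for i in range(1, 13)}


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record name: sequence length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(lengths: Dict[str, int]) -> Tuple[int, int, int]:
    """Return (chromosome count, unplaced scaffold count, total bases)."""
    n_chrom = sum(1 for n in lengths if n in CHROMOSOME_NAMES)
    n_unplaced = len(lengths) - n_chrom
    return n_chrom, n_unplaced, sum(lengths.values())


# Tiny made-up example, not real sequence data.
toy_fasta = [">1 toy chromosome", "ACGTAC", "GT", ">scaffold_x", "NNNNA"]
print(summarize(sequence_lengths(toy_fasta)))
```

To run this on a real file, pass an open file handle to `sequence_lengths` instead of the toy list, and adjust `CHROMOSOME_NAMES` to match the headers actually present.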


Protein-coding genes were annotated by the Munich Information Center for Protein Sequences (MIPS) led by Klaus Meyer using an evidence-based approach. Annotation of repeats and transposable elements, and prediction of ncRNA and tRNA genes, were conducted at AGI.


Variation data come from two (unpublished) sources:

  1. 20 diverse accessions of Oryza glaberrima and
  2. 19 accessions of its wild progenitor, Oryza barthii, collected from geographically distributed regions of Africa.

Briefly, WGS reads were generated using low-coverage Illumina sequencing. Filtered reads were aligned to the O. glaberrima reference using BWA, and SNPs were called using a combination of SAMtools and GATK with standard quality and coverage filters, giving a final set of ~8M SNPs.
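As an illustration of what quality and coverage filtering of SNP calls can look like, here is a minimal Python sketch that keeps only biallelic SNP records from VCF-formatted lines when their QUAL and INFO/DP values fall inside chosen bounds. The thresholds are hypothetical; the actual filters used for this data set are not stated above, and this sketch is not the SAMtools or GATK procedure itself.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical thresholds for illustration; the real values are not given.
MIN_QUAL = 30.0
MIN_DEPTH = 5
MAX_DEPTH = 100
BASES = {"A", "C", "G", "T"}


def info_depth(info: str) -> Optional[int]:
    """Extract the integer DP value from a VCF INFO column, or None."""
    for field in info.split(";"):
        if field.startswith("DP="):
            try:
                return int(field[3:])
            except ValueError:
                return None
    return None


def passing_snps(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF data lines that are biallelic SNPs within the thresholds."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        ref, alt, qual, info = cols[3].upper(), cols[4].upper(), cols[5], cols[7]
        if ref not in BASES or alt not in BASES:
            continue  # drop indels, multi-allelic and symbolic alleles
        try:
            if float(qual) < MIN_QUAL:
                continue
        except ValueError:
            continue  # missing QUAL (".")
        depth = info_depth(info)
        if depth is None or not MIN_DEPTH <= depth <= MAX_DEPTH:
            continue
        yield line


# Made-up records: one passing SNP, one low-quality SNP, one indel.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\t.\tDP=20",
    "1\t200\t.\tC\tT\t10\t.\tDP=20",
    "1\t300\t.\tAT\tA\t60\t.\tDP=20",
]
kept = list(passing_snps(toy_vcf))
print(len(kept))
```

An upper depth bound is included because unusually high coverage often marks collapsed repeats, but whether such a bound was applied here is an assumption.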

These unpublished data were kindly contributed by Rod Wing of the Arizona Genomics Institute and collaborator Carlos Machado of the University of Maryland, as part of the Oryza Genome Evolution project funded by NSF Award #1026200.

Gramene/Ensembl Genomes Annotation

Additional annotations generated by the Gramene project include:

  • Gene phylogenetic trees with other Gramene species.
  • LASTZ whole-genome alignments to Arabidopsis thaliana, Oryza sativa Japonica (IRGSP v1), and several AA-genome Oryza species.
  • Ortholog-based DAGchainer synteny detection against other AA genomes.
  • Mapping of multiple sequence-based feature sets to the genome using the Gramene BLAT pipeline.
  • Identification of various repeat features from MIPS and AGI repeat libraries.
  • Variation effect prediction using Sequence Ontology terms.




More information

General information about this species can be found in Wikipedia.



Assembly: AGI1.1, INSDC Assembly GCA_000147395.1, May 2011
Database version: 90.2
Base Pairs: 316,419,574
Golden Path Length: 316,419,574
Genebuild by: AGI
Genebuild method: Imported from MIPS
Data source: AGI

Gene counts

Coding genes: 33,164
Non coding genes: 41,415
Small non coding genes: 36,627
Long non coding genes: 219
Misc non coding genes: 4,569
Gene transcripts: 74,579
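
The gene counts can be cross-checked with a little arithmetic. The short Python sketch below uses only the figures listed here; the variable names are chosen for illustration. It confirms that the three non-coding categories add up to the non-coding total, and that the transcript count equals the sum of coding and non-coding genes, which is consistent with (though it does not by itself prove) one transcript per gene in this release.

```python
# Figures copied from the gene counts listed in this section.
coding_genes = 33_164
non_coding_genes = 41_415
non_coding_breakdown = {"small": 36_627, "long": 219, "misc": 4_569}
gene_transcripts = 74_579

# The three non-coding categories should sum to the non-coding total.
breakdown_matches = sum(non_coding_breakdown.values()) == non_coding_genes

# Transcripts versus total genes (coding plus non-coding).
total_genes = coding_genes + non_coding_genes
transcripts_match_genes = gene_transcripts == total_genes

print(breakdown_matches, transcripts_match_genes, total_genes)
```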


FGENESH gene prediction: 27,943
Short Variants: 7,704,409
