Glycine max Assembly and Gene Annotation

About Glycine max

Glycine max (soybean) is a crop legume that is globally one of the most important sources of animal feed protein and cooking oil. Having originated in East Asia, soy is now cultivated worldwide, with the greatest production in the U.S. Though only a minor proportion of the crop is eaten directly by humans, soybean is a valuable source of protein, containing all essential amino acids, and is frequently used as a dietary substitute for meat. Like other legumes, soybean is able to fix atmospheric nitrogen by engaging in a symbiotic relationship with microbial organisms. The complete sequence of the soybean genome not only impacts research and breeding of this crop, but also serves as a reference for genomics research in other legumes. Because soybean represents the order Fabales within the eudicot taxonomy, the sequence will also advance research in comparative phylogenomics. As a paleopolyploid, the soybean genome shows evidence of two ancient whole-genome duplications, one early in the legume lineage and a second, more recent event specific to the soybean lineage. The soybean genome has 20 chromosomes and an estimated size of 1,115 Mb.

Assembly

Glycine max var. Williams 82 was sequenced, assembled, and annotated by the U.S. DOE Joint Genome Institute (JGI-PGF) in collaboration with a consortium of research labs, and the results were published [1]. The current assembly is version v2.1 and comprises a total of 978 Mb (GCA_000004515.4).

Annotation

The Wm82.a2.v1 gene set integrates ~1.6 million ESTs, some 454 ESTs and 1.5 billion paired-end Illumina RNA-seq reads with homology-based gene predictions. A total of 56,044 protein-coding loci were predicted (see JGI Phytozome for additional details).

Variation

Variation data from the European Variation Archive was added.

The data is from a project conducted to investigate selection signatures in soybean subgroups. 112 Glycine soja and 133 Glycine max accessions were re-sequenced, and variants for the 245 accessions were called using GATK. To avoid false-positive variants as much as possible, the called variants were then filtered by variant quality and by allele frequency. Finally, 9,650,073 bi-allelic SNP variants with minor allele frequency > 1% were identified for the 245 soybean accessions and used in this project [3].
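The allele-frequency step described above can be illustrated with a minimal sketch. This is not the project's GATK pipeline; it only shows, under stated assumptions, what "bi-allelic with minor allele frequency > 1%" means for a single site. The function names, the genotype string format (VCF-style calls such as "0/1", with "." for a missing allele), and the `min_maf` parameter are all hypothetical choices made for this example.

```python
def alt_allele_frequency(genotypes):
    """Return the ALT allele frequency among non-missing alleles, or None.

    genotypes: list of VCF-style diploid calls, e.g. "0/1", "1|1", "./."
    (this input format is an assumption made for the example).
    """
    alleles = [
        allele
        for call in genotypes
        for allele in call.replace("|", "/").split("/")
        if allele != "."  # skip missing alleles
    ]
    if not alleles:
        return None
    return sum(allele == "1" for allele in alleles) / len(alleles)


def keep_site(alts, genotypes, min_maf=0.01):
    """Keep a site only if it is bi-allelic and its MAF exceeds min_maf.

    alts: list of ALT alleles reported at the site; exactly one ALT allele
    means the site is bi-allelic (REF plus one ALT).
    """
    if len(alts) != 1:
        return False
    p = alt_allele_frequency(genotypes)
    if p is None:
        return False
    # The minor allele is whichever of REF/ALT is rarer.
    return min(p, 1.0 - p) > min_maf


# Example with 245 accessions (490 alleles), matching the panel size above.
five_hets = ["0/1"] * 5 + ["0/0"] * 240   # MAF = 5/490
four_hets = ["0/1"] * 4 + ["0/0"] * 241   # MAF = 4/490
print(keep_site(["T"], five_hets))        # passes the 1% threshold
print(keep_site(["T"], four_hets))        # falls below the 1% threshold
```

With 245 diploid accessions there are 490 alleles per fully genotyped site, so under this strict threshold a variant needs at least five copies of the minor allele to be retained.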

References

[1] Genome sequence of the palaeopolyploid soybean.
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J et al. 2010. Nature. 463:178-183.
[2] Image credit: USDA.
[3] Dissection of soybean populations according to selection signatures based on whole-genome sequences.
Jae-Yoon Kim, Seongmun Jeong, Kyoung Hyoun Kim, Won-Jun Lim, Ho-Yeon Lee, Namhee Jeong, Jung-Kyung Moon, Namshin Kim. 2019. GigaScience. 8(12).

Picture credit: Wikimedia Commons, the free media repository

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	Glycine_max_v2.1, INSDC Assembly GCA_000004515.4, Jul 2018
Database version	114.4
Golden Path Length	978,491,270
Genebuild by	JGI
Genebuild method	Import
Data source	Joint Genome Institute

Gene counts

Coding genes	55,897
Non coding genes	1,250
Small non coding genes	1,241
Long non coding genes	9
Gene transcripts	89,662
