Musa acuminata Assembly and Gene Annotation

About Musa acuminata

Musa acuminata (banana, 2n=22) is native to tropical South and Southeast Asia and is cultivated throughout the tropics. Grown primarily for its starch-rich fruit, banana is the most popular fruit in industrialised countries. Cultivars derive mainly from Musa acuminata (A genome) and Musa balbisiana (B genome); they are sometimes diploid but generally triploid. Banana was the first non-grass monocotyledon to be sequenced, making it an important genome for comparative analysis in plants. The reference sequence comes from a doubled-haploid plant of the cultivar Pahang.

Assembly

The original draft genome assembly of the Musa acuminata ssp. malaccensis doubled-haploid (DH-Pahang) [2], generated by the Global Musa Genomics Consortium (led jointly by the Alliance Bioversity-CIRAD and Genoscope), was improved in successive steps [3].

The original 24,425 contigs were re-assembled into scaffolds using paired-end (PE) data and new 5 kb mate-pair Illumina sequences (40x coverage). This produced 2,267 scaffolds with a cumulative size of 439 Mb, representing 84% of the estimated size (523 Mb) of the DH-Pahang genome.
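The completeness figures in this section are simple ratios of assembled length to the estimated genome size. A minimal Python sketch that reproduces them from the numbers given in the text (the function name is illustrative):

```python
def percent_of_estimate(assembled_mb: float, estimated_mb: float) -> float:
    """Assembled length as a percentage of the estimated genome size."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * assembled_mb / estimated_mb


ESTIMATED_GENOME_MB = 523.0  # estimated DH-Pahang genome size quoted in the text

# 2,267 re-assembled scaffolds totalling 439 Mb
print(f"{percent_of_estimate(439.0, ESTIMATED_GENOME_MB):.1f}%")  # 83.9%
# Final assembly of 450.7 Mb
print(f"{percent_of_estimate(450.7, ESTIMATED_GENOME_MB):.1f}%")  # 86.2%
```

Rounded to whole percentages, these give the 84% and 86% reported for the intermediate and final assemblies.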

Thirty-six misassembled regions identified in 33 scaffolds were split, bringing the total to 2,303 scaffolds. Based on discordant paired reads, 438 scaffold fusions and 293 scaffold junctions were performed. In addition, 9,838 gap regions were filled using 330 bp PE libraries (50x coverage).
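Splitting a scaffold at a misassembly breakpoint turns one sequence into two, so every split adds exactly one scaffold: 2,267 + 36 = 2,303. The sketch below illustrates only that bookkeeping. It assumes 0-based breakpoint coordinates and a hypothetical `name_1`, `name_2`, ... naming scheme for the pieces, and it does not attempt to reproduce how breakpoints were detected from discordant read pairs.

```python
from typing import List, Tuple


def split_scaffold(name: str, seq: str, breakpoints: List[int]) -> List[Tuple[str, str]]:
    """Split `seq` at 0-based breakpoints; each breakpoint adds one piece."""
    cuts = sorted(set(breakpoints))
    if any(not 0 < c < len(seq) for c in cuts):
        raise ValueError("breakpoints must fall strictly inside the sequence")
    bounds = [0] + cuts + [len(seq)]
    # Hypothetical naming scheme: original name plus a 1-based piece index
    return [
        (f"{name}_{i}", seq[start:end])
        for i, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1)
    ]


pieces = split_scaffold("scaffold_1", "AAAACCCCGGGG", [4, 8])
print(pieces)  # [('scaffold_1_1', 'AAAA'), ('scaffold_1_2', 'CCCC'), ('scaffold_1_3', 'GGGG')]

# Scaffold count after splitting: every breakpoint adds exactly one scaffold
print(2_267 + 36)  # 2303
```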

The final assembly consisted of 1,532 scaffolds with a cumulative size of 450.7 Mb, corresponding to 86% of the estimated genome size. Ninety percent of the assembly was contained in 267 scaffolds, and the scaffold N50 was 3.0 Mb. Gaps within scaffolds account for 45.2 Mb (10.0% of the assembly).
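N50 is the length of the scaffold at which the longest scaffolds, taken in decreasing order, first reach half of the total assembly size; the statement that ninety percent of the assembly lies in 267 scaffolds corresponds to an L90 of 267. The gap figure is the share of assembled bases that are unknown. A minimal Python sketch of both statistics, assuming gaps are encoded as runs of `N` and using toy data rather than the real scaffolds:

```python
import re
from typing import Dict, List, Tuple


def nx_lx(lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for a fraction such as 0.5 (N50/L50) or 0.9 (N90/L90).

    Scaffolds are sorted longest first; Lx is how many are needed to reach
    `fraction` of the total length, and Nx is the length of the last one added.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no scaffold lengths given")


def gap_fraction(scaffolds: Dict[str, str]) -> float:
    """Fraction of assembled bases that fall in runs of N (assumed gap encoding)."""
    total = sum(len(seq) for seq in scaffolds.values())
    if total == 0:
        raise ValueError("no sequence given")
    gaps = sum(
        len(run.group())
        for seq in scaffolds.values()
        for run in re.finditer(r"[Nn]+", seq)
    )
    return gaps / total


toy_lengths = [100, 80, 50, 30, 20]
print(nx_lx(toy_lengths, 0.5))  # (80, 2)
print(nx_lx(toy_lengths, 0.9))  # (30, 4)
print(gap_fraction({"a": "ACGTNNNNACGT", "b": "NNAC"}))  # 0.375
```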

Genetic markers were then used to assemble scaffolds into pseudo-molecules. A total of 21,603 markers that mapped to a unique position were grouped into 11 linkage groups, or pseudo-molecules, with an average of 5.44 markers per 100 kb.
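One simple way to picture how uniquely mapped markers place scaffolds into linkage groups is a majority vote per scaffold followed by ordering on marker position. The sketch below is a simplified illustration, not the pipeline used for this assembly: it assumes each marker is a hypothetical `(scaffold_id, linkage_group, position_cM)` tuple, ignores scaffold orientation and conflicting placements, and adds a helper for the markers-per-100-kb density quoted above.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical marker record: (scaffold_id, linkage_group, position_cM)
Marker = Tuple[str, str, float]


def anchor_scaffolds(markers: List[Marker]) -> Dict[str, List[str]]:
    """Assign each scaffold to a linkage group and order scaffolds along it."""
    hits: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for scaffold, group, cm in markers:
        hits[scaffold].append((group, cm))

    placed: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for scaffold, scaffold_hits in hits.items():
        # Majority vote: the linkage group supported by most of the scaffold's markers
        group = Counter(g for g, _ in scaffold_hits).most_common(1)[0][0]
        # Position the scaffold at the median cM of its supporting markers
        position = median(cm for g, cm in scaffold_hits if g == group)
        placed[group].append((position, scaffold))

    return {group: [s for _, s in sorted(entries)] for group, entries in placed.items()}


def markers_per_100kb(n_markers: int, length_bp: int) -> float:
    """Marker density expressed per 100 kb of sequence."""
    return n_markers / (length_bp / 100_000)


toy_markers = [
    ("sA", "LG1", 10.0), ("sA", "LG1", 12.0),
    ("sB", "LG1", 3.0), ("sB", "LG1", 4.0), ("sB", "LG2", 40.0),
    ("sC", "LG2", 5.0),
]
print(anchor_scaffolds(toy_markers))  # {'LG1': ['sB', 'sA'], 'LG2': ['sC']}
print(markers_per_100kb(10, 500_000))  # 2.0
```

The single `sB` marker on LG2 is outvoted, which is the kind of conflict a real pipeline would inspect rather than silently discard.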

The final assembly contains the 11 chromosomes, one additional pseudo-molecule (ChrUnk) made of concatenated unplaced contigs, and the mitochondrial genome.
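Concatenating unplaced contigs into a single ChrUnk sequence is usually paired with an index recording where each contig lands, so that features can be traced back to their source. The text does not say how the contigs were separated; the sketch below assumes a fixed run of `N` characters between them, and both the 100 bp default spacer and the function name `build_chr_unk` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_chr_unk(
    unplaced: Dict[str, str], spacer_len: int = 100
) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join unplaced contigs with N spacers and record 1-based inclusive coordinates.

    spacer_len=100 is a hypothetical default; the actual spacer used for
    ChrUnk is not stated in the text.
    """
    spacer = "N" * spacer_len
    parts: List[str] = []
    index: List[Tuple[str, int, int]] = []
    offset = 0  # number of bases written so far
    for i, (name, seq) in enumerate(unplaced.items()):
        if i > 0:
            parts.append(spacer)
            offset += spacer_len
        index.append((name, offset + 1, offset + len(seq)))
        parts.append(seq)
        offset += len(seq)
    return "".join(parts), index


sequence, coords = build_chr_unk({"contig_a": "ACGT", "contig_b": "GG"}, spacer_len=3)
print(sequence)  # ACGTNNNGG
print(coords)    # [('contig_a', 1, 4), ('contig_b', 8, 9)]
```

Keeping the index alongside the sequence is what allows a gene annotated on ChrUnk to be mapped back to its contig; the AGP format is commonly used for this kind of bookkeeping.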

Annotation

Two independent annotations of the initial version of the banana genome assembly were available, and both were transferred to the new assembly. The first comprised the M. acuminata transcripts from the original annotation [2], together with several manually curated gene annotations; the second was the NCBI RefSeq annotation.

Based on the analysis of several manually curated genes, the NCBI RefSeq annotation proved generally of better quality than the first published annotation. In addition, the RefSeq annotation integrated RNA-seq data and predicted alternative transcripts. We therefore created a consensus annotation combining all the manually curated genes, the NCBI RefSeq annotation, and the genes predicted by the first annotation that the RefSeq pipeline had missed. The consensus annotation contains 35,276 predicted genes, 34,629 (98.2%) of which are located on chromosomes. Locus nomenclature was changed to avoid confusion; for example, GSMUA_Achr5t02570_001 in version 1 becomes Ma05_t02680.1 in the current annotation. A gene ID converter is available at https://banana-genome-hub.southgreen.fr
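The consensus rule above can be read as a priority merge across three sources. The sketch below is an illustration under stated assumptions, not necessarily the exact procedure used for this annotation: it assumes manually curated genes take precedence, that RefSeq models are added where they do not overlap a curated gene, and that first-annotation genes are added only where they overlap nothing already kept. The `Gene` record and the overlap test (any shared base on the same chromosome, strand ignored) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    source: str


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two models share at least one base on the same chromosome (strand ignored)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def consensus(curated: List[Gene], refseq: List[Gene], first: List[Gene]) -> List[Gene]:
    """Priority merge: curated > RefSeq > first annotation (assumed precedence)."""
    kept: List[Gene] = list(curated)
    for tier in (refseq, first):
        # Test each candidate only against genes kept from higher-priority tiers
        additions = [g for g in tier if not any(overlaps(g, k) for k in kept)]
        kept.extend(additions)
    return sorted(kept, key=lambda g: (g.chrom, g.start, g.end))


merged = consensus(
    curated=[Gene("c1", "chr1", 100, 200, "curated")],
    refseq=[Gene("r1", "chr1", 150, 300, "refseq"), Gene("r2", "chr1", 400, 500, "refseq")],
    first=[Gene("v1", "chr1", 450, 460, "v1"), Gene("v2", "chr2", 10, 20, "v1")],
)
print([g.gene_id for g in merged])  # ['c1', 'r2', 'v2']
```

Because candidates are tested only against higher-priority tiers, overlapping models within one source are not removed by each other. The pairwise check is quadratic; an interval tree would be the usual choice at genome scale.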

References

1. Image credit: Telrnya (Own work) CC-BY-SA-3.0
2. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants.
D'Hont A, Denoeud F, Aury JM, Baurens FC, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M, Da Silva C, Jabbari K, Cardi C, Poulain J, Souquet M, Labadie K, Jourda C, Lengellé J, Rodier-Goud M, Alberti A, Bernard M, Correa M, Ayyampalayam S, Mckain MR, Leebens-Mack J, Burgess D, Freeling M, Mbéguié-A-Mbéguié D, Chabannes M, Wicker T, Panaud O, Barbosa J, Hribova E, Heslop-Harrison P, Habas R, Rivallan R, Francois P, Poiron C, Kilian A, Burthia D, Jenny C, Bakry F, Brown S, Guignon V, Kema G, Dita M, Waalwijk C, Joseph S, Dievart A, Jaillon O, Leclercq J, Argout X, Lyons E, Almeida A, Jeridi M, Dolezel J, Roux N, Risterucci AM, Weissenbach J, Ruiz M, Glaszmann JC, Quétier F, Yahiaoui N, Wincker P.
3. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.
Martin G, Baurens FC, Droc G, Rouard M, Cenci A, Kilian A, Hastie A, Doležel J, Aury JM, Alberti A, Carreel F, D'Hont A.

Statistics

Summary

Assembly	Musa_acuminata_v2, INSDC Assembly GCA_904845865.1, Oct 2020
Database version	114.2
Golden Path Length	450,848,473
Genebuild by	Banana Genome Hub v2
Genebuild method	External annotation import
Data source	Genoscope/IG/CEA

Gene counts

Coding genes	35,275
Gene transcripts	45,855