Vicia faba (Hedin2_genome_v1)

Vicia faba Assembly and Gene Annotation

About Vicia faba

Faba bean (Vicia faba L., 2n = 12) was domesticated in the Near East more than 10,000 years BP, and its broad adaptability, value as a restorative crop in rotations and high nutritional density have propelled it to the status of a global crop grown on all continents except Antarctica. Faba bean continues to be relevant in the twenty-first century as humanity strives to lower agricultural greenhouse gas emissions by replacing meat or milk protein with plant-based alternatives. It is the highest yielding of all grain legumes and has a favourable protein content (approximately 29%) compared with other cool-season pulses such as pea, lentil and chickpea, making it a suitable candidate to meet challenging projected future protein demands. Furthermore, the high biological nitrogen fixation rates of faba bean and the long duration of its nectar-rich, pollinator-friendly flowers provide important ecosystem services, so cultivation of faba bean is increasingly seen as key for sustainable intensification strategies.

Assembly

The assembly was created by the Faba Bean Genome Consortium (FBGC). At 13 Gb, the faba bean genome (2n = 2x = 12) is one of the largest among diploid field crops. The genome was sequenced with PacBio HiFi long reads to 20-fold coverage, yielding an assembly of 11.9 Gb, more than half of which was represented by contigs longer than 2.7 Mb. The inbred line ‘Hedin/2’ was chosen as the reference genotype owing to its high autofertility and productivity, combined with an early maturing spring habit and an exceptional degree of homozygosity. Linkage information afforded by a genetic map and chromosome conformation capture sequencing (Hi-C) data placed 11.2 Gb (94%) into chromosomal pseudomolecules. The biggest of its six chromosomes holds the equivalent of an entire human genome, and its dominant repeat family members are longer (up to 25 kb) than those in similarly sized polyploid cereal genomes.
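The statement that more than half of the assembly sits in contigs longer than 2.7 Mb is the usual definition of a contig N50. As a minimal sketch of how such a value is computed, and of the anchored fraction quoted above, the following Python snippet may help. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the Hedin/2 assembly; only the 11.2 Gb and 11.9 Gb figures come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled sequence."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy contig lengths in bp (NOT real Hedin/2 contigs).
toy_contigs = [5_000_000, 3_000_000, 2_700_000, 1_000_000, 300_000]
toy_n50 = n50(toy_contigs)

# Fraction of the assembly placed into pseudomolecules, using the
# rounded figures quoted in the text (11.2 Gb of 11.9 Gb).
anchored_fraction = 11.2 / 11.9

print(f"toy N50: {toy_n50} bp")
print(f"anchored: {anchored_fraction:.0%}")
```

With the rounded values from the text, the anchored fraction works out to about 94%, matching the figure given above.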

Annotation

The genome sequence of Hedin/2 was annotated by the FBGC using RNA sequencing data from nine diverse tissues, resulting in a total of 34,221 protein-coding genes. The predicted Hedin/2 gene models captured 96% of single-copy orthologues conserved in Embryophyta according to the BUSCO metric. Gene density was uniform along the chromosomes (except at the positions of satellite DNA arrays), without the proximal–distal gradient typically observed for grass chromosomes. Meiotic recombination displayed a similar distribution, with an average of 27 genes per centimorgan. Thus, despite its large genome, faba bean may be more amenable to genetic mapping than cereals, in which up to one-third of genes are locked in non-recombining pericentric regions. Gene order was highly collinear and syntenic with other legumes. To further validate the gene annotation, 262 Medicago truncatula genes related to symbiosis with rhizobia or arbuscular mycorrhizal fungi were aligned, and putative orthologues were found for all of them. In addition, RNA sequencing confirmed that a large subset of these genes was responsive to inoculation, as expected.

In contrast to gymnosperms with similarly gigantic genomes, introns in faba bean genes were not larger than those in angiosperms with smaller genomes, but the intergenic space was more expanded. Moreover, the absence of a lineage-specific whole-genome duplication (WGD) or widespread gene family expansion means that the proliferation of repeat elements largely explains why the faba bean genome is more than seven times larger than that of its close relative common vetch (Vicia sativa).

Approximately 79% of the Hedin/2 assembly was annotated as transposon-derived. By far the largest group is the LTR retrotransposons (RLX), accounting for 63.7% of the genome sequence. Other groups of TEs represent only minor fractions of the genome. Among the RLX, those of the Gypsy (RLG) superfamily outnumber Copia (RLC) elements by more than 2:1. The Ogre family of Gypsy elements alone makes up almost half (44%) of the genome, confirming its status as a major determinant of genome size in the Fabaceae. The great length of individual elements (up to 35 kb for Ogre and 32 kb for SIRE, the longest and second-longest elements, respectively), together with their abundance, partially explains the large size of the faba bean genome. In addition, a large and diverse set of satellite repeat families that differ in their monomer sequences and genome abundance accounted for 9.4% of the total assembly length, with the most abundant satellite family, FokI, representing 4% (0.475 Gb). FokI, together with several other highly amplified satellites, forms prominent heterochromatic bands on faba bean chromosomes. The TE density was remarkably invariable along all six chromosomes, mirroring gene density and recombination rate, and inverse to the density of satellite arrays.

Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeat length: 10,864,459,657 bp. Repeat content: 91.2%.
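The repeat content percentage can be reproduced from the repeat length and the Golden Path Length listed in the Statistics section. The short Python sketch below assumes that the percentage is simply the ratio of those two numbers; the variable names are illustrative.

```python
# Figures quoted on this page.
repeat_bp = 10_864_459_657        # repeat length reported by the Repeat Detector
golden_path_bp = 11_914_498_746   # Golden Path Length from the Statistics section

# Assumption: repeat content = repeat length / Golden Path Length.
repeat_fraction = repeat_bp / golden_path_bp

print(f"repeat content: {repeat_fraction:.1%}")
```

Under that assumption the ratio rounds to 91.2%, consistent with the reported repeat content.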

References

    • The giant diploid faba genome unlocks variation in a global protein crop.
      Jayakodi M, Golicz AA, Kreplak J, Fechete LI, Angra D, Bednář P, Bornhofen E, Zhang H, Boussageon R, Kaur S, Cheung K, Čížková J, Gundlach H, Hallab A, Imbert B, Keeble-Gagnère G, Koblížková A, Kobrlová L, Krejčí P, Mouritzen TW, Neumann P, Nadzieja M, Nielsen LK, Novák P, Orabi J, Padmarasu S, Robertson-Shersby-Harvie T, Robledillo LÁ, Schiemann A, Tanskanen J, Törönen P, Warsame AO, Wittenberg AHJ, Himmelbach A, Aubert G, Courty PE, Doležel J, Holm LU, Janss LL, Khazaei H, Macas J, Mascher M, Smýkal P, Snowdon RJ, Stein N, Stoddard FL, Stougaard J, Tayeh N, Torres AM, Usadel B, Schubert I, O'Sullivan DM, Schulman AH, Andersen SU. Nature 615 (7953)

Picture credit: Broad bean pile by Bodhi Peace, CC BY-SA 4.0, via Wikimedia Commons.

Statistics

Summary

Assembly: Hedin2_genome_v1, INSDC Assembly GCA_948472305.1
Database version: 113.1
Golden Path Length: 11,914,498,746
Genebuild by: FBGC
Genebuild method: Import
Data source: FBGC

Gene counts

Coding genes: 34,221
Gene transcripts: 37,065
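Two simple derived figures can be read off these statistics: the average number of transcripts per coding gene and the average gene density over the Golden Path Length. The Python sketch below is only illustrative arithmetic on the numbers listed above; the variable names are assumptions, and the density is a genome-wide average that says nothing about local variation.

```python
# Figures from the Summary and Gene counts tables on this page.
coding_genes = 34_221
gene_transcripts = 37_065
golden_path_bp = 11_914_498_746

# Average number of annotated transcripts per coding gene.
transcripts_per_gene = gene_transcripts / coding_genes

# Genome-wide average number of coding genes per megabase.
genes_per_mb = coding_genes / (golden_path_bp / 1_000_000)

print(f"transcripts per gene: {transcripts_per_gene:.2f}")
print(f"genes per Mb: {genes_per_mb:.2f}")
```

This gives roughly 1.08 transcripts per gene and about 2.87 coding genes per megabase.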