Pisum sativum Assembly and Gene Annotation
About Pisum sativum
Pea (Pisum sativum L., 2n = 14) is the second most important grain legume in the world after common bean, and an important green vegetable: 14.3 Mt of dry pea and 19.9 Mt of green pea were produced in 2016. Pea belongs to the Leguminosae (or Fabaceae), which includes cool-season grain legumes from the Galegoid clade, such as pea, lentil (Lens culinaris Medik.), chickpea (Cicer arietinum L.) and faba bean (Vicia faba L.), and tropical grain legumes from the Millettioid clade, such as common bean (Phaseolus vulgaris L.), cowpea (Vigna unguiculata (L.) Walp.) and mungbean (Vigna radiata (L.) R. Wilczek). Pea provides significant ecosystem services: it is a valuable source of dietary proteins, mineral nutrients, complex starch and fibers with demonstrated health benefits, and its symbiosis with nitrogen-fixing soil bacteria reduces the need for applied nitrogen fertilizers, thereby mitigating greenhouse gas emissions. Pea was domesticated ~10,000 years ago by Neolithic farmers of the Fertile Crescent, along with cereals and other grain legumes [8]. The large reservoir of genetic diversity in Pisum has facilitated its spread throughout Asia, Europe, Africa, the Americas and Oceania, where it has adapted to diverse environments and culinary practices.
Complementary approaches were combined to obtain the pea reference genome assembly. Whole-genome Illumina short reads were assembled into contigs using SOAPdenovo, and the contigs were then combined into scaffolds using long-range PacBio RS II reads and whole-genome profiling of a bacterial artificial chromosome (BAC) library. Scaffolds were manually curated for inter- and intrachromosomal chimeras using two independent resources: sequences obtained from single chromosomes isolated by flow cytometry, and an ultra-high-density genetic map built by skim genotyping-by-sequencing. Curated scaffolds were then integrated into 24,623 super-scaffolds (L50 of 415 kilobases (kb)) using BioNano optical maps. The seven pseudomolecules representing the pea chromosomes were obtained by anchoring super-scaffolds onto high-density genetic maps. Pseudomolecules were named according to the reference pea genetic map [25] and chromosome numbering.
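The 415 kb contiguity figure for the super-scaffolds is a length-weighted statistic: sort sequences from longest to shortest and find the length at which the running total first reaches half of the assembly. The Python function below is a minimal sketch of that calculation, not the authors' pipeline. Naming conventions differ between tools: many (for example QUAST) call this length the N50 and reserve L50 for the number of sequences needed to reach half the assembly, so the function returns both. The input lengths in the usage note are a made-up example.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50 length, L50 count) for a collection of sequence lengths.

    N50: length of the shortest sequence in the smallest set of longest
         sequences that together cover at least half of the total length.
    L50: number of sequences in that set.
    """
    # Longest first; zero-length entries contribute nothing and are dropped.
    sorted_lengths = sorted((n for n in lengths if n > 0), reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequences with positive length")

    half = sum(sorted_lengths) / 2
    running = 0
    for count, length in enumerate(sorted_lengths, start=1):
        running += length
        if running >= half:
            return length, count

    # The loop always returns, because the full sum is >= half.
    raise AssertionError("unreachable")
```

For example, `n50_l50([100, 80, 60, 40, 20])` returns `(80, 2)`: the two longest sequences cover 180 of 300 units, which is the smallest set reaching half of the total.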
Ab initio and homology-based methods were combined to annotate protein-coding sequences. In total, 44,756 complete and 29 truncated genes were predicted, with an average gene length of 2,784 base pairs (bp), an average coding sequence length of 1,016 bp and an average of 6.33 exons per gene. The vast majority of gene models were supported by complementary DNA (cDNA) or expressed sequence tag (EST) evidence.
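Per-gene averages like these are usually derived from the annotation file. The sketch below shows one way to compute mean gene length, coding sequence length and exon number from GFF3 lines using only the Python standard library. It is not the authors' annotation pipeline, and it rests on simplifying assumptions: a gene, mRNA, exon/CDS hierarchy, exactly one mRNA per gene, and a single Parent value per feature. Real annotations with alternative transcripts would need a rule for choosing a representative transcript. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=m1;Parent=g1' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gene_model_stats(gff_lines: Iterable[str]) -> Dict[str, float]:
    """Mean gene length, CDS length and exon count over all mRNAs.

    Assumes one mRNA per gene and one Parent per exon/CDS feature.
    """
    gene_len: Dict[str, int] = {}
    mrna_gene: Dict[str, str] = {}  # mRNA ID -> parent gene ID
    cds_len: Dict[str, int] = defaultdict(int)
    exon_count: Dict[str, int] = defaultdict(int)

    for line in gff_lines:
        # Skip blank lines, directives (##...) and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        length = end - start + 1  # GFF3 coordinates are 1-based and inclusive

        if ftype == "gene":
            gene_len[attrs["ID"]] = length
        elif ftype == "mRNA":
            mrna_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "CDS":
            cds_len[attrs["Parent"]] += length  # sum CDS segments per mRNA
        elif ftype == "exon":
            exon_count[attrs["Parent"]] += 1

    if not mrna_gene:
        raise ValueError("no mRNA features found")
    mrnas = list(mrna_gene)
    return {
        "mean_gene_length_bp": mean(gene_len[mrna_gene[m]] for m in mrnas),
        "mean_cds_length_bp": mean(cds_len[m] for m in mrnas),
        "mean_exon_number": mean(exon_count[m] for m in mrnas),
    }
```

For example, two genes of 1,000 bp and 500 bp, carrying 500 bp and 300 bp of coding sequence across three exons and one exon respectively, give means of 750 bp, 400 bp and 2 exons.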
- A reference genome for pea provides insight into legume genome evolution.
Jonathan Kreplak, Mohammed-Amin Madoui, Petr Cápal....Jaroslav Doležel, Patrick Wincker & Judith Burstin
- Nature Genetics 51 (2019)
Picture credit: Wikipedia