Multiple genome alignments

Multiple alignments are calculated between groups of genomes.

Alignments available

| Name | Genomes | Method used |
| --- | --- | --- |
| 8 rice | Oryza barthii, Oryza glaberrima, Oryza glumipatula, Oryza meridionalis, Oryza nivara, Oryza rufipogon, Oryza sativa Indica Group, Oryza sativa Japonica Group | EPO |
| 11 rice | Oryza barthii, Oryza brachyantha, Oryza glaberrima, Oryza glumipatula, Oryza longistaminata, Oryza meridionalis, Oryza nivara, Oryza punctata, Oryza rufipogon, Oryza sativa Indica Group, Oryza sativa Japonica Group | EPO-Extended |
| 26 rice cultivars | Leersia perrieri, Oryza barthii, Oryza brachyantha, Oryza glaberrima, Oryza glumipatula, Oryza longistaminata, Oryza meridionalis, Oryza nivara, Oryza punctata, Oryza rufipogon, Oryza sativa (Geng/Japonica-sbtrp var. Chao Meo), Oryza sativa (Geng/Japonica-trop1 var. Azucena), Oryza sativa (Geng/Japonica-trop2 var. Ketan Nangka), Oryza sativa (Xian/Indica-1A var. Zhenshan 97), Oryza sativa (Xian/Indica-1B1 var. IR64), Oryza sativa (Xian/Indica-1B2 var. PR106), Oryza sativa (Xian/Indica-2A var. Gobol Sail), Oryza sativa (Xian/Indica-2B var. Larha Mugad), Oryza sativa (Xian/Indica-3A var. Lima), Oryza sativa (Xian/Indica-3B1 var. Khao Yai Guang), Oryza sativa (Xian/Indica-3B2 var. Liu Xu), Oryza sativa (Xian/Indica-adm var. Minghui 63), Oryza sativa (circum-Aus1 var. N22), Oryza sativa (circum-Aus2 var. Natel Boro), Oryza sativa (circum-Basmati var. ARC 10497), Oryza sativa Japonica Group | Cactus |
| 16 wheat | Aegilops tauschii, Brachypodium distachyon, Hordeum vulgare, Secale cereale, Triticum aestivum, Triticum aestivum Arinalrfor, Triticum aestivum Jagger, Triticum aestivum Julius, Triticum aestivum Lancer, Triticum aestivum Landmark, Triticum aestivum Mace, Triticum aestivum Norin61, Triticum aestivum Stanley, Triticum aestivum Sy Mattis, Triticum dicoccoides, Triticum urartu | Cactus |

Alignment methods

PECAN Multiple Alignment

Pecan is used to provide global multiple genomic alignments. First, Mercator is used to build a synteny map between the genomes and then Pecan builds alignments in these syntenic regions.

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for significant numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs are highly user-configurable. It is written entirely in Java and also requires Exonerate to be installed.
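Because Pecan's input pairs a set of sequences with a phylogenetic tree, it can help to confirm that the two agree before starting a run. The following is a minimal sketch of such a check, assuming the tree is given as a Newick string and the sequences as a name-to-sequence mapping. The function names `newick_leaves` and `check_pecan_inputs` are hypothetical and not part of Pecan.

```python
import re


def newick_leaves(newick: str) -> list[str]:
    """Return the leaf names of a Newick tree in left-to-right order.

    Assumption: leaf labels contain no whitespace or Newick punctuation.
    A leaf label directly follows "(" or ","; internal labels follow ")".
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def check_pecan_inputs(sequences: dict[str, str], newick: str) -> None:
    """Raise ValueError if tree leaves and sequence names do not match."""
    leaves = set(newick_leaves(newick))
    names = set(sequences)
    if leaves != names:
        missing = sorted(leaves - names)   # in the tree, no sequence given
        extra = sorted(names - leaves)     # sequence given, not in the tree
        raise ValueError(f"missing sequences: {missing}; not in tree: {extra}")


# Illustrative inputs only; the topology and branch lengths are made up.
tree = "((Oryza_barthii:0.1,Oryza_glaberrima:0.1):0.05,Oryza_nivara:0.2);"
seqs = {
    "Oryza_barthii": "ACGT",
    "Oryza_glaberrima": "ACGA",
    "Oryza_nivara": "ACTT",
}
check_pecan_inputs(seqs, tree)  # passes silently when inputs agree
```

The check only compares names; it does not validate the tree topology or the sequence content.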

EPO Multiple Alignment

The EPO (Enredo, Pecan, Ortheus) pipeline is a three-step pipeline for whole-genome multiple alignments.

  1. Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications.
  2. Pecan, as described above, is used to align these segments.
  3. Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions.
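
To make the idea of a colinear segment in step 1 concrete, here is a toy sketch for two genomes only. It assumes each genome is described by the order in which shared anchors occur along it, and it groups anchors into maximal runs that are adjacent in both genomes, in either forward or reverse order. This is not Enredo's algorithm, which works on a graph across many genomes and also handles duplications. The function name `colinear_segments` and the anchor labels are hypothetical.

```python
def colinear_segments(order_a: list[str], order_b: list[str]) -> list[list[str]]:
    """Split the anchors of genome A into runs that are colinear in genome B.

    Assumption: every anchor occurs at most once per genome (no duplications).
    """
    pos_b = {anchor: i for i, anchor in enumerate(order_b)}
    segments: list[list[str]] = []
    current: list[str] = []
    step = 0  # direction of the current run in B: +1, -1, or 0 if undecided

    for anchor in order_a:
        if anchor not in pos_b:
            # Anchor is absent from B (a deletion): close the current run.
            if current:
                segments.append(current)
            current, step = [], 0
            continue
        if current:
            delta = pos_b[anchor] - pos_b[current[-1]]
            if delta in (1, -1) and step in (0, delta):
                # Still adjacent in B and moving in the same direction.
                current.append(anchor)
                step = delta
                continue
            # Adjacency broken (a rearrangement): close the current run.
            segments.append(current)
        current, step = [anchor], 0

    if current:
        segments.append(current)
    return segments


# Genome B carries an inversion of a3..a5 relative to genome A.
genome_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
genome_b = ["a1", "a2", "a5", "a4", "a3", "a6"]
print(colinear_segments(genome_a, genome_b))
```

In the example, the inverted block stays together as one segment because its anchors remain adjacent in genome B, only in reverse order.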

The pipeline requires alignments of so-called anchor sequences, which are explained here. Further details on all these methods can be found in "Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs".

EPO-Extended Multiple Alignment

Due to difficulties with running Ortheus on fragmented assemblies, the pipeline comes in two flavours.

  1. The plain EPO pipeline is run on the chromosome-level genomes; the result is listed as EPO in the table above.
  2. The scaffold-level genomes are then projected onto the EPO alignments using LastZ-net alignments; the result is listed as EPO-Extended.

By construction, each pair of EPO and EPO-Extended alignments represents exactly the same alignment of the chromosome-level genomes.
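
This property can be checked on a single alignment block: restricting the EPO-Extended block to the chromosome-level genomes should give back the EPO block. The sketch below assumes a block is a mapping from genome name to a gapped sequence, with all rows of equal length, and that projecting extra genomes may introduce columns that are gaps in every chromosome-level genome. Both the data layout and the function name `restrict_block` are hypothetical.

```python
def restrict_block(block: dict[str, str], genomes: list[str]) -> dict[str, str]:
    """Keep only the given genomes and drop columns that are all gaps in them."""
    rows = [block[g] for g in genomes]
    keep = [
        i for i in range(len(rows[0]))
        if any(row[i] != "-" for row in rows)
    ]
    return {g: "".join(block[g][i] for i in keep) for g in genomes}


# Made-up example: two chromosome-level genomes plus one projected genome.
epo_block = {
    "sativa": "ACG-T",
    "barthii": "AC-GT",
}
extended_block = {
    "sativa": "ACG--T",
    "barthii": "AC-G-T",
    "punctata": "ACGGAT",  # scaffold-level genome with one extra base
}

core = list(epo_block)
print(restrict_block(extended_block, core) == epo_block)
```

The extra base in the projected genome creates a column that is empty in both chromosome-level genomes; removing it recovers the original EPO block.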

Progressive Cactus

Progressive Cactus is a next-generation aligner that stores whole-genome alignments in a graph structure. Genomes can be added incrementally, which makes it scalable to hundreds of genomes. Further details on these methods can be found in "Algorithms for genome multiple sequence alignment" and "Cactus graphs for genome comparisons".
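
As a rough illustration of the progressive strategy, the sketch below lists the order in which a guide tree would be processed bottom-up, one subproblem per internal node, with each subproblem aligning the children of that node. It assumes a guide tree written as nested `(name, children)` tuples, which is a hypothetical representation and not a Cactus input format. The genome and ancestor names are placeholders.

```python
def alignment_order(node, steps=None):
    """Return (ancestor, children) subproblems in post-order.

    A node is either a leaf name (str) or a tuple (name, [child nodes]).
    """
    if steps is None:
        steps = []
    if isinstance(node, str):
        return steps  # a leaf genome needs no alignment of its own
    name, children = node
    for child in children:
        alignment_order(child, steps)  # solve deeper subproblems first
    child_names = [c if isinstance(c, str) else c[0] for c in children]
    steps.append((name, child_names))
    return steps


# Placeholder guide tree, not a real phylogeny.
guide_tree = ("Anc0", [("Anc1", ["genome_A", "genome_B"]), "genome_C"])
for ancestor, children in alignment_order(guide_tree):
    print(ancestor, "<-", children)
```

Under this view, attaching a new genome only requires the subproblems on the path from its attachment point to be revisited, which is one way to read the incremental property described above.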