- Gene trees are constructed using the longest protein for every gene in Ensembl Genomes. Homologues are deduced from these trees. Proteins are clustered based on Best-Reciprocal Hits and Blast Score Ratios, and each cluster of proteins is aligned using Muscle. Finally, TreeBeST is used to produce a gene tree from each multiple alignment, reconciling it with the species tree to call duplication events. More information →
- Whole genome alignments are performed using multiple species. More information →
- Ancestral sequences are calculated from multi-species whole genome alignments. More information →
- Conservation scores and constrained elements are calculated from the whole genome multiple alignments. More information →
- Syntenies are calculated from the multiple alignments. More information →
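The clustering step in the gene tree pipeline can be illustrated with a toy sketch. It assumes pairwise BLAST bit scores are already available in a dictionary, takes the Blast Score Ratio to be the hit score divided by the query's self-hit score (a common definition, assumed here), and groups proteins by simple connected components. The function names, the `min_bsr` threshold and the scores are hypothetical and are not taken from the Compara pipeline.

```python
from collections import defaultdict

def best_hits(scores):
    """Map each query to its best-scoring non-self hit."""
    best = {}
    for (query, target), score in scores.items():
        if query == target:
            continue
        if query not in best or score > scores[(query, best[query])]:
            best[query] = target
    return best

def reciprocal_best_hits(scores):
    """Return the pairs whose members are each other's best hit."""
    best = best_hits(scores)
    return {tuple(sorted((q, t))) for q, t in best.items() if best.get(t) == q}

def blast_score_ratio(scores, query, target):
    """Hit score normalised by the query's self-hit score (assumed definition)."""
    return scores[(query, target)] / scores[(query, query)]

def cluster(scores, min_bsr):
    """Link proteins that are reciprocal best hits or reach min_bsr,
    then return the connected components (a stand-in for real clustering)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    rbh = reciprocal_best_hits(scores)
    for query, target in scores:
        find(query)
        find(target)
        if query == target:
            continue
        pair = tuple(sorted((query, target)))
        if pair in rbh or blast_score_ratio(scores, query, target) >= min_bsr:
            parent[find(query)] = find(target)

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())

# Made-up bit scores for three proteins, including self-hits.
scores = {
    ("a1", "a1"): 200, ("b1", "b1"): 180, ("c1", "c1"): 150,
    ("a1", "b1"): 150, ("b1", "a1"): 150,
    ("a1", "c1"): 30, ("c1", "a1"): 30,
    ("b1", "c1"): 20, ("c1", "b1"): 20,
}
clusters = cluster(scores, min_bsr=0.5)
```

With these made-up scores, `a1` and `b1` end up in one cluster and `c1` stays on its own. Each cluster would then be passed to the alignment step.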
Data can be accessed using the Compara Perl API, BioMart, or comparative genomics pages on the browser. Gene trees can be viewed from any 'Gene' page on the browser, and exported via the control panel.
The taxonomy tree for all species is sufficiently well defined to be used in the peptide analysis.
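To illustrate how a species tree can be used when analysing a gene tree, the sketch below applies the classic lowest-common-ancestor (LCA) reconciliation: each internal gene-tree node is mapped to the LCA of its children's species, and it is called a duplication when it maps to the same species-tree node as one of its children. This is a generic textbook method shown under simplifying assumptions (binary gene tree as nested tuples, species tree as a child-to-parent dictionary), not TreeBeST's actual implementation. All names and the example data are hypothetical.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(a, b, parent):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def reconcile(tree, species_of, parent, events):
    """Map a gene-tree node onto the species tree.

    Leaves are gene names; internal nodes are (left, right) tuples.
    Appends (species_node, event) for every internal node, in post-order.
    """
    if isinstance(tree, str):
        return species_of[tree]
    left, right = tree
    mapped_left = reconcile(left, species_of, parent, events)
    mapped_right = reconcile(right, species_of, parent, events)
    here = lca(mapped_left, mapped_right, parent)
    # Same mapping as a child means both copies existed in that ancestor.
    kind = "duplication" if here in (mapped_left, mapped_right) else "speciation"
    events.append((here, kind))
    return here

# Illustrative species tree (child -> parent) and gene-to-species mapping.
species_parent = {
    "human": "Primates",
    "chimp": "Primates",
    "Primates": "Euarchontoglires",
    "mouse": "Euarchontoglires",
}
species_of = {"h1": "human", "h2": "human", "c1": "chimp", "c2": "chimp", "m1": "mouse"}
gene_tree = ((("h1", "c1"), ("h2", "c2")), "m1")

events = []
root = reconcile(gene_tree, species_of, species_parent, events)
```

In this example the node joining the two human/chimp subtrees maps to `Primates`, the same node as both of its children, so it is called a duplication. The root maps to `Euarchontoglires` and is called a speciation.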
In addition to the domain-specific Compara databases, a pan-taxonomic Compara is built for each release using representative genomes from all significant clades represented in Ensembl and Ensembl Genomes, to offer a broad view of homologous relationships across the taxonomy. The species tree used is a combination of Ensembl's taxonomy-based tree and a manually calculated tree for bacteria. The species used are listed on the Ensembl Genomes website and in this documentation: Gene Orthology/Paralogy prediction method: Species Used.