Genome Annotation

The genomes provided by Ensembl Genomes carry annotation of genes and gene function, obtained either by importing external data or by running predictive algorithms. This document outlines the steps involved in adding annotation to a genome assembly:

  1. Import protein-coding gene models. Ensembl Genomes does not usually carry out primary annotation of protein-coding gene models. Instead, gene models are imported either from the annotation in INSDC sequence archive records or from other public sources, in which case GFF is the preferred import format. Manual annotation is also imported from collaborations in which Ensembl Genomes is involved. In 2023, Ensembl Bacteria released gene models generated using Prokka.
  2. Annotate non-coding gene models
  3. Annotate repeat features
  4. Annotate protein features
  5. Add cross-references to external data sources
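
To illustrate the GFF import mentioned in step 1, the following is a minimal sketch of reading gene features from GFF3-style text. It is not the Ensembl Genomes import pipeline. The function names (parse_gff3_line, gene_models), the record layout, and the example identifiers are assumptions made for illustration only. The sketch relies only on the general GFF3 layout of nine tab-separated columns with URL-escaped key=value attributes in the last column.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines, comments and '##' directives.
    Hypothetical helper for illustration, not an Ensembl API.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9 holds ';'-separated key=value pairs; values may be URL-escaped.
    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # GFF3 coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def gene_models(lines):
    """Yield only the records whose feature type is 'gene'."""
    for line in lines:
        record = parse_gff3_line(line)
        if record is not None and record["type"] == "gene":
            yield record


# Made-up example data; the identifiers do not refer to real genes.
EXAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=abc%20A",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna0001;Parent=gene0001",
    "chr1\texample\tCDS\t150\t850\t.\t+\t0\tID=cds0001;Parent=mrna0001",
])

if __name__ == "__main__":
    for gene in gene_models(EXAMPLE.splitlines()):
        print(gene["attributes"]["ID"], gene["start"], gene["end"], gene["strand"])
```

A real import would also need to link mRNA, exon and CDS features to their parent gene through the Parent attribute and validate coordinates against the assembly; the sketch leaves both out for brevity.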