Ensembl Core - Schema documentation
This document gives a high-level description of the tables that make up the EnsEMBL core schema. Tables are grouped into logical groups, and the purpose of each table is explained. It is intended to allow people to familiarise themselves with the schema when encountering it for the first time, or when they need to use some tables that they've not used before.
This document refers to version 113 of the EnsEMBL core schema.
The overall diagram can be found here.
List of the tables:
Assembly Tables
The assembly table states, which parts of seq_regions are exactly equal. It enables to transform coordinates between seq_regions. Typically this contains how chromosomes are made of contigs, clones out of contigs, and chromosomes out of supercontigs. It allows you to artificially chunk chromosome sequence into smaller parts. The data in this table defines the "static golden path", i.e. the best effort draft full genome sequence as determined by the UCSC or NCBI (depending which assembly you are using). Each row represents a component, e.g. a contig, (comp_seq_region_id, FK from seq_region table) at least part of which is present in the golden path. The part of the component that is in the path is delimited by fields cmp_start and cmp_end (start < end), and the absolute position within the golden path chromosome (or other appropriate assembled structure) (asm_seq_region_id) is given by asm_start and asm_end.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
asm_seq_region_id | INT(10) | - | Assembly sequence region id. Primary key, internal identifier. Foreign key references to the seq_region table. | key: asm_seq_region_idx unique key: all_idx |
cmp_seq_region_id | INT(10) | - | Component sequence region id. Foreign key references to the seq_region table. | key: cmp_seq_region_idx unique key: all_idx |
asm_start | INT(10) | - | Start absolute position within the golden path chromosome. | key: asm_seq_region_idx unique key: all_idx |
asm_end | INT(10) | - | End absolute position within the golden path chromosome. | unique key: all_idx |
cmp_start | INT(10) | - | Component start position within the golden path chromosome. | unique key: all_idx |
cmp_end | INT(10) | - | Component start position within the golden path chromosome. | unique key: all_idx |
ori | TINYINT | - | Orientation: 1 - sense; -1 - antisense. | unique key: all_idx |
See also: |
List of species with populated data: Show species
|
Allows multiple sequence regions to point to the same sequence, analogous to a symbolic link in a filesystem pointing to the actual file. This mechanism has been implemented specifically to support haplotypes and PARs, but may be useful for other similar structures in the future.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
assembly_exception_id | INT(10) | - | Assembly exception sequence region id. Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Sequence region id. Foreign key references to the seq_region table. | key: sr_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: sr_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
exc_type | ENUM:
|
NULL | Exception type, e.g. PAR, HAP - haplotype. | |
exc_seq_region_id | INT(10) | - | Exception sequence region id. Foreign key references to the seq_region table. | key: ex_idx |
exc_seq_region_start | INT(10) | - | Exception sequence start position. | key: ex_idx |
exc_seq_region_end | INT(10) | - | Exception sequence end position. | |
ori | INT | - | Orientation: 1 - sense; -1 - antisense. |
See also: |
Stores information about the available co-ordinate systems for the species identified through the species_id field. Note that for each species, there must be one co-ordinate system that has the attribute "top_level" and one that has the attribute "sequence_level".
Column | Type | Default value | Description | Index |
---|---|---|---|---|
coord_system_id | INT(10) | - | Primary key, internal identifier. | primary key |
species_id | INT(10) | 1 | Indentifies the species for multi-species databases. | unique key: rank_idx unique key: name_idx key: species_idx |
name | VARCHAR(40) | - | Co-oridinate system name, e.g. 'chromosome', 'contig', 'scaffold' etc. | unique key: name_idx |
version | VARCHAR(255) | NULL | Assembly. | unique key: name_idx |
rank | INT | - | Co-oridinate system rank. | unique key: rank_idx |
attrib | SET:
|
NULL | Co-oridinate system attrib (e.g. "top_level", "sequence_level"). |
See also: |
List of species with populated data: Show species
|
Allows the storage of flat file locations used to store large quanitities of data currently unsuitable in a traditional database table.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
data_file_id | INT(10) | - | Auto-increment surrogate primary key | primary key |
coord_system_id | INT(10) | - | Coordinate system this file is linked to. Used to decipher the assembly version it was mapped to | unique key: df_unq_idx |
analysis_id | SMALLINT | - | Analysis this file is linked to | unique key: df_unq_idx key: df_analysis_idx |
name | VARCHAR(100) | - | Name of the file | unique key: df_unq_idx key: df_name_idx |
version_lock | TINYINT(1) | 0 | Indicates that this file is only compatible with the current Ensembl release version | |
absolute | TINYINT(1) | 0 | Flags that the URL given is fully resolved and should be used without question | |
url | TEXT | NULL | Optional path to the file (can be absolute or relative) | |
file_type | ENUM:
|
NULL | Type of file e.g. BAM, BIGBED, BIGWIG and VCF | unique key: df_unq_idx |
Contains DNA sequence. This table has a 1:1 relationship with the seq_region table.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
seq_region_id | INT(10) | - | Primary key, internal identifier. Foreign key references to the seq_region table. | primary key |
sequence | LONGTEXT | - | DNA sequence. |
See also: |
List of species with populated data: Show species
|
Contains genome and assembly related statistics These include but are not limited to: feature counts, sequence lengths
Column | Type | Default value | Description | Index |
---|---|---|---|---|
genome_statistics_id | INT(10) | - | Primary key, internal identifier. | primary key |
statistic | VARCHAR(128) | - | Name of the statistics | unique key: stats_uniq |
value | BIGINT(11) | '0' | Corresponding value of the statistics (count/length) | |
species_id | INT | 1 | Indentifies the species for multi-species databases. | unique key: stats_uniq |
attrib_type_id | INT(10) | NULL | To distinguish similar statistics for different cases | unique key: stats_uniq |
timestamp | DATETIME | NULL | Date the statistics was generated |
List of species with populated data: Show species
|
Describes bands that can be stained on the chromosome.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
karyotype_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: region_band_idx |
seq_region_start | INT(10) | - | Sequence start position. | |
seq_region_end | INT(10) | - | Sequence end position. | |
band | VARCHAR(40) | NULL | Band. | key: region_band_idx |
stain | VARCHAR(40) | NULL | Stain. |
See also: |
List of species with populated data: Show species
|
Stores data about the data in the current schema. Taxonomy information, version information and the default value for the type column in the assembly table are stored here. Unlike other tables, data in the meta table is stored as key-value pairs. Also stores (via assembly.mapping keys) the relationships between co-ordinate systems in the assembly table. The species_id field of the meta table is used in multi-species databases and makes it possible to have species-specific meta key-value pairs. The species-specific meta key-value pairs needs to be repeated for each species_id. Entries in the meta table that are not specific to any one species, such as the schema_version key and any other schema-related information must have their species_id field set to NULL. The default species_id, and the only species_id value allowed in single-species databases, is 1.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
meta_id | INT | - | Primary key, internal identifier. | primary key |
species_id | INT | 1 | Indentifies the species for multi-species databases. | unique key: species_key_value_idx key: species_value_idx |
meta_key | VARCHAR(64) | - | Name of the meta entry, e.g. "schema_version". | unique key: species_key_value_idx |
meta_value | VARCHAR(255) | - | Corresponding value of the key, e.g. "61". | unique key: species_key_value_idx key: species_value_idx |
See also: |
List of species with populated data: Show species
|
Describes which co-ordinate systems the different feature tables use.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
table_name | VARCHAR(40) | - | Ensembl database table name. | unique key: cs_table_name_idx |
coord_system_id | INT(10) | - | Foreign key references to the coord_system table. | unique key: cs_table_name_idx |
max_length | INT | NULL | Longest sequence length. |
See also: |
List of species with populated data: Show species
|
Stores information about sequence regions. The primary key is used as a pointer into the dna table so that actual sequence can be obtained, and the coord_system_id allows sequence regions of multiple types to be stored. Clones, contigs and chromosomes are all now stored in the seq_region table. Contigs are stored with the co-ordinate system 'contig'. The relationship between contigs and clones is stored in the assembly table. The relationships between contigs and chromosomes, and between contigs and supercontigs, are stored in the assembly table.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
seq_region_id | INT(10) | - | Primary key, internal identifier. | primary key |
name | VARCHAR(255) | - | Sequence region name. | unique key: name_cs_idx |
coord_system_id | INT(10) | - | Foreign key references to the coord_system table. | unique key: name_cs_idx key: cs_idx |
length | INT(10) | - | Sequence length. |
See also: |
List of species with populated data: Show species
|
Allows "attributes" to be defined for certain seq_regions. Provides a way of storing extra information about particular seq_regions without adding extra columns to the seq_region table. e.g.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
seq_region_id | INT(10) | '0' | Foreign key references to the seq_region table. | key: seq_region_idx unique key: region_attribx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx unique key: region_attribx |
value | TEXT | - | Attribute value. | key: type_val_idx key: val_only_idx unique key: region_attribx |
See also: |
List of species with populated data: Show species
|
Allows for storing multiple names for sequence regions.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
seq_region_synonym_id | INT | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | unique key: syn_idx key: seq_region_idx |
synonym | VARCHAR(250) | - | Alternative name for sequence region. | unique key: syn_idx |
external_db_id | INT | NULL | Foreign key references to the external_db table. |
See also: |
List of species with populated data: Show species
|
External References
Groups together xref associations under a single description. Used when more than one associated xref term must be used to describe a condition
Column | Type | Default value | Description | Index |
---|---|---|---|---|
associated_group_id | INT(10) | - | Associated group id. Primary key, internal identifier | primary key |
description | VARCHAR(128) | NULL | Optional description for this group |
See also: |
List of species with populated data: Show species
|
This table associates extra associated annotations with a given ontology xref evidence and source under a specific condition. For GO this allows qualifiers (with/from) or annotation extensions to be added to a given ontology annotation.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
associated_xref_id | INT(10) | - | Associated xref id. Primary key, internal identifier | primary key |
object_xref_id | INT(10) | '0' | Object xref id this associated xref is linked to. Foreign key linked to the object_xref table | key: associated_object_idx unique key: object_associated_source_type_idx |
xref_id | INT(10) | '0' | Xref which is the associated term. Foreign key linked to the xref table | key: associated_idx unique key: object_associated_source_type_idx |
source_xref_id | INT(10) | NULL | Xref which is source of this association. Foreign key linked to the xref table | key: associated_source_idx unique key: object_associated_source_type_idx |
condition_type | VARCHAR(128) | NULL | The type of condition this link occurs in e.g. evidence, from, residue or assigned_by | unique key: object_associated_source_type_idx |
associated_group_id | INT(10) | NULL | Foreign key to allow for associated_group | key: associated_group_idx unique key: object_associated_source_type_idx |
rank | INT(10) | '0' | The rank in which the association occurs within an associated_group |
See also: |
List of species with populated data: Show species
|
Stores data about the biotypes and mappings to Sequence Ontology.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
biotype_id | INT | - | Primary key, internal identifier. | primary key |
name | VARCHAR(64) | - | Ensembl biotype name. | unique key: name_type_idx |
object_type | ENUM:
|
'gene' | Ensembl object type: 'gene' or 'transcript'. | unique key: name_type_idx |
db_type | SET:
|
'core' | Type, e.g. 'cdna', 'core', 'coreexpressionatlas', 'coreexpressionest', 'coreexpressiongnf', 'funcgen', 'otherfeatures', 'rnaseq', 'variation', 'vega', 'presite', 'sangervega' | |
attrib_type_id | INT | NULL | Foreign key references to the attrib_type table. | |
description | TEXT | NULL | Description. | |
biotype_group | ENUM:
|
NULL | Group, e.g. 'coding', 'pseudogene', 'snoncoding', 'lnoncoding', 'mnoncoding', 'LRG', 'undefined', 'no_group' | |
so_acc | VARCHAR(64) | NULL | Sequence Ontology accession of the biotype. | |
so_term | VARCHAR(1023) | NULL | Sequence Ontology term of the biotype. |
See also: |
List of species with populated data: Show species
|
Describes dependent external references which can't be directly mapped to Ensembl entities. They are linked to primary external references instead.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
object_xref_id | INT(10) | - | Primary key, internal identifier. Foreign key references to the object_xref table. | primary key |
master_xref_id | INT(10) | - | Foreign key references to the xref table. | key: master_idx |
dependent_xref_id | INT(10) | - | Foreign key references to the xref table. | key: dependent |
See also: |
List of species with populated data: Show species
|
Stores data about the external databases in which the objects described in the xref table are stored.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
external_db_id | INT | - | Primary key, internal identifier. | primary key |
db_name | VARCHAR(100) | - | Database name. | unique key: db_name_db_release_idx |
db_release | VARCHAR(255) | NULL | Database release. | unique key: db_name_db_release_idx |
status | ENUM:
|
- | Status, e.g. 'KNOWNXREF','KNOWN','XREF','PRED','ORTH','PSEUDO'. | |
priority | INT | - | Determines which one of the xrefs will be used as the gene name. | |
db_display_name | VARCHAR(255) | NULL | Database display name. | |
type | ENUM:
|
- | Type, e.g. 'ARRAY', 'ALT_TRANS', 'ALT_GENE', 'MISC', 'LIT', 'PRIMARY_DB_SYNONYM', 'ENSEMBL'. | |
secondary_db_name | VARCHAR(255) | NULL | Secondary database name. | |
secondary_db_table | VARCHAR(255) | NULL | Secondary database table. | |
description | TEXT | NULL | Description. |
See also: |
List of species with populated data: Show species
|
Some xref objects can be referred to by more than one name. This table relates names to xref IDs.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
xref_id | INT(10) | - | Primary key, internal identifier. | primary key |
synonym | VARCHAR(100) | - | Synonym | primary key key: name_index |
See also: |
List of species with populated data: Show species
|
Describes how well a particular xref object matches the EnsEMBL object.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
object_xref_id | INT(10) | - | Primary key, internal identifier. Foreign key references to the object_xref table. | primary key |
xref_identity | INT(5) | NULL | Percentage identity. | |
ensembl_identity | INT(5) | NULL | Percentage identity. | |
xref_start | INT | NULL | Xref sequence start. | |
xref_end | INT | NULL | Xref sequence end. | |
ensembl_start | INT | NULL | Ensembl sequence start. | |
ensembl_end | INT | NULL | Ensembl sequence end. | |
cigar_line | TEXT | NULL | Used to encode gapped alignments. | |
score | DOUBLE | NULL | Match score. | |
evalue | DOUBLE | NULL | Match evalue. |
See also: |
List of species with populated data: Show species
|
Allows storage of links to the InterPro database. InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. InterPro - The InterPro website
Column | Type | Default value | Description | Index |
---|---|---|---|---|
interpro_ac | VARCHAR(40) | - | InterPro protein accession number. | unique key: accession_idx |
id | VARCHAR(40) | - | InterPro protein id. | unique key: accession_idx key: id_idx |
List of species with populated data: Show species
|
Describes links between EnsEMBL objects and objects held in external databases. The EnsEMBL object can be one of several types; the type is held in the ensembl_object_type column. The ID of the particular EnsEMBL gene, translation or whatever is given in the ensembl_id column. The xref_id points to the entry in the xref table that holds data about the external object. Each EnsEMBL object can be associated with zero or more xrefs. An xref object can be associated with one or more EnsEMBL objects.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
object_xref_id | INT(10) | - | Primary key, internal identifier. | primary key |
ensembl_id | INT(10) | - | Foreign key references to the seq_region, transcript, gene, @translation tables depending on ensembl_object_type. | unique key: xref_idx key: ensembl_idx |
ensembl_object_type | ENUM:
|
NULL | Ensembl object type: 'RawContig', 'Transcript', 'Gene', 'Translation', ..., 'RNAProduct' | unique key: xref_idx key: ensembl_idx |
xref_id | INT(10) | - | Foreign key references to the xref table. | unique key: xref_idx |
linkage_annotation | VARCHAR(255) | NULL | Additional annotation on the linkage. | |
analysis_id | SMALLINT | NULL | Foreign key references to the analysis table. | unique key: xref_idx key: analysis_idx |
See also: |
List of species with populated data: Show species
|
This table associates Evidence Tags to the relationship between EnsEMBL objects and ontology accessions (primarily GO accessions). The relationship to GO that is stored in the database is actually derived through the relationship of EnsEMBL peptides to SwissProt peptides, i.e. the relationship is derived like this: ENSP -> SWISSPROT -> GO And the evidence tag describes the relationship between the SwissProt Peptide and the GO entry. In reality, however, we store this in the database like this: ENSP -> SWISSPROT ENSP -> GO and the evidence tag hangs off of the relationship between the ENSP and the GO identifier. Some ENSPs are associated with multiple closely related Swissprot entries which may both be associated with the same GO identifier but with different evidence tags. For this reason a single Ensembl - external db object relationship in the object_xref table can be associated with multiple evidence tags in the ontology_xref table.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
object_xref_id | INT(10) | '0' | Composite key. Foreign key references to the object_xref table. | key: object_idx unique key: object_source_type_idx |
source_xref_id | INT(10) | NULL | Composite key. Foreign key references to the xref table. | key: source_idx unique key: object_source_type_idx |
linkage_type | VARCHAR(3) | NULL | Composite key. Evidence tags | unique key: object_source_type_idx |
See also: |
List of species with populated data: Show species
|
Describes why a particular external entity was not mapped to an ensembl one.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
unmapped_object_id | INT(10) | - | Primary key, internal identifier. | primary key |
type | ENUM:
|
- | Object type: 'xref', 'cDNA', 'Marker'. | |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: anal_exdb_idx |
external_db_id | INT | NULL | Foreign key references to the external_db table. | unique key: unique_unmapped_obj_idx key: anal_exdb_idx key: ext_db_identifier_idx |
identifier | VARCHAR(255) | - | External database identifier. | unique key: unique_unmapped_obj_idx key: id_idx key: ext_db_identifier_idx |
unmapped_reason_id | INT(10) | - | Foreign key references to the unmapped_reason table. | unique key: unique_unmapped_obj_idx |
query_score | DOUBLE | NULL | Actual mapping query score. | |
target_score | DOUBLE | NULL | Target mapping query score. | |
ensembl_id | INT(10) | '0' | Foreign key references to the seq_region, transcript, gene, @translation tables depending on ensembl_object_type. | unique key: unique_unmapped_obj_idx |
ensembl_object_type | ENUM:
|
'RawContig' | Ensembl object type: 'RawContig', 'Transcript', 'Gene','Translation'. | unique key: unique_unmapped_obj_idx |
parent | VARCHAR(255) | NULL | Foreign key references to the dependent_xref table, in case the unmapped object is dependent on a primary external reference which wasn't mapped to an ensembl one. | unique key: unique_unmapped_obj_idx |
See also: |
List of species with populated data: Show species
|
Describes the reason why a mapping failed.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
unmapped_reason_id | INT(10) | - | Primary key, internal identifier. | primary key |
summary_description | VARCHAR(255) | NULL | Summarised description. | |
full_description | VARCHAR(255) | NULL | Full description. |
See also: |
List of species with populated data: Show species
|
Holds data about objects which are external to EnsEMBL, but need to be associated with EnsEMBL objects. Information about the database that the external object is stored in is held in the external_db table entry referred to by the external_db column.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
xref_id | INT(10) | - | Primary key, internal identifier. | primary key |
external_db_id | INT | - | Foreign key references to the external_db table. | unique key: id_index |
dbprimary_acc | VARCHAR(512) | - | Primary accession number. | unique key: id_index |
display_label | VARCHAR(512) | - | Display label for the EnsEMBL web site. | key: display_index |
version | VARCHAR(10) | NULL | Object version. | unique key: id_index |
description | TEXT | NULL | Object description. | |
info_type | ENUM:
|
'NONE' | 'PROJECTION', 'MISC', 'DEPENDENT','DIRECT', 'SEQUENCE_MATCH','INFERRED_PAIR', 'PROBE','UNMAPPED', 'COORDINATE_OVERLAP', 'CHECKSUM'. | unique key: id_index key: info_type_idx |
info_text | VARCHAR(255) | '' | Text | unique key: id_index |
See also: |
List of species with populated data: Show species
|
Features
Describes features representing a density, or precentage coverage etc. in a given region.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
density_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
density_type_id | INT(10) | - | Density type. Foreign key references to the density_type table. | key: seq_region_idx |
seq_region_id | INT(10) | - | Sequence region. Foreign key references to the seq_region table. | key: seq_region_idx key: seq_region_id_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
density_value | FLOAT | - | Density value. |
See also: |
List of species with populated data: Show species
|
Describes type representing a density, or percentage coverage etc. in a given region.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
density_type_id | INT(10) | - | Primary key, internal identifier. | primary key |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | unique key: analysis_idx |
block_size | INT | - | Block size. | unique key: analysis_idx |
region_features | INT | - | The number of features per sequence region inside this density type. | unique key: analysis_idx |
value_type | ENUM:
|
- | Value type, e.g. 'sum', 'ratio'. |
See also: |
List of species with populated data: Show species
|
Represents a ditag object in the EnsEMBL database. Corresponds to original tag containing the full sequence. This can be a single piece of sequence like CAGE tags or a ditag with concatenated sequence from 5' and 3' end like GIS or GSC tags. This data is available as a DAS track in ContigView on the EnsEMBL web site.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
ditag_id | INT(10) | - | Primary key, internal identifier. | primary key |
name | VARCHAR(30) | - | Ditag name. | |
type | VARCHAR(30) | - | Ditag type. | |
tag_count | smallint(6) | 1 | Tag count. | |
sequence | TINYTEXT | - | Sequence. |
See also: |
Describes where ditags hit on the genome. Represents a mapped ditag object in the EnsEMBL database. These are the original tags separated into start ("L") and end ("R") parts if applicable, successfully aligned to the genome. Two DitagFeatures usually relate to one parent Ditag. Alternatively there are CAGE tags e.g. which only have a 5\'tag ("F").
Column | Type | Default value | Description | Index |
---|---|---|---|---|
ditag_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
ditag_id | INT(10) | '0' | Foreign key references to the ditag table. | key: ditag_idx |
ditag_pair_id | INT(10) | '0' | Ditag pair id. | key: ditag_pair_idx |
seq_region_id | INT(10) | '0' | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | '0' | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | '0' | Sequence end position. | key: seq_region_idx |
seq_region_strand | TINYINT(1) | '0' | Sequence region strand: 1 - forward; -1 - reverse. | |
analysis_id | SMALLINT | '0' | Foreign key references to the analysis table. | |
hit_start | INT(10) | '0' | Alignment hit start position. | |
hit_end | INT(10) | '0' | Alignment hit end position. | |
hit_strand | TINYINT(1) | '0' | Alignment hit strand: 1 - forward; -1 - reverse. | |
cigar_line | TINYTEXT | - | Used to encode gapped alignments. | |
ditag_side | ENUM:
|
- | Ditag side: L - start, R - end, F - 5\'tag only |
See also: |
Provides the evidence which we have used to declare an intronic region
Column | Type | Default value | Description | Index |
---|---|---|---|---|
intron_supporting_evidence_id | INT(10) | - | Surrogate primary key | primary key |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | unique: key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | unique: key key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | unique: key key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | unique: key |
seq_region_strand | TINYINT(2) | - | Sequence region strand: 1 - forward; -1 - reverse. | unique: key |
hit_name | VARCHAR(100) | - | External entity name/identifier. | unique: key |
score | DECIMAL(10,3) | NULL | Score supporting the intron | |
score_type | ENUM:
|
'NONE' | The type of score e.g. NONE | |
is_splice_canonical | TINYINT(1) | 0 | Indicates if the splice junction can be considered canonical i.e. behaves according to accepted rules |
See also: |
List of species with populated data: Show species
|
Stores the names of different genetic or radiation hybrid maps, for which there is marker map information.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
map_id | INT(10) | - | Primary key, internal identifier. | primary key |
map_name | VARCHAR(30) | - | Map name. |
See also: |
List of species with populated data: Show species
|
Stores data about the marker itself. A marker in Ensembl consists of a pair of primer sequences, an expected product size and a set of associated identifiers known as synonyms.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
marker_id | INT(10) | - | Primary key, internal identifier. | primary key key: marker_idx |
display_marker_synonym_id | INT(10) | NULL | Marker synonym. | key: display_idx |
left_primer | VARCHAR(100) | - | Left primer sequence. | |
right_primer | VARCHAR(100) | - | Right primer sequence. | |
min_primer_dist | INT(10) | - | Minimum primer distance. | |
max_primer_dist | INT(10) | - | Maximum primer distance. | |
priority | INT | NULL | Priority. | key: marker_idx |
type | ENUM:
|
NULL | Type, e.g. 'est', 'microsatellite'. |
See also: |
List of species with populated data: Show species
|
Used to describe positions of markers on the assembly. Markers are placed on the genome electronically using an analysis program.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
marker_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
marker_id | INT(10) | - | Foreign key references to the marker table. | |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: analysis_idx |
map_weight | INT(10) | NULL | The number of times that this marker has been mapped to the genome, e.g. a marker with map weight 3 has been mapped to 3 locations in the genome. |
See also: |
List of species with populated data: Show species
|
Stores map locations (genetic, radiation hybrid and in situ hybridization) for markers obtained from experimental evidence.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
marker_id | INT(10) | - | Primary key, internal identifier. | primary key |
map_id | INT(10) | - | Foreign key references to the map table. | primary key key: map_idx |
chromosome_name | VARCHAR(15) | - | Chromosome name | key: map_idx |
marker_synonym_id | INT(10) | - | Foreign key references to the marker_synonym table. | |
position | VARCHAR(15) | - | Position of the map location. | key: map_idx |
lod_score | DOUBLE | NULL | LOD score for map location. |
See also: |
List of species with populated data: Show species
|
Stores alternative names for markers, as well as their sources.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
marker_synonym_id | INT(10) | - | Primary key, internal identifier. | primary key key: marker_synonym_idx |
marker_id | INT(10) | - | Foreign key references to the marker table. | key: marker_idx |
source | VARCHAR(20) | NULL | Marker source. | |
name | VARCHAR(50) | NULL | Alternative name for marker. | key: marker_synonym_idx |
See also: |
List of species with populated data: Show species
|
Stores arbitrary attributes about the features in the misc_feature table.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
misc_feature_id | INT(10) | '0' | Foreign key references to the misc_feature table. | key: misc_feature_idx unique key: misc_attribx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx unique key: misc_attribx |
value | TEXT | - | Attribute value. | key: type_val_idx key: val_only_idx unique key: misc_attribx |
See also: |
List of species with populated data: Show species
|
Allows for storage of arbitrary features.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
misc_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | '0' | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | '0' | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | '0' | Sequence end position. | |
seq_region_strand | TINYINT(4) | '0' | Sequence region strand: 1 - forward; -1 - reverse. |
See also: |
List of species with populated data: Show species
|
This table classifies features into distinct sets.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
misc_feature_id | INT(10) | '0' | Primary key, internal identifier. Foreign key references to the misc_feature table. | primary key key: reverse_idx |
misc_set_id | SMALLINT(5) | '0' | Primary key, internal identifier. Foreign key references to the misc_feature table. | primary key key: reverse_idx |
See also: |
List of species with populated data: Show species
|
Defines "sets" that the features held in the misc_feature table can be grouped into.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
misc_set_id | SMALLINT(5) | - | Primary key, internal identifier. | primary key |
code | VARCHAR(25) | '' | Set code, e.g. bac_map | unique key: code_idx |
name | VARCHAR(255) | '' | Code name, e.g. BAC map | |
description | TEXT | - | Code description, e.g. Full list of mapped BAC clones | |
max_length | INT | - | Longest feature, e.g. 500000 |
See also: |
List of species with populated data: Show species
|
Stores exons that are predicted by ab initio gene finder programs. Unlike EnsEMBL exons they are not supported by any evidence.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
prediction_exon_id | INT(10) | - | Primary key, internal identifier. | primary key |
prediction_transcript_id | INT(10) | - | Foreign key references to the prediction_transcript table. | key: transcript_idx |
exon_rank | SMALLINT | - | Exon rank | |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT | - | Sequence region strand: 1 - forward; -1 - reverse. | |
start_phase | TINYINT | - | Exon start phase. | |
score | DOUBLE | NULL | Prediction score. | |
p_value | DOUBLE | NULL | Prediction p-value. |
See also: |
List of species with populated data: Show species
|
Stores transcripts that are predicted by ab initio gene finder programs (e.g. genscan, SNAP). Unlike EnsEMBL transcripts they are not supported by any evidence.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
prediction_transcript_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT | - | Sequence region strand: 1 - forward; -1 - reverse. | |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: analysis_idx |
display_label | VARCHAR(255) | NULL | Display label for the EnsEMBL web site. |
See also: |
List of species with populated data: Show species
|
Stores consensus sequences obtained from analysing repeat features.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
repeat_consensus_id | INT(10) | - | Primary key, internal identifier. | primary key |
repeat_name | VARCHAR(255) | - | Repeat name. | key: name |
repeat_class | VARCHAR(100) | - | E.g. 'Satellite', 'tRNA', 'LTR'. | key: class |
repeat_type | VARCHAR(40) | - | E.g. 'Satellite repeats', 'Tandem repeats', 'Low complexity regions'. | key: type |
repeat_consensus | TEXT | NULL | Repeat consensus sequence. | key: consensus |
List of species with populated data: Show species
|
Describes sequence repeat regions.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
repeat_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT(1) | '1' | Sequence region strand: 1 - forward; -1 - reverse. | |
repeat_start | INT(10) | - | Repeat sequence start. | |
repeat_end | INT(10) | - | Repeat sequence end | |
repeat_consensus_id | INT(10) | - | Foreign key references to the repeat_consensus table. | key: repeat_idx |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: analysis_idx |
score | DOUBLE | NULL | Analysis score. |
See also: |
List of species with populated data: Show species
|
Describes general genomic features that don't fit into any of the more specific feature tables.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
simple_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT(1) | - | Sequence region strand: 1 - forward; -1 - reverse. | |
display_label | VARCHAR(255) | - | Display label for the EnsEMBL web site. | key: hit_idx |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: analysis_idx |
score | DOUBLE | NULL | Analysis score. |
See also: |
List of species with populated data: Show species
|
Links intronic evidence to a pair of exons used within a transcript and to resolve the m:m relationship between introns and transcripts
Column | Type | Default value | Description | Index |
---|---|---|---|---|
intron_supporting_evidence_id | INT(10) | - | Foreign key references to the intron_supporting_evidence table | primary key |
transcript_id | INT(10) | - | Foreign key references to the transcript table. | primary key key: transcript_idx |
previous_exon_id | INT(10) | - | Foreign key to exon indicating the left hand flanking exon of the intron (assume forward strand) | |
next_exon_id | INT(10) | - | Foreign key to exon indicating the right hand flanking exon of the intron (assume forward strand) |
See also: |
List of species with populated data: Show species
|
Fundamental Tables
Stores information about genes on haplotypes that may be orthologous.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
alt_allele_id | INT | NULL | Primary key, internal identifier. | primary key |
gene_id | INT | - | Foreign key references to the gene table. | key: gene_id,alt_allele_group_id key: gene_idx |
alt_allele_group_id | INT | - | A group ID to show which alleles are related | key: gene_id,alt_allele_group_id |
See also: |
List of species with populated data: Show species
|
Holds all the different attributes assigned to individual alleles.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
alt_allele_id | INT | NULL | Primary key, internal identifier. | key: aa_idx |
attrib | ENUM:
|
NULL | Enum of attributes assigned to alternative alleles | key: aa_idx |
List of species with populated data: Show species
|
A minimal table used for tracking unique alt_allele_group_id's. MySQL does not allow multiple autoincrement fields. Further information about a group could be added here at a later date.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
alt_allele_group_id | INT | NULL | Primary key and only column. | primary key |
List of species with populated data: Show species
|
Usually describes a program and some database that together are used to create a feature on a piece of sequence. Each feature is marked with an analysis_id. The most important column is logic_name, which is used by the webteam to render a feature correctly on contigview (or even retrieve the right feature). Logic_name is also used in the pipeline to identify the analysis which has to run in a given status of the pipeline. The module column tells the pipeline which Perl module does the whole analysis, typically a RunnableDB module.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
analysis_id | SMALLINT | - | Primary key, internal identifier. | primary key |
created | datetime | NULL | Date to distinguish newer and older versions off the same analysis. | |
logic_name | VARCHAR(128) | - | String to identify the analysis. Used mainly inside pipeline. | unique key: logic_name_idx |
db | VARCHAR(120) | NULL | Database name. | |
db_version | VARCHAR(40) | NULL | Database version. | |
db_file | VARCHAR(120) | NULL | File system location of the database. | |
program | VARCHAR(80) | NULL | The binary used to create a feature. | |
program_version | VARCHAR(40) | NULL | The binary version. | |
program_file | VARCHAR(80) | NULL | File system location of the binary. | |
parameters | TEXT | NULL | A parameter string which is processed by the perl module. | |
module | VARCHAR(80) | NULL | Perl module names (RunnableDBS usually) executing this analysis. | |
module_version | VARCHAR(40) | NULL | Perl module version. | |
gff_source | VARCHAR(40) | NULL | How to make a gff dump from features with this analysis. | |
gff_feature | VARCHAR(40) | NULL | How to make a gff dump from features with this analysis. |
See also: |
List of species with populated data: Show species
|
Allows the storage of a textual description of the analysis, as well as a "display label", primarily for the EnsEMBL web site.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
analysis_id | SMALLINT | - | Primary key, internal identifier. Foreign key references to the analysis table. | unique key: analysis_idx |
description | TEXT | NULL | Textual description of the analysis. | |
display_label | VARCHAR(255) | - | Display label for the EnsEMBL web site. | |
displayable | TINYINT(1) | 1 | Flag indicating if the analysis description is to be displayed on the EnsEMBL web site. | |
web_data | TEXT | NULL | Other data used by the EnsEMBL web site. |
See also: |
List of species with populated data: Show species
|
Provides codes, names and desctriptions of attribute types.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
attrib_type_id | SMALLINT(5) | - | Primary key, internal identifier. | primary key |
code | VARCHAR(20) | '' | Attribute code, e.g. 'GapExons'. | unique key: code_idx |
name | VARCHAR(255) | '' | Attribute name, e.g. 'gap exons'. | |
description | TEXT | NULL | Attribute description, e.g. 'number of gap exons'. |
See also: |
List of species with populated data: Show species
|
Stores DNA sequence alignments generated from Blast (or Blast-like) comparisons.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
dna_align_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT(1) | - | Sequence region strand: 1 - forward; -1 - reverse. | |
hit_start | INT | - | Alignment hit start position. | |
hit_end | INT | - | Alignment hit end position. | |
hit_strand | TINYINT(1) | - | Alignment hit strand: 1 - forward; -1 - reverse. | |
hit_name | VARCHAR(40) | - | Alignment hit name. | key: hit_idx |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: seq_region_idx key: analysis_idx |
score | DOUBLE | NULL | Alignment score. | key: seq_region_idx |
evalue | DOUBLE | NULL | Alignment e-value. | |
perc_ident | FLOAT | NULL | Alignment percentage identity. | |
cigar_line | TEXT | NULL | Used to encode gapped alignments. | |
external_db_id | INT | NULL | Foreign key references to the external_db table. | key: external_db_idx |
hcoverage | DOUBLE | NULL | Hit coverage. | |
align_type | ENUM:
|
'ensembl' | Alignment string type used |
See also: |
List of species with populated data: Show species
|
Enables storage of attributes that relate to DNA sequence alignments.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
dna_align_feature_id | INT(10) | - | Foreign key references to the dna_align_feature table. | unique key: dna_align_feature_attribx key: dna_align_feature_idx |
attrib_type_id | SMALLINT(5) | - | Foreign key references to the attrib_type table. | unique key: dna_align_feature_attribx key: type_val_idx |
value | TEXT | - | Attribute value. | unique key: dna_align_feature_attribx key: type_val_idx key: val_only_idx |
See also: |
List of species with populated data: Show species
|
Stores data about exons. Associated with transcripts via exon_transcript. Allows access to contigs seq_regions. Note seq_region_start is always less that seq_region_end, i.e. when the exon is on the other strand the seq_region_start is specifying the 3prime end of the exon.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
exon_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT(2) | - | Sequence region strand: 1 - forward; -1 - reverse. | |
phase | TINYINT(2) | - | The place where the intron lands inside the codon - 0 between codons, 1 between the 1st and second base, 2 between the second and 3rd base. Exons therefore have a start phase anda end phase, but introns have just one phase. | |
end_phase | TINYINT(2) | - | Usually, end_phase = (phase + exon_length)%3 but end_phase could be -1 if the exon is half-coding and its 3 prime end is UTR. | |
is_current | TINYINT(1) | 1 | 1 - exon is current. Always set to 1 in ensembl dbs, but needed for otterlace dbs | |
is_constitutive | TINYINT(1) | 0 | 1 - exon is constitutive. | |
stable_id | VARCHAR(128) | NULL | Release-independent stable identifier. | key: stable_id_idx |
version | SMALLINT | NULL | Stable identifier version number. | key: stable_id_idx |
created_date | DATETIME | NULL | Date created. | |
modified_date | DATETIME | NULL | Date modified. |
See also: |
List of species with populated data: Show species
|
Relationship table linking exons with transcripts. The rank column indicates the 5' to 3' position of the exon within the transcript, i.e. a rank of 1 means the exon is the 5' most within this transcript.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
exon_id | INT(10) | - | Composite key. Foreign key references to the exon table. | primary key key: exon |
transcript_id | INT(10) | - | Composite key. Foreign key references to the transcript table. | primary key key: transcript |
rank | INT(10) | - | Composite key. | primary key |
See also: |
List of species with populated data: Show species
|
Allows transcripts to be related to genes.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
gene_id | INT(10) | - | Primary key, internal identifier. | primary key |
biotype | VARCHAR(40) | - | Biotype, e.g. protein_coding. | |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: analysis_idx |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT(2) | - | Sequence region strand: 1 - forward; -1 - reverse. | |
display_xref_id | INT(10) | NULL | External reference for EnsEMBL web site. Foreign key references to the xref table. | key: xref_id_index |
source | VARCHAR(40) | - | e.g ensembl, havana etc. | |
description | TEXT | NULL | Gene description | |
is_current | TINYINT(1) | 1 | 1 - gene is current. Always set to 1 in ensembl dbs, but needed for otterlace dbs | |
canonical_transcript_id | INT(10) | - | Foreign key references to the transcript table. | key: canonical_transcript_id_idx |
stable_id | VARCHAR(128) | NULL | Release-independent stable identifier. | key: stable_id_idx |
version | SMALLINT | NULL | Stable identifier version number. | key: stable_id_idx |
created_date | DATETIME | NULL | Date created. | |
modified_date | DATETIME | NULL | Date modified. |
See also: |
List of species with populated data: Show species
|
Enables storage of attributes that relate to genes.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
gene_id | INT(10) | '0' | Foreign key references to the gene table. | key: gene_idx unique key: gene_attribx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx unique key: gene_attribx |
value | TEXT | - | Attribute value. | key: type_val_idx key: val_only_idx unique key: gene_attribx |
See also: |
List of species with populated data: Show species
|
Stores translation alignments generated from Blast (or Blast-like) comparisons.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
protein_align_feature_id | INT(10) | - | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | - | Foreign key references to the seq_region table. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_start | INT(10) | - | Sequence start position. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_end | INT(10) | - | Sequence end position. | |
seq_region_strand | TINYINT(1) | '1' | Sequence region strand: 1 - forward; -1 - reverse. | |
hit_start | INT(10) | - | Alignment hit start position. | |
hit_end | INT(10) | - | Alignment hit end position. | |
hit_name | VARCHAR(40) | - | Alignment hit name. | key: hit_idx |
analysis_id | SMALLINT | - | Foreign key references to the analysis table. | key: seq_region_idx key: analysis_idx |
score | DOUBLE | NULL | Alignment score. | key: seq_region_idx |
evalue | DOUBLE | NULL | Alignment e-value. | |
perc_ident | FLOAT | NULL | Alignment percentage identity. | |
cigar_line | TEXT | NULL | Used to encode gapped alignments. | |
external_db_id | INT | NULL | Foreign key references to the external_db table. | key: external_db_idx |
hcoverage | DOUBLE | NULL | Alignment hit coverage. | |
align_type | ENUM:
|
'ensembl' | Alignment string type used |
See also: |