Repeat feature annotation
Several software programs are run to annotate three types of repeats:
- Low-complexity regions (Dust [1])
- Tandem repeats (TRF [2])
- Complex repeats:
  - RepeatMasker [3]
  - Repeat Detector (Red) [4] and Ensembl/plant-scripts [5]
Annotating repeats with RepeatMasker requires a repeat library. Repeat libraries from the following sources are used and combined where possible:
- The MIPS Repeat Database (REdat).
- nrTEplants, a curated library of repeated sequences collected from REdat, RepetDB, TREP and other collections [5].
Viewing and accessing repeat features
By default, repeat features are not displayed in the genome browser; display them by using the Configure this page option. You can view all repeats, or a subset of repeats based on type.
The repeat annotations can be accessed programmatically using the Ensembl API. See the RepeatFeature and RepeatFeatureAdaptor documentation for further details.
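As an illustration of programmatic access, the sketch below uses the Ensembl REST API rather than the Perl API objects named above, since it needs no local installation. It assumes that the public server at https://rest.ensembl.org serves the species of interest, that the overlap/region endpoint is queried with feature=repeat, and that each returned record carries start, end and description fields. The function names, the species and the region are hypothetical choices made for this example only.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: the public Ensembl REST server hosts the species you need.
REST_SERVER = "https://rest.ensembl.org"


def build_repeat_url(species, region):
    """Build an overlap/region URL that requests only repeat features."""
    species_part = urllib.parse.quote(species)
    # Keep ':' and '-' literal so a region such as 1:10000-20000 stays readable.
    region_part = urllib.parse.quote(region, safe=":-")
    return f"{REST_SERVER}/overlap/region/{species_part}/{region_part}?feature=repeat"


def fetch_repeats(species, region, timeout=30):
    """Download repeat features for a region as a list of dictionaries."""
    request = urllib.request.Request(
        build_repeat_url(species, region),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarise_repeats(features):
    """Sum the annotated length in bp per repeat description.

    Overlapping features are counted separately, so the totals can exceed
    the number of distinct masked bases.
    """
    totals = Counter()
    for feature in features:
        # Assumption: records expose 'start', 'end' and 'description'.
        label = feature.get("description") or "unknown"
        totals[label] += feature["end"] - feature["start"] + 1
    return totals
```

For example, summarise_repeats(fetch_repeats("arabidopsis_thaliana", "1:10000-20000")) would return a per-type tally for that region, provided the server is reachable and the species name is valid there.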
References
1. Morgulis A et al. (2006) A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol. 13:1028-40
2. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573-580
3. Smit AFA, Hubley R, Green P (2015) RepeatMasker Open-4.0. http://www.repeatmasker.org
4. Girgis HZ (2015) Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics. 16:227
5. Contreras-Moreira B, Filippi CV, Naamati G, García Girón C, Allen JE, Flicek P (2021) Efficient masking of plant genomes by combining kmer counting and curated repeats. Preprint from bioRxiv. DOI: 10.1101/2021.03.22.436504