RepeatRunner

RepeatRunner integrates RepeatMasker nucleotide searches with BLASTX protein-based searches and PILER-DF de novo repeat finding to detect and classify repetitive elements in eukaryotic genomes, improving identification of highly divergent repeats and supporting higher-quality gene annotations.


Key Features:

  • Combined nucleotide and protein searches: Integrates RepeatMasker nucleotide searches with BLASTX protein-based searches to increase detection sensitivity for repeats.
  • De novo repeat discovery (PILER-DF): Uses PILER-DF for initial identification of interspersed repetitive elements without reliance on existing libraries.
  • Database-based classification: Classifies identified repeats by resemblance to known elements in Repbase and GenBank.
  • Gene screening to reduce false positives: Screens candidate repeats against annotated genes to minimize misclassification of genic sequences as repeats.
  • Detection of highly divergent repeats: Combines nucleotide and protein-level evidence to recover repeats that have diverged beyond nucleotide similarity to library entries.
  • Empirical validation in Dipteran genomes: Demonstrated improved repeat identification in thirteen Dipteran genomes.
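The gene-screening feature can be illustrated with a small sketch. This entry does not say how the screening is done, so the approach here is an assumption: candidates and annotated genes are treated as coordinate intervals, and a candidate is dropped when a single gene covers more than a chosen fraction of it. The function names and the 0.5 threshold are hypothetical and are not part of RepeatRunner.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def best_gene_overlap(candidate: Interval, genes: List[Interval]) -> float:
    """Fraction of the candidate covered by its single best-overlapping gene."""
    c_start, c_end = candidate
    length = c_end - c_start
    if length <= 0:
        return 0.0
    best = 0
    for g_start, g_end in genes:
        # Negative values mean the two intervals do not overlap at all.
        overlap = min(c_end, g_end) - max(c_start, g_start)
        best = max(best, overlap)
    return best / length


def screen_candidates(
    candidates: List[Interval],
    genes: List[Interval],
    max_fraction: float = 0.5,  # hypothetical threshold, chosen for illustration
) -> List[Interval]:
    """Keep candidates whose best gene overlap does not exceed max_fraction."""
    return [c for c in candidates if best_gene_overlap(c, genes) <= max_fraction]


genes = [(100, 200)]
candidates = [(0, 50), (90, 130), (150, 190)]
kept = screen_candidates(candidates, genes)
print(kept)
```

A sequence-similarity screen (for example, aligning candidates against gene sequences) would be an equally plausible reading of the feature; the interval version is used here only because it is short and self-contained.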

Scientific Applications:

  • Genome repeat annotation: Comprehensive identification and classification of repetitive elements in eukaryotic genomes, including previously un-annotated sequences.
  • Improving gene annotation: Reduces repeat-derived annotation errors to support more accurate gene models.
  • Annotation-dependent analyses: Enhances downstream analyses such as microarray studies by providing more complete repeat annotation.
  • Functional genomics of repeats: Supports investigation of repetitive element roles in gene regulation, chromosome inheritance, nuclear architecture, and genome stability.
  • Comparative genomics: Enables comparative analysis of repeat content across species, exemplified by application to Dipteran genomes.

Methodology:

Interspersed repeats are first identified de novo with PILER-DF. Candidates are classified by resemblance to entries in Repbase and GenBank and screened against annotated genes. RepeatRunner integrates RepeatMasker nucleotide searches with BLASTX protein-based searches.
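As a rough illustration of combining nucleotide-level and protein-level evidence, the sketch below merges hit intervals from two sources and soft-masks the covered bases. It assumes hits have already been parsed into 0-based half-open coordinates; the function names and example coordinates are hypothetical, and the sketch does not reproduce RepeatRunner's own logic or output format. Masking is mentioned only in the title of the cited paper, and the choice of lowercase soft-masking is an assumption.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(sequence: str, intervals: List[Interval]) -> str:
    """Lowercase every base covered by at least one interval."""
    chars = list(sequence.upper())
    for start, end in merge_intervals(intervals):
        # Clip to the sequence bounds so out-of-range hits are ignored.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


# Hypothetical hit coordinates standing in for parsed search results.
nucleotide_hits = [(2, 6), (15, 18)]
protein_hits = [(4, 9)]

masked = soft_mask("ACGTACGTACGTACGTACGT", nucleotide_hits + protein_hits)
print(masked)  # ACgtacgtaCGTACGtacGT
```

Taking the union of both hit sets means a region supported by either kind of evidence is masked, which is one simple way to benefit from protein hits where nucleotide similarity is too weak to detect.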

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Perl
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Smith CD, Edgar RC, Yandell MD, Smith DR, Celniker SE, Myers EW, Karpen GH. Improved repeat identification and masking in Dipterans. Gene. 2007;389(1):1-9. doi:10.1016/j.gene.2006.09.011. PMID:17137733. PMCID:PMC1945102.