RepeatRunner
RepeatRunner integrates RepeatMasker nucleotide searches with BLASTX protein-based searches and PILER-DF de novo repeat finding to detect and classify repetitive elements in eukaryotic genomes, improving identification of highly divergent repeats and supporting higher-quality gene annotations.
Key Features:
- Combined nucleotide and protein searches: Integrates RepeatMasker nucleotide searches with BLASTX protein-based searches to increase detection sensitivity for repeats.
- De novo repeat discovery (PILER-DF): Uses PILER-DF for initial identification of interspersed repetitive elements without reliance on existing libraries.
- Database-based classification: Classifies identified repeats by resemblance to known elements in Repbase and GenBank.
- Gene screening to reduce false positives: Screens candidate repeats against annotated genes to minimize misclassification of genic sequences as repeats.
- Detection of highly divergent repeats: Combines nucleotide and protein-level evidence to recover repeats that have diverged beyond nucleotide similarity to library entries.
- Empirical validation in Dipteran genomes: Demonstrated improved repeat identification in thirteen Dipteran genomes.
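The idea of combining nucleotide and protein evidence can be illustrated with a small sketch. This is not RepeatRunner's own code (the tool is written in Perl). It assumes that hits from each search have already been parsed into 0-based, half-open coordinate pairs on the same sequence. The function names `merge_intervals` and `soft_mask` and the example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based, half-open [start, end)


def merge_intervals(*sources: Iterable[Interval]) -> List[Interval]:
    """Pool intervals from any number of evidence sources and merge overlaps."""
    pooled = sorted(iv for src in sources for iv in src)
    merged: List[Interval] = []
    for start, end in pooled:
        if merged and start <= merged[-1][1]:
            # Overlapping or directly adjacent: extend the previous interval.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(seq: str, intervals: Iterable[Interval]) -> str:
    """Lower-case the masked regions and upper-case everything else."""
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


# Hypothetical hit coordinates, for illustration only.
nucleotide_hits = [(2, 6), (10, 14)]   # e.g. parsed from a nucleotide search
protein_hits = [(5, 8), (20, 24)]      # e.g. parsed from a protein search

combined = merge_intervals(nucleotide_hits, protein_hits)
masked = soft_mask("ACGTACGTACGTACGTACGTACGTACGT", combined)
print(combined)
print(masked)
```

In this sketch the protein-level hit at (20, 24) has no nucleotide counterpart, so it is masked only because the two evidence sources are pooled before merging.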
Scientific Applications:
- Genome repeat annotation: Comprehensive identification and classification of repetitive elements in eukaryotic genomes, including previously un-annotated sequences.
- Improving gene annotation: Reduces repeat-derived annotation errors to support more accurate gene models.
- Annotation-dependent analyses: Enhances downstream analyses such as microarray studies by providing more complete repeat annotation.
- Functional genomics of repeats: Supports investigation of repetitive element roles in gene regulation, chromosome inheritance, nuclear architecture, and genome stability.
- Comparative genomics: Enables comparative analysis of repeat content across species, exemplified by application to Dipteran genomes.
Methodology:
RepeatRunner begins with de novo repeat finding using PILER-DF, classifies the resulting elements by their resemblance to entries in Repbase and GenBank, and screens the candidates against annotated genes. It also integrates RepeatMasker nucleotide searches with BLASTX protein-based searches.
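The gene-screening step can be sketched as a coordinate filter. This is a simplification under stated assumptions, not the tool's documented procedure: it assumes candidates and annotated genes are non-overlapping half-open intervals on one sequence, and the `max_fraction` threshold of 0.5 is a hypothetical value chosen for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based, half-open [start, end)


def overlap_fraction(candidate: Interval, genes: List[Interval]) -> float:
    """Fraction of the candidate covered by genes (genes assumed non-overlapping)."""
    c_start, c_end = candidate
    length = c_end - c_start
    if length <= 0:
        return 0.0
    covered = 0
    for g_start, g_end in genes:
        covered += max(0, min(c_end, g_end) - max(c_start, g_start))
    return covered / length


def screen_candidates(
    candidates: List[Interval],
    genes: List[Interval],
    max_fraction: float = 0.5,  # hypothetical threshold, not taken from the tool
) -> List[Interval]:
    """Keep candidates whose genic overlap does not exceed max_fraction."""
    return [c for c in candidates if overlap_fraction(c, genes) <= max_fraction]


# Hypothetical coordinates, for illustration only.
genes = [(100, 200), (500, 650)]
candidates = [(10, 60), (150, 210), (190, 260), (600, 640)]

kept = screen_candidates(candidates, genes)
print(kept)
```

Here the candidates at (150, 210) and (600, 640) are dropped because most of their length lies inside an annotated gene, while the other two are retained.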
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Perl
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Smith CD, Edgar RC, Yandell MD, Smith DR, Celniker SE, Myers EW, Karpen GH. Improved repeat identification and masking in Dipterans. Gene. 2007;389(1):1-9. doi:10.1016/j.gene.2006.09.011. PMID:17137733. PMCID:PMC1945102.