REPRO
REPRO detects evolutionarily related sequence motifs within individual protein sequences, including highly diverged repeat units, to delineate homologous repeats for evolutionary, structural, and alignment analyses.
Key Features:
- Sensitive repeat detection: Detects homologous repeat regions whose similarity ranges from identical to nearly undetectable by conventional alignment heuristics.
- Smith–Waterman-based local alignment enumeration: Uses variations of the Smith–Waterman algorithm to enumerate high-scoring nonoverlapping fragment alignments.
- Graph-based clustering: Groups alignments with shared N-terminal boundaries and iteratively subdivides alignments to define and extend repeat clusters.
- Iterative multiple alignment and profile sliding: Performs iterative multiple alignment and profile "sliding" across the query to detect weakly conserved fragments missed initially.
- Bootstrap-style iterative refinement: Refines repeat sets iteratively in a bootstrap-like manner, mimicking how an expert identifies repeats by hand while remaining computationally tractable.
- Performance optimizations: Incorporates optimizations yielding ≥25× performance improvements without compromising sensitivity.
- Proteome-scale scalability: Scales to full proteomes for automated detection of diverse repeat architectures.
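To illustrate the local-alignment idea behind repeat detection, here is a minimal Python sketch that aligns a sequence against itself with a basic Smith–Waterman recurrence. It is not REPRO's implementation: the function name `self_local_alignment`, the identity-based scoring (`match`, `mismatch`) and the linear `gap` penalty are assumptions made for illustration. It returns only the single best off-diagonal alignment rather than enumerating a list of nonoverlapping alignments.

```python
def self_local_alignment(seq, match=2, mismatch=-1, gap=-2):
    """Best local alignment of seq against itself, ignoring the trivial self-match.

    Hypothetical helper for illustration; scoring values are assumptions.
    Returns (score, (start_a, end_a), (start_b, end_b)) with half-open ranges.
    """
    n = len(seq)
    # H[i][j] = best score of a local alignment ending at seq[i-1] and seq[j-1]
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best, best_end = 0, (0, 0)
    for i in range(1, n + 1):
        # Only fill cells with j > i so a residue is never paired with itself
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align the two residues
                          H[i - 1][j] + gap,     # gap in the second fragment
                          H[i][j - 1] + gap)     # gap in the first fragment
            if H[i][j] > best:
                best, best_end = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero
    i, j = best_end
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq[i - 1] == seq[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_end[0]), (j, best_end[1])


score, frag_a, frag_b = self_local_alignment("ABCDEXXABCDE")
print(score, frag_a, frag_b)
```

In this toy input the two copies of `ABCDE` are recovered as the aligned fragments. A realistic setup would use a substitution matrix and affine gap penalties, and would repeat the search to collect further alignments.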
Scientific Applications:
- Protein age estimation: Accurate repeat delineation facilitates estimation of protein or repeat unit ages.
- Repeat-based structural modeling: Enables repeat-based fold inference and structural modeling of repeat proteins.
- Improved multiple sequence alignments: Improves reliability of multiple sequence alignments by addressing misalignment caused by repeats.
- Evolutionary and genomic analyses: Supports analysis of duplication, recombination, and fusion events in genomes.
Methodology:
The pipeline proceeds in three phases: (i) a comprehensive local alignment search using variations of the Smith–Waterman algorithm to enumerate high-scoring nonoverlapping fragment alignments; (ii) a graph-based clustering procedure that groups alignments with shared N-terminal boundaries and iteratively subdivides alignments to define initial repeat sets and extend clusters; and (iii) iterative multiple alignment and profile "sliding" across the query to detect additional weakly conserved fragments.
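Phase (ii) can be pictured as a graph problem: fragments are nodes, pairwise alignments are edges, and fragments that begin at the same position collapse into one node. The Python sketch below groups fragment start positions into connected components with a union-find structure. It is a simplified illustration only: the function name `cluster_by_start` and the input format (pairs of fragment start positions) are assumptions, and the iterative subdivision and extension of alignments described above is not reproduced.

```python
from collections import defaultdict


def cluster_by_start(alignments):
    """Group fragment start positions linked by pairwise alignments.

    `alignments` is an iterable of (start_a, start_b) pairs, one per
    pairwise alignment (hypothetical input format for illustration).
    Returns a sorted list of clusters, each a sorted list of start positions.
    """
    parent = {}

    def find(x):
        # Register unseen nodes, then walk to the root with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each alignment links the N-terminal boundaries of its two fragments
    for start_a, start_b in alignments:
        union(start_a, start_b)

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


# Two alignments share the fragment starting at 30, so they merge into one cluster
print(cluster_by_start([(0, 30), (30, 60), (100, 140)]))
```

Matching on exact start positions is the simplest possible criterion; a practical version would likely tolerate small offsets between boundaries.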
Details
- Tool Type: web application
- Operating Systems: Linux, Windows, Mac
- Added: 4/21/2017
- Last Updated: 11/25/2024
Publications
George RA, Heringa J. The REPRO server: finding protein internal sequence repeats through the Web. Trends in Biochemical Sciences. 2000;25(10):515-517. doi:10.1016/s0968-0004(00)01643-1. PMID:11203383.