REPRO

REPRO detects evolutionarily related sequence motifs within individual protein sequences, including highly diverged repeat units, to delineate homologous repeats for evolutionary, structural, and alignment analyses.


Key Features:

  • Sensitive repeat detection: Detects homologous repeat regions whose similarity ranges from identical to nearly undetectable by conventional alignment heuristics.
  • Smith–Waterman-based local alignment enumeration: Uses variations of the Smith–Waterman algorithm to enumerate high-scoring nonoverlapping fragment alignments.
  • Graph-based clustering: Groups alignments with shared N-terminal boundaries and iteratively subdivides alignments to define and extend repeat clusters.
  • Iterative multiple alignment and profile sliding: Iteratively aligns the repeat set and slides the resulting profile across the query to recover weakly conserved fragments missed in the initial search.
  • Bootstrap-style iterative refinement: Mimics the way an expert identifies repeats by hand while remaining computationally scalable.
  • Performance optimizations: Achieves speedups of at least 25× without compromising sensitivity.
  • Proteome-scale scalability: Scales to full proteomes for automated detection of diverse repeat architectures.
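
The local alignment enumeration listed above can be illustrated with a small sketch. This is not REPRO's implementation: it assumes a plain match/mismatch score and a linear gap penalty in place of a substitution matrix, and the function name `self_local_alignments` and all parameter values are hypothetical. It compares a sequence with itself using Smith–Waterman, restricts the search to the upper triangle so the trivial self-match is excluded, and re-runs the search with previously used residue pairs blocked so that successive alignments do not overlap.

```python
def self_local_alignments(seq, top_k=3, match=2, mismatch=-1, gap=-2, min_score=4):
    """Return up to top_k nonoverlapping local self-alignments as (score, pairs).

    pairs holds 1-based residue positions (i, j) with i < j.
    Scoring values are illustrative assumptions, not REPRO's parameters.
    """
    n = len(seq)
    used = set()   # residue pairs consumed by earlier alignments
    results = []
    for _ in range(top_k):
        H = [[0] * (n + 1) for _ in range(n + 1)]
        best, best_cell = 0, None
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):      # upper triangle only
                if (i, j) in used:
                    continue                    # cell stays 0: pair cannot be reused
                s = match if seq[i - 1] == seq[j - 1] else mismatch
                H[i][j] = max(0,
                              H[i - 1][j - 1] + s,
                              H[i - 1][j] + gap,
                              H[i][j - 1] + gap)
                if H[i][j] > best:
                    best, best_cell = H[i][j], (i, j)
        if best < min_score:
            break
        # Trace back from the best cell until the score drops to zero.
        pairs = []
        i, j = best_cell
        while H[i][j] > 0:
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                pairs.append((i, j))
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + gap:
                i -= 1
            else:
                j -= 1
        pairs.reverse()
        used.update(pairs)
        results.append((best, pairs))
    return results


if __name__ == "__main__":
    # Toy sequence: a 6-residue unit repeated three times.
    for score, pairs in self_local_alignments("ACDEFGACDEFGACDEFG"):
        print(score, pairs[0], pairs[-1])
```

Recomputing the whole matrix for every alignment is quadratic per pass and is chosen here only for brevity; a production tool would update just the affected cells.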

Scientific Applications:

  • Protein age estimation: Accurate repeat delineation facilitates estimation of protein or repeat unit ages.
  • Repeat-based structural modeling: Enables repeat-based fold inference and structural modeling of repeat proteins.
  • Improved multiple sequence alignments: Improves reliability of multiple sequence alignments by addressing misalignment caused by repeats.
  • Evolutionary and genomic analyses: Supports analysis of duplication, recombination, and fusion events in genomes.

Methodology:

The pipeline proceeds in three phases: (i) a comprehensive local alignment search using variations of the Smith–Waterman algorithm to enumerate high-scoring nonoverlapping fragment alignments; (ii) a graph-based clustering procedure that groups alignments with shared N-terminal boundaries and iteratively subdivides alignments to define initial repeat sets and extend clusters; and (iii) iterative multiple alignment and profile "sliding" across the query to detect additional weakly conserved fragments.
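
Phase (iii) can be pictured with the following minimal sketch of profile sliding. It is an illustration under stated assumptions rather than the method used by REPRO: the repeat fragments are taken to be ungapped and of equal length, the profile is a simple log-odds table against a uniform background with a pseudocount, and the names `build_profile` and `slide_profile` and the threshold value are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(fragments, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Build a per-column log-odds profile from equal-length, ungapped fragments."""
    length = len(fragments[0])
    background = 1.0 / len(alphabet)            # uniform background (assumption)
    total = len(fragments) + pseudocount * len(alphabet)
    profile = []
    for pos in range(length):
        counts = Counter(frag[pos] for frag in fragments)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        }
        profile.append(column)
    return profile


def slide_profile(query, profile, threshold):
    """Score every window of the query against the profile; keep windows >= threshold."""
    width = len(profile)
    hits = []
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Residues outside the alphabet contribute a neutral score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Fragments standing in for repeats found in the earlier phases.
    profile = build_profile(["ACDEF", "ACDEF", "ACDQF"])
    # The second copy in the query ("ACNEF") is diverged but still scores above threshold.
    for start, score in slide_profile("GGACDEFGGACNEFGG", profile, threshold=4.0):
        print(start, round(score, 2))
```

Window starts are 0-based. In an iterative scheme, newly accepted windows would be added to the fragment set and the profile rebuilt before the next pass.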

Details

Tool Type: web application
Operating Systems: Linux, Windows, Mac
Added: 4/21/2017
Last Updated: 11/25/2024

Publications

George RA, Heringa J. The REPRO server: finding protein internal sequence repeats through the Web. Trends in Biochemical Sciences. 2000;25(10):515-517. doi:10.1016/s0968-0004(00)01643-1. PMID:11203383.
