RepeatAnalyzer

RepeatAnalyzer tracks, catalogs, and analyzes short-sequence repeats (SSRs) and genotypes, quantifying SSR distribution and genetic diversity in prokaryotic and eukaryotic genomes. It has been validated on Anaplasma marginale.


Key Features:

  • Tracking and Management: Catalogs short-sequence repeats (SSRs) and genotypes for systematic documentation of repeat sequences.
  • Analysis Capabilities: Computes metrics assessing regional genetic diversity, SSR variety, and SSR regularity within loci.
  • Visualization Tools: Generates geographic maps illustrating the distribution of genotypes and SSRs across regions of interest.
  • Validation and Accuracy: Validated repeat identification and genotyping using 380 Anaplasma marginale isolates, confirming precision of repeat calls.
  • Error Detection: Detects discrepancies in published data, including misreported SSRs, a single name assigned to different SSRs, and multiple names assigned to a single SSR.
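
The two naming problems listed under Error Detection can be illustrated with a short Python sketch. This is not RepeatAnalyzer's own code: the function name `find_naming_conflicts`, the (name, sequence) record layout, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def find_naming_conflicts(records):
    """Group (name, sequence) pairs and report the two kinds of conflict.

    Hypothetical helper, not part of RepeatAnalyzer's interface.
    Returns (reused_names, synonyms):
      reused_names: name -> sorted sequences, where one name covers several SSRs
      synonyms:     sequence -> sorted names, where one SSR has several names
    """
    seqs_by_name = defaultdict(set)
    names_by_seq = defaultdict(set)
    for name, seq in records:
        seq = seq.strip().upper()  # normalize before comparing
        seqs_by_name[name].add(seq)
        names_by_seq[seq].add(name)
    reused_names = {n: sorted(s) for n, s in seqs_by_name.items() if len(s) > 1}
    synonyms = {s: sorted(n) for s, n in names_by_seq.items() if len(n) > 1}
    return reused_names, synonyms


# Toy records (invented sequences, not real published SSRs).
records = [
    ("A", "ADSSSAGGQQQESS"),
    ("B", "TDSSSASGQQQESS"),
    ("A", "ADSSSASGQQQESS"),  # name "A" reused for a different sequence
    ("F", "TDSSSASGQQQESS"),  # same sequence as "B" under another name
]
reused_names, synonyms = find_naming_conflicts(records)
print(reused_names)
print(synonyms)
```

Indexing the records in both directions (by name and by sequence) keeps the check to a single pass over the data.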

Scientific Applications:

  • Genotype Identification: Uses heterogeneous SSR patterns within loci to assign strain genotypes for epidemiological and pathogen-evolution studies.
  • Genetic Diversity Analysis: Quantifies regional genetic diversity to inform population structure and evolutionary dynamics.
  • Data Correction and Validation: Identifies and helps correct errors in published SSR data to improve reliability of genomic datasets.

Methodology:

RepeatAnalyzer applies novel metrics to evaluate SSR distribution. It fits genotype-length distributions (reported as approximately normal) and SSR-frequency distributions (reported as power-law-like), and computes edit-distance distributions, in which the most common edit distance is five or six. The analyses were validated on 380 Anaplasma marginale isolates and report that over 90% of repeats are 28 or 29 amino acids long.
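
As a rough illustration of an edit-distance distribution, the Python sketch below computes the Levenshtein distance for every pair of sequences and tallies how often each distance occurs. It is a minimal sketch under stated assumptions: this entry does not say exactly which sequences RepeatAnalyzer compares or how it defines the distance, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_distribution(sequences):
    """Count how often each pairwise edit distance occurs."""
    return Counter(edit_distance(s, t) for s, t in combinations(sequences, 2))


# Toy amino-acid strings (invented, far shorter than real repeats).
repeats = ["ADSSSAGG", "ADSSSASG", "TDSSSASG"]
dist = distance_distribution(repeats)
print(sorted(dist.items()))
```

The mode of the resulting counter corresponds to the "most common edit distance" described above. The all-pairs comparison grows quadratically with the number of sequences, which is acceptable for a sketch but may need batching on large catalogs.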

Details

License: GPL-3.0
Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Python
Added: 10/1/2018
Last Updated: 12/10/2018

Publications

Catanese HN, Brayton KA, Gebremedhin AH. RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data. BMC Genomics. 2016;17(1). doi:10.1186/s12864-016-2686-2. PMID:27260942. PMCID:PMC4891823.

Funding:

  • Washington State University: start-up fund
  • National Science Foundation: IIS-1553528

Documentation