RepeatExplorer

RepeatExplorer identifies and characterizes repetitive DNA elements in next-generation sequencing (NGS) datasets to determine repeat composition and evolutionary dynamics in plant and animal genomes.


Key Features:

  • Graph-Based Clustering: Employs graph-based similarity clustering of short NGS reads to partition data into clusters representing individual repeat families.
  • De Novo Identification: Performs de novo identification of repetitive elements without relying on reference databases of known elements.
  • Repeat Annotation and Quantification: Provides programs for annotation and quantification of identified repeats, including classification of repeat types.
  • Phylogenetic and Comparative Analyses: Supports investigation of phylogenetic relationships among retroelements and comparative analyses across multiple species.
  • Visualization (SeqGrapheR): Supports visual inspection of cluster graphs with SeqGrapheR to explore cluster structure and sequence variability.
  • Scalability and Low-Pass Data Support: Scales to several million sequence reads from low-pass genome sequencing, detecting high- and medium-copy repeats.

Scientific Applications:

  • Genome Structure and Evolution Studies: Enables characterization of repetitive sequence content to study genome structure and evolutionary dynamics in plants and animals.
  • Repeat Family Analysis: Combines statistical analysis of cluster sizes with visual inspection to distinguish repeat types and assess intra-family sequence variability.
  • Novel Element Discovery: Facilitates discovery and characterization of novel repeat elements and assembly of consensus sequences to investigate repeat family divergence.
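
The cluster-size assessment mentioned above can be illustrated with a short calculation. The following is a minimal sketch, not RepeatExplorer's own code: it assumes a hypothetical mapping from read IDs to cluster IDs and treats the fraction of analyzed reads in a cluster as a rough estimate of that repeat family's share of the genome, which is only reasonable when reads are an unbiased low-pass sample. The function name and the input layout are illustrative assumptions.

```python
from collections import Counter


def cluster_proportions(read_to_cluster, total_reads):
    """Return {cluster_id: fraction of all analyzed reads}, largest cluster first.

    read_to_cluster: hypothetical dict mapping read ID -> cluster ID
                     (reads that were not clustered are simply absent).
    total_reads:     number of reads that went into the analysis.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    sizes = Counter(read_to_cluster.values())
    # most_common() orders clusters from largest to smallest
    return {cid: n / total_reads for cid, n in sizes.most_common()}


if __name__ == "__main__":
    assignments = {"r1": "CL1", "r2": "CL1", "r3": "CL2"}
    for cid, frac in cluster_proportions(assignments, total_reads=10).items():
        print(f"{cid}\t{frac:.1%}")
```

Dividing by the total number of analyzed reads, rather than by the number of clustered reads, keeps the unclustered (low-copy) fraction of the sample in the denominator.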

Methodology:

The pipeline partitions genome sequence reads into clusters representing repeat families using similarity-based graph partitioning, assembles consensus sequences, and applies classification and quantification programs. Results are assessed through statistical analysis of cluster sizes and visual inspection with SeqGrapheR. The analysis operates on low-pass NGS data and does not rely on reference repeat databases.
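
The idea of grouping reads through a similarity graph can be sketched in a few lines. This example is a deliberate simplification and not the partitioning method RepeatExplorer uses: it treats reads as graph nodes, takes a hypothetical list of read pairs already found to be similar (for example from an all-to-all sequence comparison) as edges, and reports connected components as clusters using a union-find structure. All names below are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_reads(read_ids, similar_pairs):
    """Group reads into connected components of a similarity graph.

    read_ids:      iterable of read identifiers (graph nodes).
    similar_pairs: iterable of (read_a, read_b) tuples (graph edges);
                   both reads must be present in read_ids.
    Returns a list of clusters (sorted lists of read IDs), largest first.
    """
    read_ids = list(read_ids)
    parent = {r: r for r in read_ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(list)
    for r in read_ids:
        groups[find(r)].append(r)
    # Largest clusters first; ties broken by read IDs for stable output
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
    pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
    for i, cluster in enumerate(cluster_reads(reads, pairs), start=1):
        print(f"cluster {i}: {cluster}")
```

Connected components merge any two groups joined by a single edge, so a real repeat analysis would need a partitioning method that can separate densely connected groups linked by only a few similarities.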

Details

License: GPL-3.0
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool, web application
Operating Systems: Linux
Programming Languages: R, Perl, Python
Added: 11/9/2015
Last Updated: 11/25/2024

Operations

Genome annotation

Publications

Novák P, Neumann P, Macas J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics. 2010;11(1):378. doi:10.1186/1471-2105-11-378. PMID:20633259. PMCID:PMC2912890.

Novák P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics. 2013;29(6):792-793. doi:10.1093/bioinformatics/btt054. PMID:23376349.

Documentation

Terms of use, Citation instructions, General, User manual:
http://repeatexplorer.org
