RepeatExplorer
RepeatExplorer identifies and characterizes repetitive DNA elements in next-generation sequencing (NGS) datasets to determine repeat composition and evolutionary dynamics in plant and animal genomes.
Key Features:
- Graph-Based Clustering: Employs graph-based similarity clustering of short NGS reads to partition data into clusters representing individual repeat families.
- De Novo Identification: Performs de novo identification of repetitive elements without relying on reference databases of known elements.
- Repeat Annotation and Quantification: Provides programs for annotation and quantification of identified repeats, including classification of repeat types.
- Phylogenetic and Comparative Analyses: Supports investigation of phylogenetic relationships among retroelements and comparative analyses across multiple species.
- Visualization (SeqGrapheR): Supports visual inspection of cluster graphs with SeqGrapheR to explore cluster structure and sequence variability.
- Scalability and Low-Pass Data Support: Handles low-pass genome sequencing datasets of several million reads to detect high- and medium-copy repeats.
Scientific Applications:
- Genome Structure and Evolution Studies: Enables characterization of repetitive sequence content to study genome structure and evolutionary dynamics in plants and animals.
- Repeat Family Analysis: Allows assessment of cluster sizes with statistical analyses and visual inspection to distinguish repeat types and intra-family sequence variability.
- Novel Element Discovery: Facilitates discovery and characterization of novel repeat elements and assembly of consensus sequences to investigate repeat family divergence.
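As a rough illustration of how cluster sizes can be turned into abundance estimates, the sketch below divides the number of reads in each cluster by the total number of reads analyzed. It assumes the reads are an unbiased low-pass sample of the genome, so that a cluster's share of reads approximates the genome proportion of that repeat. The function name `genome_proportions` and the cluster labels are hypothetical and are not part of RepeatExplorer's output format.

```python
def genome_proportions(cluster_sizes, total_reads):
    """Estimate the genome proportion of each repeat cluster.

    cluster_sizes: dict mapping a cluster label to its read count.
    total_reads:   number of reads that went into the clustering.
    Assumption: reads sample the genome uniformly (low-pass sequencing).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: size / total_reads for name, size in cluster_sizes.items()}


# Hypothetical read counts for three clusters out of one million reads.
example_sizes = {"CL1": 50_000, "CL2": 12_000, "CL3": 300}
example_props = genome_proportions(example_sizes, 1_000_000)
for name, prop in example_props.items():
    print(f"{name}: {prop:.4%} of the genome (estimate)")
```

Estimates of this kind are only as good as the sampling assumption; biased library preparation or organellar contamination would skew them.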
Methodology:
The pipeline partitions genome sequence reads into clusters representing repeat families using similarity-based graph partitioning. It then assembles consensus sequences for the clusters and applies classification and quantification programs. Cluster sizes are assessed with statistical analysis, and cluster structure is inspected visually with SeqGrapheR. The analysis operates on low-pass NGS data and does not rely on reference repeat databases.
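The general idea of similarity-based graph clustering can be sketched on a toy scale. In the example below, reads are nodes, an edge joins two reads whose k-mer sets are similar enough, and clusters are the connected components of that graph. This is a deliberately simplified stand-in: the k-mer Jaccard measure, the `min_similarity` threshold, and the use of connected components are assumptions made for illustration and are not RepeatExplorer's actual similarity search or partitioning method.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_reads(reads, k=4, min_similarity=0.5):
    """Group reads into connected components of a similarity graph.

    Returns lists of read indices, largest cluster first.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(r, k) for r in reads]
    # All-vs-all comparison: fine for a toy example, too slow for real data.
    for i, j in combinations(range(len(reads)), 2):
        if jaccard(profiles[i], profiles[j]) >= min_similarity:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical toy reads: two repeat-like groups and one singleton.
toy_reads = ["ACGTACGTAC", "CGTACGTACG", "GTACGTACGT",
             "TTTTGGGGCC", "TTTGGGGCCC", "AAAAAAAAAA"]
print(cluster_reads(toy_reads))
```

On real datasets of millions of reads, the quadratic all-vs-all loop would be replaced by an indexed similarity search, and connected components tend to merge unrelated families through chimeric reads, which is one reason dedicated graph partitioning methods are used instead.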
Details
- License: GPL-3.0
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool, web application
- Operating Systems: Linux
- Programming Languages: R, Perl, Python
- Added: 11/9/2015
- Last Updated: 11/25/2024
Operations
- Genome annotation
Publications
Novák P, Neumann P, Macas J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics. 2010;11(1). doi:10.1186/1471-2105-11-378. PMID:20633259. PMCID:PMC2912890.
Novák P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics. 2013;29(6):792-793. doi:10.1093/bioinformatics/btt054. PMID:23376349.
Downloads
- Binaries: https://bitbucket.org/petrnovak/repex_tarean
- Software package: https://toolshed.g2.bx.psu.edu/view/petr-novak/repeatexplorer2 (Galaxy toolshed package)
- Source code: https://bitbucket.org/petrnovak/repex_tarean