RepARK
RepARK assembles repetitive motifs de novo from abundant k-mers in next-generation sequencing (NGS) whole-genome shotgun (WGS) reads. The resulting repeat libraries support the identification and analysis of repetitive elements, including transposable elements (TEs).
Key Features:
- Reference-independent assembly: Assembles repeats directly from sequencing data without requiring a reference genome.
- Efficiency and speed: Runs substantially faster than established methods for generating repeat libraries, enabling large-scale analyses.
- Comprehensive repeat libraries: Produces libraries predominantly composed of repetitive motifs, offering more complete representation than traditional approaches.
- Transposable element annotation: Generated repeat libraries are annotated with TEclass, which classifies transposable elements.
- Applicability to complex genomes: Has been applied to model organisms such as Drosophila melanogaster and to larger genomes, including the human genome.
- Diagnostic utility: Identifies repetitive sequences that may represent contamination in NGS datasets.
Scientific Applications:
- Genomic research: Facilitates analysis of repeat elements to study genome structure and function.
- Transposable element studies: Provides libraries and annotations for investigating the composition and distribution of transposable elements across species.
- Quality control in NGS data: Assists in detecting and mitigating contamination from repetitive sequences in sequencing datasets.
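One way to screen for contamination is to compare assembled repeat contigs against sequences of suspected contaminants. The Python sketch below is a hypothetical illustration and is not a RepARK feature. It reports contigs whose canonical k-mers (each k-mer merged with its reverse complement, so either strand matches) overlap a reference sequence by at least a chosen fraction. The function names, the `k` and `min_shared` parameters, and the example sequences are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_set(seq, k):
    """Canonical k-mers of a sequence, so matches on either strand are counted."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def flag_contaminants(contigs, references, k=21, min_shared=0.5):
    """Return (contig, reference_name, shared_fraction) for suspicious contigs.

    shared_fraction is the share of a contig's k-mers also present in the reference.
    """
    ref_kmers = {name: kmer_set(seq, k) for name, seq in references.items()}
    hits = []
    for contig in contigs:
        contig_kmers = kmer_set(contig, k)
        if not contig_kmers:  # contig is shorter than k, nothing to compare
            continue
        for name, kmers in ref_kmers.items():
            shared = len(contig_kmers & kmers) / len(contig_kmers)
            if shared >= min_shared:
                hits.append((contig, name, shared))
    return hits


if __name__ == "__main__":
    # Hypothetical contaminant reference and repeat contigs.
    references = {"phage": "TTGATTACAGATTACATT"}
    contigs = ["TGTAATCTGTAATC", "CCCCCCCCCC"]
    print(flag_contaminants(contigs, references, k=5))
    # [('TGTAATCTGTAATC', 'phage', 1.0)]
```

The first contig is the reverse complement of a stretch of the reference. It is still flagged because the comparison uses canonical k-mers.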
Methodology:
RepARK assembles repetitive motifs directly from abundant k-mers in WGS reads and annotates the resulting repeat libraries with TEclass. Its performance was validated on simulated and real Drosophila melanogaster NGS data.
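The core idea of assembling repeats from abundant k-mers can be illustrated with a toy Python sketch. This is not RepARK's implementation. It counts k-mers on the forward strand only and applies a hypothetical abundance threshold: a multiple of the median k-mer count, used as a rough proxy for single-copy coverage. It then joins the surviving k-mers into unbranched contigs (unitigs) of a de Bruijn graph. The function names and the `factor` parameter are illustrative assumptions.

```python
from collections import Counter
from statistics import median

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer on the forward strand of each read (simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundant_kmers(counts, factor):
    """Keep k-mers seen more than `factor` times the median k-mer count.

    Hypothetical threshold rule: the median stands in for single-copy coverage.
    """
    cutoff = factor * median(counts.values())
    return {kmer for kmer, n in counts.items() if n > cutoff}


def successors(kmer, kmers):
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]


def assemble_unitigs(kmers):
    """Join k-mers that overlap by k-1 bases into unbranched contigs."""
    used = set()
    contigs = []

    def is_start(kmer):
        # A unitig starts where the link into this k-mer is not a simple 1-to-1 edge.
        preds = predecessors(kmer, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    def extend(start):
        contig, current = start, start
        used.add(start)
        while True:
            nxt = successors(current, kmers)
            # Stop at dead ends, branches, merges, or when a cycle closes.
            if len(nxt) != 1 or nxt[0] in used or len(predecessors(nxt[0], kmers)) != 1:
                break
            current = nxt[0]
            contig += current[-1]
            used.add(current)
        return contig

    for kmer in sorted(kmers):
        if kmer not in used and is_start(kmer):
            contigs.append(extend(kmer))
    # Any k-mers left over lie on closed cycles, e.g. perfect tandem repeats.
    for kmer in sorted(kmers):
        if kmer not in used:
            contigs.append(extend(kmer))
    return contigs


if __name__ == "__main__":
    reads = ["GATTACA"] * 4 + ["CCCGGG", "AAACCC"]
    repeats = abundant_kmers(count_kmers(reads, 4), factor=2)
    print(assemble_unitigs(repeats))  # ['GATTACA']
```

On real reads, a workable version would also need to merge each k-mer with its reverse complement and tolerate sequencing errors. This sketch omits both to keep the abundance-then-assemble logic visible.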
Details
- License: Other
- Cost: Free of charge
- Tool Type: Command-line tool
- Operating Systems: Mac, Windows
- Programming Languages: Perl
- Added: 3/21/2022
- Last Updated: 3/21/2022
Publications
Koch P, Platzer M, Downie BR. RepARK—de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Research. 2014;42(9):e80-e80. doi:10.1093/nar/gku210. PMID:24634442. PMCID:PMC4027187.