RepARK

RepARK assembles repetitive motifs de novo from abundant k-mers in next-generation sequencing (NGS) whole-genome shotgun (WGS) reads to generate repeat libraries for identification and analysis of repetitive elements and transposable elements.


Key Features:

  • Reference-independent assembly: Assembles repeats directly from sequencing data without requiring a reference genome.
  • Efficiency and speed: Runs substantially faster than established methods for generating repeat libraries, enabling large-scale analyses.
  • Comprehensive repeat libraries: Produces libraries predominantly composed of repetitive motifs, offering more complete representation than traditional approaches.
  • High annotation accuracy: Generated repeat libraries are classified with TEclass, which assigns the assembled repeats to transposable element classes.
  • Applicability to complex genomes: Works on model organisms such as Drosophila melanogaster and on larger genomes, including the human genome.
  • Diagnostic utility: Identifies repetitive sequences that may represent contamination in NGS datasets.

Scientific Applications:

  • Genomic research: Facilitates analysis of repeat elements to study genome structure and function.
  • Transposable element studies: Provides libraries and annotations for investigating composition and distribution of transposable elements across species.
  • Quality control in NGS data: Helps flag repetitive sequences that may indicate contamination in sequencing datasets.

Methodology:

RepARK assembles repetitive motifs directly from abundant k-mers in WGS reads. The resulting repeat libraries are annotated with TEclass, and performance was validated on simulated and real Drosophila melanogaster NGS data.
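
The first step of this approach, selecting abundant k-mers from reads, can be illustrated with a minimal Python sketch. This is not RepARK's own code (RepARK is written in Perl). The function names, the toy reads, the value of k, and the fixed abundance threshold are all assumptions chosen for illustration, and the assembly and TEclass annotation steps are not shown.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Counting canonical k-mers treats both strands as the same sequence.
    """
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def count_kmers(reads, k):
    """Count canonical k-mers over all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def abundant_kmers(counts, threshold):
    """Keep k-mers seen at least `threshold` times (hypothetical fixed cutoff)."""
    return {kmer: n for kmer, n in counts.items() if n >= threshold}


# Toy input: three reads share the motif GATTACA, the fourth does not.
reads = ["AAGATTACACC", "TTGATTACAGG", "CGGATTACATA", "CCGTAGCTAGT"]
counts = count_kmers(reads, k=5)
repeats = abundant_kmers(counts, threshold=3)
print(sorted(repeats))
```

In this toy input, only the k-mers covering the shared motif reach the threshold. In a real dataset the cutoff would need to be chosen relative to sequencing coverage, and the selected k-mers would then be passed to an assembler to build the repeat library.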

Details

License: Other
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Mac, Windows
Programming Languages: Perl
Added: 3/21/2022
Last Updated: 3/21/2022

Publications

Koch P, Platzer M, Downie BR. RepARK—de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Research. 2014;42(9):e80-e80. doi:10.1093/nar/gku210. PMID:24634442. PMCID:PMC4027187.