KmerGenie

KmerGenie estimates the best k-mer length for de Bruijn graph-based genome assemblers directly from the sequencing reads, with the aim of improving assembly quality and accuracy.


Key Features:

  • k-mer abundance histograms: Computes approximate abundance histograms for multiple values of k using a fast sampling method.
  • Optimization heuristic: Applies a heuristic that predicts the number of distinct genomic k-mers for each candidate k and selects the k that maximizes this predicted count.
  • Performance and validation: Tested on diverse sequencing datasets, where the selected k values were shown to yield improved genome assemblies.
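
As a rough illustration of the histogram feature above, the following Python sketch builds a sampled k-mer abundance histogram for one value of k. It is not KmerGenie's code: the function names, the `sample_rate` parameter, and the hash-based sampling rule (keep a k-mer only if its hash is divisible by `sample_rate`) are assumptions made for this example.

```python
import hashlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def is_sampled(kmer, sample_rate):
    """Deterministically keep roughly 1 in `sample_rate` distinct k-mers (assumed rule)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % sample_rate == 0


def sampled_histogram(reads, k, sample_rate=1):
    """Map abundance -> estimated number of distinct k-mers with that abundance.

    Every occurrence of a kept k-mer is counted, so abundances are exact for
    the sample; the number of distinct k-mers is scaled back up by sample_rate.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if any(base not in "ACGT" for base in kmer):
                continue
            kmer = canonical(kmer)
            if is_sampled(kmer, sample_rate):
                counts[kmer] += 1
    histogram = Counter(counts.values())
    return {abundance: n * sample_rate for abundance, n in sorted(histogram.items())}
```

With `sample_rate=1` every k-mer is counted and the histogram is exact; larger values trade accuracy for speed and memory, which is the general idea behind computing histograms for many values of k cheaply.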

Scientific Applications:

  • De novo genome assembly: Optimizes k for de Bruijn graph-based assemblers to improve the quality and accuracy of assembled genomes from sequencing reads.

Methodology:

KmerGenie computes approximate k-mer abundance histograms for multiple values of k using fast sampling. For each candidate k, a heuristic predicts the number of distinct genomic k-mers from the histogram, and the k that maximizes this predicted count is selected.
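
The selection step can be sketched in Python as follows. This is a simplified stand-in, not KmerGenie's actual heuristic: it assumes that low-abundance k-mers before the first valley of the histogram are sequencing errors and counts everything from the valley onward as genomic. The function names and the input layout (a dict mapping each k to an abundance histogram) are hypothetical.

```python
def estimate_genomic_kmers(histogram):
    """Estimate distinct genomic k-mers from {abundance: number_of_distinct_kmers}.

    Assumed simplification: walk right from abundance 1 while the histogram
    keeps falling; the point where it stops falling is taken as the cutoff
    between erroneous and genomic k-mers.
    """
    if not histogram:
        return 0
    max_abundance = max(histogram)
    cutoff = 1
    while (cutoff < max_abundance
           and histogram.get(cutoff + 1, 0) < histogram.get(cutoff, 0)):
        cutoff += 1
    return sum(n for abundance, n in histogram.items() if abundance >= cutoff)


def choose_k(histograms):
    """Return (best_k, estimates), where best_k maximizes the estimated count."""
    estimates = {k: estimate_genomic_kmers(h) for k, h in histograms.items()}
    best_k = max(estimates, key=estimates.get)
    return best_k, estimates


if __name__ == "__main__":
    # Toy histograms, invented for illustration only.
    toy = {
        21: {1: 900, 2: 200, 3: 50, 4: 80, 5: 300, 6: 500, 7: 250},
        31: {1: 1000, 2: 150, 3: 40, 4: 200, 5: 700, 6: 300},
        41: {1: 1200, 2: 400, 3: 300, 4: 350, 5: 200},
    }
    best_k, estimates = choose_k(toy)
    print(best_k, estimates)
```

The valley cutoff is the crudest reasonable way to separate error k-mers from genomic ones; it is used here only to make the "maximize the predicted count" step concrete.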

Details

Maturity: Mature
Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: R, C++, Python
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Chikhi R, Medvedev P. Informed and automated k-mer size selection for genome assembly. Bioinformatics. 2013;30(1):31-37. doi:10.1093/bioinformatics/btt310. PMID:23732276.