Meryl

Meryl counts k-mers and performs set operations on the resulting k-mer collections, supporting k-mer-based analysis of genome assemblies and reference-free assembly validation.


Key Features:

  • K-mer counting: Counts k-mers (subsequences of length k) derived from DNA sequences.
  • K-mer set operations: Performs efficient k-mer set operations for comparing and manipulating k-mer collections.
  • Integration: Originated in the Celera Assembler and has been integrated into Canu.
  • Merqury support: Provides k-mer operations used by Merqury to compare assembly k-mers to those in high-accuracy unassembled reads.
  • Trio and haplotype analyses: Enables assessment of haplotype-specific metrics including accuracy, completeness, phase block continuity, and switch errors in trios.
  • K-mer spectrum outputs: Produces k-mer spectrum plots for inspection of assembly characteristics.
  • Performance: Shown to be fast and robust on both human and plant genomes.
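
The k-mer counting and set operations listed above can be illustrated with a minimal pure-Python sketch. It is not Meryl's implementation or its command-line interface. The function names (`canonical`, `count_kmers`), the toy sequences, and the choice to count canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) are assumptions made for illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in a sequence, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


# Toy inputs standing in for a read set and an assembly (hypothetical data).
read_kmers = count_kmers("ACGTACGTTG", 3)
asm_kmers = count_kmers("ACGTACGAAG", 3)

shared = read_kmers & asm_kmers      # intersection, keeping the smaller count
combined = read_kmers + asm_kmers    # union, summing counts
asm_excess = asm_kmers - read_kmers  # k-mers with more copies in the assembly
```

Python's `Counter` is convenient here because its `&`, `+`, and `-` operators already behave like multiset intersection, sum, and difference. A production tool would use a compact on-disk representation instead of an in-memory dictionary.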

Scientific Applications:

  • De novo assembly evaluation: Evaluates de novo genome assemblies by comparing k-mer content without relying on a reference genome.
  • Base-level accuracy and completeness estimation: Estimates base-level accuracy and assembly completeness by comparing assembly k-mers against high-accuracy unassembled reads (via Merqury).
  • Haplotype evaluation in trios: Assesses haplotype-specific accuracy, completeness, phase block continuity, and switch errors in trio assemblies.
  • Long-read assembly validation: Validates long-read assemblies whose quality may exceed that of the existing reference genome.
  • Cross-species validation: Has been applied to human and plant genomes, where it was shown to be robust and fast.
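
The accuracy and completeness estimates mentioned above can be sketched as follows, assuming the k-mer survival formulation described in the cited Merqury paper: a k-mer found in the assembly but absent from the reads is treated as evidence of a base error. The function names and arguments are hypothetical, and the k-mer tallies are assumed to have been computed beforehand.

```python
import math


def consensus_qv(asm_only: int, asm_total: int, k: int) -> float:
    """Estimate a Phred-scaled consensus quality from k-mer tallies.

    asm_only  -- k-mers found in the assembly but not in the reads
    asm_total -- all k-mers found in the assembly
    k         -- k-mer length
    """
    if asm_total <= 0 or k <= 0:
        raise ValueError("asm_total and k must be positive")
    if asm_only == 0:
        return math.inf  # no unsupported k-mers, so no errors were detected
    # Probability that a single base is correct, given the fraction of
    # assembly k-mers that are supported by the reads.
    p_base_correct = (1 - asm_only / asm_total) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)


def completeness(found_in_asm: int, reliable_read_kmers: int) -> float:
    """Percentage of reliable read k-mers that also occur in the assembly."""
    if reliable_read_kmers <= 0:
        raise ValueError("reliable_read_kmers must be positive")
    return 100.0 * found_in_asm / reliable_read_kmers
```

Which read k-mers count as "reliable" (for example, those above a low-count cutoff) is left to the caller in this sketch.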

Methodology:

Meryl counts k-mers and performs efficient set operations on the resulting collections. Merqury uses these operations to compare the k-mers of a de novo assembly with those found in high-accuracy unassembled reads. The same approach supports trio-based, haplotype-specific comparisons and the generation of k-mer spectrum plots.
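
As a rough illustration of the data behind a k-mer spectrum plot, the sketch below tabulates how many distinct read k-mers occur at each multiplicity, grouped by how many times the same k-mer appears in the assembly. The function name `spectrum` and the dictionary inputs are hypothetical, and the tool's actual output format is not reproduced here.

```python
from collections import Counter
from typing import Mapping, Tuple


def spectrum(read_counts: Mapping[str, int],
             asm_counts: Mapping[str, int]) -> Counter:
    """Map (assembly copy number, read multiplicity) to a count of distinct k-mers."""
    histogram: Counter = Counter()
    for kmer, multiplicity in read_counts.items():
        copies_in_assembly = asm_counts.get(kmer, 0)
        key: Tuple[int, int] = (copies_in_assembly, multiplicity)
        histogram[key] += 1
    return histogram


# Toy counts (hypothetical data): two k-mers seen 3 times in the reads,
# one k-mer seen once and missing from the assembly.
reads = {"AAA": 3, "AAC": 3, "ACG": 1}
assembly = {"AAA": 1, "AAC": 2}
hist = spectrum(reads, assembly)
```

Plotting read multiplicity on the x-axis and the number of distinct k-mers on the y-axis, with one curve per assembly copy number, gives the familiar spectrum view.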

Details

License: Freeware
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Programming Languages: C, C++, Perl, Python
Added: 7/26/2022
Last Updated: 11/24/2024

Publications

Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology. 2020;21(1). doi:10.1186/s13059-020-02134-9. PMID:32928274. PMCID:PMC7488777.

Funding: National Human Genome Research Institute, Intramural Research Program