Meryl
Meryl counts k-mers and performs set operations on the resulting k-mer collections, supporting k-mer-based genome assembly analysis and reference-free assembly validation.
Key Features:
- K-mer counting: Counts k-mers (subsequences of length k) derived from DNA sequences.
- K-mer set operations: Performs set operations (for example, intersection and difference) on k-mer collections so that they can be compared and combined efficiently.
- Integration: Originated in the Celera Assembler and has been integrated into Canu.
- Merqury support: Provides k-mer operations used by Merqury to compare assembly k-mers to those in high-accuracy unassembled reads.
- Trio and haplotype analyses: Enables assessment of haplotype-specific metrics including accuracy, completeness, phase block continuity, and switch errors in trios.
- K-mer spectrum outputs: Provides the k-mer counts from which k-mer spectrum plots are drawn for inspecting assembly characteristics.
- Performance: Robustness and speed have been demonstrated on both human and plant genomes.
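The counting and set-operation features listed above can be illustrated with a small in-memory sketch. This is not Meryl's implementation or its command-line interface: the function names (`canonical`, `count_kmers`, `intersect`, `difference`, `union_sum`) are chosen here for illustration, and the use of canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is an assumption about how strand is handled.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(sequences, k: int) -> Counter:
    """Count canonical k-mers, skipping windows with non-ACGT characters."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def intersect(a: Counter, b: Counter) -> Counter:
    """K-mers present in both sets, keeping the count from the first set."""
    return Counter({kmer: n for kmer, n in a.items() if kmer in b})


def difference(a: Counter, b: Counter) -> Counter:
    """K-mers present in the first set but absent from the second."""
    return Counter({kmer: n for kmer, n in a.items() if kmer not in b})


def union_sum(a: Counter, b: Counter) -> Counter:
    """K-mers present in either set, with counts added together."""
    return a + b


# Toy example: "reads" and "assembly" are made-up sequences.
reads = count_kmers(["ACGTA"], 3)
assembly = count_kmers(["GTACC"], 3)
shared = intersect(assembly, reads)         # k-mers supported by the reads
assembly_only = difference(assembly, reads)  # k-mers with no read support
```

A dictionary-based sketch like this only works for toy inputs; a dedicated tool is needed for genome-scale data, where the k-mer collections are far too large to handle this way.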
Scientific Applications:
- De novo assembly evaluation: Evaluates de novo genome assemblies by comparing k-mer content without relying on a reference genome.
- Base-level accuracy and completeness estimation: Estimates base-level accuracy and assembly completeness by comparing assembly k-mers against high-accuracy unassembled reads (via Merqury).
- Haplotype evaluation in trios: Assesses haplotype-specific accuracy, completeness, phase block continuity, and switch errors in trio assemblies.
- Long-read assembly validation: Validates long-read assemblies whose quality may exceed that of existing reference genomes.
- Cross-species validation: Has been applied to both human and plant genomes.
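The accuracy and completeness estimates mentioned above can be sketched numerically. The formulas below follow the formulation described in the Merqury publication cited under Publications (a k-mer found only in the assembly is treated as evidence of a base error); the function and parameter names are hypothetical, and the input numbers are invented for illustration. This is a sketch, not Merqury's code.

```python
import math


def consensus_qv(asm_only_kmers: int, asm_total_kmers: int, k: int) -> float:
    """Phred-scaled consensus quality from k-mer support.

    asm_only_kmers: assembly k-mers that are absent from the read set
    asm_total_kmers: all k-mers in the assembly
    """
    if asm_total_kmers <= 0:
        raise ValueError("asm_total_kmers must be positive")
    # Probability that a single base is correct, assuming a k-mer is
    # error-free only if all k of its bases are correct.
    p_base_correct = (1 - asm_only_kmers / asm_total_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    if error_rate == 0:
        return math.inf  # no unsupported k-mers were observed
    return -10 * math.log10(error_rate)


def kmer_completeness(shared_reliable_kmers: int, reliable_read_kmers: int) -> float:
    """Fraction of reliable read k-mers that are also found in the assembly."""
    if reliable_read_kmers <= 0:
        raise ValueError("reliable_read_kmers must be positive")
    return shared_reliable_kmers / reliable_read_kmers


# Invented numbers: 1,000 unsupported k-mers out of 1,000,000, with k = 21.
qv = consensus_qv(1_000, 1_000_000, 21)
completeness = kmer_completeness(950_000, 1_000_000)
```

With these invented inputs the quality value comes out slightly above Q43, and the completeness is 0.95.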
Methodology:
Meryl counts k-mers and performs efficient set operations on the resulting k-mer sets. These operations are used (by Merqury) to compare k-mers between de novo assemblies and high-accuracy unassembled reads, to support trio-based haplotype-specific comparisons, and to produce k-mer spectrum plots.
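A k-mer spectrum is, in essence, a histogram of k-mer multiplicities: for each count value, how many distinct k-mers occur that many times. The sketch below derives such a histogram from a table of k-mer counts and renders it as text. The function names and the text rendering are illustrative assumptions, not the plotting code used by Meryl or Merqury.

```python
from collections import Counter


def kmer_spectrum(counts: dict) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def render_spectrum(spectrum: Counter, width: int = 40) -> str:
    """Draw the spectrum as horizontal bars scaled to the tallest bin."""
    if not spectrum:
        return ""
    peak = max(spectrum.values())
    lines = []
    for multiplicity in sorted(spectrum):
        bar_length = max(1, round(spectrum[multiplicity] * width / peak))
        lines.append(f"{multiplicity:>4} {'#' * bar_length}")
    return "\n".join(lines)


# Toy k-mer counts; real spectra are computed from sequencing reads.
toy_counts = {"ACG": 2, "GTA": 1, "ACC": 1}
spectrum = kmer_spectrum(toy_counts)
print(render_spectrum(spectrum, width=4))
```

In real read data, low-multiplicity k-mers are mostly sequencing errors, while peaks at higher multiplicities reflect genomic copy number, which is why the shape of the spectrum is informative about an assembly.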
Details
- License: Freeware
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool
- Programming Languages: C, C++, Perl, Python
- Added: 7/26/2022
- Last Updated: 11/24/2024
Publications
Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology. 2020;21(1). doi:10.1186/s13059-020-02134-9. PMID:32928274. PMCID:PMC7488777.
Funding: National Human Genome Research Institute, Intramural Research Program
Links
- Repository: https://github.com/marbl/meryl