WhatsHap

WhatsHap phases genomic variants using sequencing reads: it assembles haplotypes by assigning the alleles of heterozygous single nucleotide polymorphisms (SNPs) to the two copies of a diploid genome.


Key Features:

  • Read-based phasing: Uses sequencing reads for phasing instead of relying solely on statistical population-based methods.
  • Haplotype assembly: Assigns heterozygous SNPs to their respective haplotypes in diploid genomes.
  • Weighted Minimum Error Correction (wMEC) solver: Implements an optimal solver for the weighted minimum error correction problem.
  • Runtime scalability: Achieves runtime linear in the number of SNPs.
  • Fixed-parameter tractability (FPT): Uses coverage as the FPT parameter to control computational complexity.
  • Coverage handling: Efficiently handles datasets with coverages up to 20×.
  • Practical coverage guideline: Reports that ~15× coverage is generally sufficient for reliable phasing with long reads, even at elevated sequencing error rates.
  • Error-aware phasing: Accounts for sequencing error information and read length when constructing haplotypes.
  • Accuracy metrics: Produces haplotypes with favorable switch and flip error rates compared to state-of-the-art statistical phasers.
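
To make the accuracy metrics in the list above concrete, the following is a minimal sketch of how switch and flip errors might be counted between a reference haplotype and a predicted one. It is not WhatsHap's own evaluation code. The function name `switch_and_flip_errors` is hypothetical, haplotypes are assumed to be equal-length strings of 0/1 alleles at heterozygous sites within one phased block, and the convention that two adjacent switches count as one flip is an assumption (conventions differ between tools).

```python
def switch_and_flip_errors(truth, predicted):
    """Count (switches, flips) between two haplotypes given as 0/1 strings.

    Assumed convention: a switch is a point where the predicted haplotype
    changes from matching the truth to matching its complement (or back);
    two switches at adjacent positions are reported as a single flip.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")

    # True where the predicted allele disagrees with the truth at a site.
    mismatch = [t != p for t, p in zip(truth, predicted)]

    # breaks[i] is True when the phase relation changes between site i and i + 1.
    breaks = [mismatch[i] != mismatch[i + 1] for i in range(len(mismatch) - 1)]

    switches = flips = 0
    i = 0
    while i < len(breaks):
        if breaks[i]:
            if i + 1 < len(breaks) and breaks[i + 1]:
                # Two adjacent phase changes: a single site was flipped.
                flips += 1
                i += 2
                continue
            switches += 1
        i += 1
    return switches, flips


# One isolated flipped site (index 2) and one long-range switch (from index 5 on).
example_result = switch_and_flip_errors("00000000", "00100111")
```

Dividing these counts by the number of adjacent site pairs in the block would give the corresponding error rates.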

Scientific Applications:

  • Population genetics: Enables haplotype-resolved analyses required for population genetics studies.
  • Downstream haplotype analyses: Provides phased variant data for downstream analyses that depend on accurate haplotypes.
  • Long-read sequencing phasing: Applicable to phasing long-read sequencing data where longer reads and higher error rates are present.

Methodology:

WhatsHap performs read-based phasing by optimally solving the weighted minimum error correction (wMEC) problem. Its algorithm runs in time linear in the number of SNPs and is fixed-parameter tractable with coverage as the parameter, and it incorporates sequencing error information and read length.
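
As an illustration of the wMEC objective only, the sketch below scores every bipartition of a handful of reads and keeps the cheapest one. This brute-force search is exponential in the number of reads and is not the algorithm WhatsHap uses, whose cost grows exponentially only with coverage and linearly with the number of SNPs. The read representation (a dict from SNP index to an `(allele, weight)` pair), the function names, and the choice to score each side of the bipartition independently are all assumptions made for this example.

```python
from itertools import product


def wmec_cost(reads, assignment, n_snps):
    """Weighted cost of making the reads on each side agree at every SNP.

    reads: list of dicts mapping SNP index -> (allele, weight), allele in {0, 1};
           weight is the assumed cost of flipping that allele call.
    assignment: tuple of 0/1, the side of the bipartition for each read.
    """
    total = 0
    for side in (0, 1):
        for snp in range(n_snps):
            weight_per_allele = [0, 0]
            for read, part in zip(reads, assignment):
                if part == side and snp in read:
                    allele, weight = read[snp]
                    weight_per_allele[allele] += weight
            # Flipping the lighter allele is the cheapest way to reach agreement.
            total += min(weight_per_allele)
    return total


def solve_wmec_brute_force(reads, n_snps):
    """Return (cost, assignment) minimizing the wMEC cost over all bipartitions."""
    if not reads:
        return 0, ()
    best_cost, best_assignment = None, None
    # Fix the first read on side 0: swapping both sides gives the same cost.
    for rest in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + rest
        cost = wmec_cost(reads, assignment, n_snps)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Five toy reads over three SNPs; the last read carries one low-weight call
# (allele 0 at SNP 2, weight 2) that conflicts with the other reads.
toy_reads = [
    {0: (0, 10), 1: (0, 10)},
    {0: (1, 10), 1: (1, 10)},
    {1: (0, 10), 2: (1, 10)},
    {1: (1, 10), 2: (0, 3)},
    {0: (0, 10), 2: (0, 2)},
]
toy_cost, toy_assignment = solve_wmec_brute_force(toy_reads, n_snps=3)
```

In this toy input no bipartition is conflict-free, so the cheapest solution corrects the single low-weight call, which is how the weights let sequencing error information steer the phasing.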

Details

License:
MIT
Maturity:
Emerging
Cost:
Free of charge
Tool Type:
command-line tool
Operating Systems:
Linux, Mac
Programming Languages:
C++, Python
Added:
1/27/2017
Last Updated:
11/25/2024

Publications

Garg S, Martin M, Marschall T. Read-Based Phasing of Related Individuals. bioRxiv. 2016. doi:10.1101/037101.

Patterson M, Marschall T, Pisanti N, van Iersel L, Stougie L, Klau GW, Schönhuth A. WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads. Journal of Computational Biology. 2015;22(6):498-509. doi:10.1089/cmb.2014.0157. PMID:25658651.
