WhatsHap
WhatsHap phases genomic variants using sequencing reads. It assembles haplotypes that assign heterozygous single nucleotide polymorphisms (SNPs) to the two copies of a diploid genome.
Key Features:
- Read-based phasing: Uses sequencing reads for phasing instead of relying solely on statistical population-based methods.
- Haplotype assembly: Assigns heterozygous SNPs to their respective haplotypes in diploid genomes.
- Weighted Minimum Error Correction (wMEC) solver: Solves the weighted minimum error correction problem to optimality.
- Runtime scalability: Achieves runtime linear in the number of SNPs.
- Fixed-parameter tractability (FPT): Uses coverage as the FPT parameter to control computational complexity.
- Coverage handling: Efficiently handles datasets with coverage of up to 20×.
- Practical coverage guideline: About 15× coverage is generally sufficient to reliably phase long reads, even at elevated sequencing error rates.
- Error-aware phasing: Accounts for sequencing error information and read length when constructing haplotypes.
- Accuracy metrics: Produces haplotypes with favorable switch and flip error rates compared to state-of-the-art statistical phasers.
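Switch and flip errors compare a predicted phasing with a known true phasing. The Python sketch below assumes each haplotype is encoded as a list of 0/1 alleles for one haplotype at consecutive heterozygous sites (the other haplotype is the complement, so swapping the two labels costs nothing). A switch is counted wherever the phase relation between two neighbouring sites disagrees with the truth. For flips it assumes one common convention, not necessarily the exact one used in WhatsHap's own evaluation: two adjacent switches around a single site are collapsed into one flip of that site. All function names are hypothetical.

```python
def _switch_boundaries(true_hap, pred_hap):
    # Per site: does the prediction disagree with the truth?
    mismatch = [t != p for t, p in zip(true_hap, pred_hap)]
    # Per neighbouring pair: True where the phase relation changes (a switch).
    return [a != b for a, b in zip(mismatch, mismatch[1:])]


def switch_errors(true_hap, pred_hap):
    """Number of switch errors between two phasings of the same sites."""
    return sum(_switch_boundaries(true_hap, pred_hap))


def switch_error_rate(true_hap, pred_hap):
    """Switch errors divided by the number of neighbouring site pairs."""
    n = len(true_hap)
    return switch_errors(true_hap, pred_hap) / (n - 1) if n > 1 else 0.0


def switch_flip_decomposition(true_hap, pred_hap):
    """Return (switches, flips), collapsing switch pairs around one site into a flip."""
    boundaries = _switch_boundaries(true_hap, pred_hap)
    switches = flips = 0
    j = 0
    while j < len(boundaries):
        if boundaries[j] and j + 1 < len(boundaries) and boundaries[j + 1]:
            # Switches on both sides of site j+1: that single site is flipped.
            flips += 1
            j += 2
        elif boundaries[j]:
            switches += 1
            j += 1
        else:
            j += 1
    return switches, flips


if __name__ == "__main__":
    truth = [0, 0, 0, 0, 0]
    predicted = [0, 0, 1, 0, 0]  # only the middle site is phased wrongly
    print(switch_errors(truth, predicted))              # 2
    print(switch_flip_decomposition(truth, predicted))  # (0, 1)
```

Counting flips separately keeps one misplaced site from being reported as two independent switches, which would overstate long-range phasing errors.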
Scientific Applications:
- Population genetics: Enables haplotype-resolved analyses required for population genetics studies.
- Downstream haplotype analyses: Provides phased variant data for downstream analyses that depend on accurate haplotypes.
- Long-read sequencing phasing: Applicable to long-read sequencing data, which combines longer reads with higher error rates.
Methodology:
Performs read-based phasing by solving the weighted minimum error correction (wMEC) problem to optimality. The algorithm runs in time linear in the number of SNPs and is fixed-parameter tractable with coverage as the parameter. It incorporates sequencing error information and read length.
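The dynamic program behind a coverage-parameterized wMEC solver can be sketched in a few dozen lines of Python. This is a toy illustration, not WhatsHap's implementation or API. It assumes the all-heterozygous setting (the two haplotypes are complements), reads given as hypothetical dicts mapping SNP index to `(allele, weight)`, and a read counted as active at every column from its first to its last SNP. At each column it enumerates all bipartitions of the active reads and charges the weight of every read entry that must be corrected. It carries costs forward only between bipartitions that agree on reads active in both columns. With k active reads there are 2^k bipartitions per column, so the work per column depends on coverage rather than on the number of SNPs. The transition step here naively scans every previous bipartition, and the active-set construction is not optimized.

```python
from itertools import product


def wmec_phase(reads, num_snps):
    """Toy wMEC solver (all-heterozygous assumption); hypothetical helper.

    reads: list of non-empty dicts {snp_index: (allele, weight)}, allele in {0, 1}.
    Returns (minimum total correction weight, haplotype0, haplotype1).
    """
    # A read is "active" at every column between its first and last SNP.
    spans = [(min(r), max(r)) for r in reads]
    active = [[i for i, (s, e) in enumerate(spans) if s <= j <= e]
              for j in range(num_snps)]

    def column_cost(j, assignment):
        # Cheapest correction cost at column j for this bipartition, trying both
        # alleles for haplotype 0 (haplotype 1 carries the other allele).
        best = None
        for h0 in (0, 1):
            cost = 0
            for i, part in assignment.items():
                if j in reads[i]:  # active reads may have gaps
                    allele, weight = reads[i][j]
                    expected = h0 if part == 0 else 1 - h0
                    if allele != expected:
                        cost += weight
            if best is None or cost < best[0]:
                best = (cost, h0)
        return best

    # tables[j]: bipartition of active[j] (tuple of sides) ->
    # (cumulative cost, backpointer into tables[j - 1], allele of haplotype 0)
    tables = []
    for j in range(num_snps):
        table = {}
        for parts in product((0, 1), repeat=len(active[j])):
            assignment = dict(zip(active[j], parts))
            delta, h0 = column_cost(j, assignment)
            if j == 0:
                table[parts] = (delta, None, h0)
                continue
            best_prev = None
            for prev_parts, (prev_cost, _, _) in tables[j - 1].items():
                prev_assignment = dict(zip(active[j - 1], prev_parts))
                # Reads active in both columns must stay on the same side.
                compatible = all(assignment[i] == side
                                 for i, side in prev_assignment.items()
                                 if i in assignment)
                if compatible and (best_prev is None or prev_cost < best_prev[0]):
                    best_prev = (prev_cost, prev_parts)
            table[parts] = (delta + best_prev[0], best_prev[1], h0)
        tables.append(table)

    # Take the cheapest final bipartition and backtrack to read off haplotype 0.
    parts, (total, _, _) = min(tables[-1].items(), key=lambda kv: kv[1][0])
    hap0 = [0] * num_snps
    for j in range(num_snps - 1, -1, -1):
        _, back, h0 = tables[j][parts]
        hap0[j] = h0
        parts = back
    hap1 = [1 - a for a in hap0]
    return total, hap0, hap1
```

For example, the three mutually consistent reads `{0: (0, 1), 1: (0, 1)}`, `{0: (1, 1), 1: (1, 1), 2: (1, 1)}` and `{1: (0, 1), 2: (0, 1)}` yield a cost of 0 and the haplotypes 000 and 111 (in either order). Adding the conflicting read `{0: (0, 1), 1: (1, 1)}` raises the minimum cost to 1. The weights are where error information enters. In this sketch they are plain numbers; deriving them from base qualities would be one natural choice.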
Details
- License: MIT
- Maturity: Emerging
- Cost: Free of charge
- Tool Type: Command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: C++, Python
- Added: 1/27/2017
- Last Updated: 11/25/2024
Operations
Genotyping
Publications
Garg S, Martin M, Marschall T. Read-Based Phasing of Related Individuals. bioRxiv. 2016. doi:10.1101/037101.
Patterson M, Marschall T, Pisanti N, van Iersel L, Stougie L, Klau GW, Schönhuth A. WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads. Journal of Computational Biology. 2015;22(6):498-509. doi:10.1089/cmb.2014.0157. PMID:25658651.
Downloads
- Software package: https://pypi.python.org/pypi/whatshap