HapCol

HapCol reconstructs the two haplotypes of a diploid genome from sequencing reads (haplotype assembly), which supports studies of how single-nucleotide polymorphisms (SNPs) affect phenotypic traits.


Key Features:

  • Long Read Compatibility: Optimized for long, gapless reads from sequencing technologies including PacBio RS II (SMRT sequencing) and Oxford Nanopore MinION.
  • Error Correction Strategy: Exploits the uniform distribution of sequencing errors in long reads. An exact algorithm, whose running time is exponential only in the maximum number of corrections allowed per SNP position, minimizes the overall error-correction score.
  • Computational Efficiency: Requires less memory and fewer computing resources than existing combinatorial methods, which allows it to process higher-coverage datasets without restrictive assumptions such as the all-heterozygous model.
  • Performance and Accuracy: Demonstrates competitive accuracy and increased numbers of phased positions, with improved metrics observed on real datasets.
  • Scalability: Overcomes limitations related to read length and sequencing coverage to handle larger datasets effectively.

Scientific Applications:

  • Genetic analysis of phenotypic traits: Enables reconstruction of haplotypes to study the effects of SNPs on phenotype.
  • Personalized medicine: Supports haplotype-resolved variant interpretation relevant to individual genotype-informed treatment strategies.
  • Evolutionary biology: Facilitates haplotype-based analyses for studying evolutionary relationships and allele histories.
  • Population genetics: Allows phasing of variants at population scale to investigate genetic diversity and structure.

Methodology:

HapCol operates on long, gapless reads and assumes a uniform sequencing error model. It uses an exact algorithm that minimizes the overall error-correction score and whose running time is exponential in the maximum number of corrections allowed per SNP position. It does not assume that every position is heterozygous.
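
The optimization problem described above can be illustrated on a toy input. The following is a minimal brute-force sketch, not HapCol's actual algorithm or code: it enumerates every bipartition of the reads, so it is only usable for a handful of reads. The function names, the string encoding of reads ("0"/"1" for alleles, "-" for uncovered positions), and the example data are assumptions made for illustration. The score counted here is the number of allele flips needed so that the reads on each side agree at every SNP position, with at most k flips allowed per position and no all-heterozygous requirement.

```python
from itertools import product


def column_cost(column, sides):
    """Fewest flips that make one SNP column conflict-free for a bipartition.

    On each side, the minority allele among covered reads is flipped.
    Both sides may end up with the same allele (homozygous position).
    """
    cost = 0
    for side in (0, 1):
        alleles = [a for a, s in zip(column, sides) if s == side and a != "-"]
        zeros = alleles.count("0")
        cost += min(zeros, len(alleles) - zeros)
    return cost


def brute_force_kcmec(fragments, k):
    """Return (score, sides) with the minimum number of corrections,
    allowing at most k corrections per SNP position, or None if no
    bipartition satisfies the per-position limit.

    fragments: non-empty list of equal-length strings over "0", "1", "-".
    """
    columns = list(zip(*fragments))
    best = None
    # Fix the first read on side 0: swapping the two sides changes nothing.
    for rest in product((0, 1), repeat=len(fragments) - 1):
        sides = (0,) + rest
        total = 0
        for column in columns:
            cost = column_cost(column, sides)
            if cost > k:
                break  # this bipartition violates the per-position limit
            total += cost
        else:
            if best is None or total < best[0]:
                best = (total, sides)
    return best


# Hypothetical toy data: five gapless reads over four SNP positions.
# The last read carries one sequencing error at the second position.
FRAGMENTS = ["010-", "0100", "101-", "-011", "1111"]

print(brute_force_kcmec(FRAGMENTS, k=1))  # (1, (0, 0, 1, 1, 1))
print(brute_force_kcmec(FRAGMENTS, k=0))  # None
```

With k=1 the sketch assigns the first two reads to one haplotype and the remaining three to the other, at the cost of a single correction. With k=0 no error-free bipartition exists, so it reports None.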

Details

License: GPL-2.0
Maturity: Emerging
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C++
Added: 3/13/2016
Last Updated: 11/25/2024

Publications

Pirola Y, Zaccaria S, Dondi R, Klau GW, Pisanti N, Bonizzoni P. HapCol: accurate and memory-efficient haplotype assembly from long reads. Bioinformatics. 2015;32(11):1610-1617. doi:10.1093/bioinformatics/btv495. PMID:26315913.
