Hapler

Hapler assembles haplotype regions from genetically diverse, population-sampled sequence data. It targets datasets with low diversity and low coverage, such as transcriptomes sequenced from natural populations.


Key Features:

  • Population-sampled haplotype assembly: Assembles haplotypes from sequence data derived from populations rather than single individuals, enabling inference of within-population diversity.
  • Error management and confidence enhancement: Reconstructs full consensus sequences while managing sequencing errors and ambiguities to increase assembly confidence and reduce erroneous calls.
  • Conflict graph approach: Compares each sequence against every other to build a conflict graph, groups sequences into conflict-free sets, and applies a minimum coloring strategy to resolve conflicts.
  • Chimeric point identification and minimization: Identifies chimeric points in consensus sequences and minimizes their number. In evaluations on simulated data, Hapler produced fewer chimeras than majority vote and viral quasispecies estimation across varying error rates, read lengths, and population haplotype biases.
  • Support for ecoinformatics and transcriptome sequencing: Provides haplotype information and phasing-error identification to support transcriptome sequencing of natural populations as a cost-effective alternative to genome sequencing.
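
The pairwise comparison behind the conflict graph can be illustrated with a minimal Python sketch. This is not Hapler's code (Hapler is written in Java). The read representation (a start column plus a string of bases), the function names, and the rule that two reads conflict when they overlap and disagree at any shared column are all assumptions made for illustration. Gaps and ambiguous bases are ignored.

```python
from itertools import combinations

def reads_conflict(a, b):
    """Return True if two aligned reads overlap and disagree at a shared column.

    Each read is a (start, bases) pair: a 0-based start column in the
    alignment and a string of bases (hypothetical representation).
    """
    (start_a, seq_a), (start_b, seq_b) = a, b
    lo = max(start_a, start_b)                             # first shared column
    hi = min(start_a + len(seq_a), start_b + len(seq_b))   # one past the last
    return any(seq_a[p - start_a] != seq_b[p - start_b] for p in range(lo, hi))

def build_conflict_graph(reads):
    """Compare every read against every other; connect the pairs that conflict."""
    graph = {name: set() for name in reads}
    for u, v in combinations(reads, 2):
        if reads_conflict(reads[u], reads[v]):
            graph[u].add(v)
            graph[v].add(u)
    return graph

# Toy data: r3 disagrees with r1 and r2 at column 3; r4 overlaps no other read.
reads = {
    "r1": (0, "ACGT"),
    "r2": (2, "GTTA"),
    "r3": (2, "GAAA"),
    "r4": (6, "CC"),
}
graph = build_conflict_graph(reads)
```

In this toy input, r1 and r2 agree wherever they overlap and can share a haplotype, while r3 conflicts with both and must be placed elsewhere.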

Scientific Applications:

  • Ecological and evolutionary biology: Enables analysis of haplotype diversity within natural populations to investigate evolutionary processes and population dynamics.
  • Ecoinformatics and population transcriptomics: Supports transcriptome-based population studies as a cost-effective approach for surveying genetic diversity in natural populations.
  • Population genetics and biodiversity studies: Facilitates studies of genetic variation, adaptation, and biodiversity by providing robust haplotype assemblies from low-coverage data.
  • Phasing-error detection and downstream assembly improvement: Identifies potential phasing errors and reduces chimeric artifacts to improve the utility of assemblies for downstream analyses.

Methodology:

Hapler compares sequences pairwise to build a conflict graph, groups the sequences into conflict-free sets, and applies a minimum coloring strategy to resolve conflicts. It then reconstructs consensus sequences while identifying and minimizing chimeric points. The method was evaluated on simulated datasets against majority vote and viral quasispecies estimation across different error rates, read lengths, and haplotype biases.
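
Grouping sequences into conflict-free sets amounts to coloring the conflict graph: reads with the same color never conflict and can belong to one haplotype. The Python sketch below uses a simple greedy coloring as a stand-in. It is an assumption for illustration, not Hapler's algorithm, and greedy coloring is not guaranteed to find a minimum coloring. The function name and the toy graph are hypothetical.

```python
def greedy_conflict_free_sets(graph):
    """Group nodes so that no two nodes in the same group share an edge.

    `graph` maps each read name to the set of reads it conflicts with.
    Greedy heuristic only: the number of groups may exceed the minimum.
    """
    color = {}
    # Visit reads with the most conflicts first (ties broken by name).
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {color[nb] for nb in graph[node] if nb in color}
        c = 0
        while c in used:          # smallest color unused by any neighbor
            c += 1
        color[node] = c
    groups = {}
    for node, c in color.items():
        groups.setdefault(c, []).append(node)
    return [sorted(groups[c]) for c in sorted(groups)]

# Toy conflict graph: r3 conflicts with r1 and r2; r4 conflicts with nothing.
graph = {
    "r1": {"r3"},
    "r2": {"r3"},
    "r3": {"r1", "r2"},
    "r4": set(),
}
groups = greedy_conflict_free_sets(graph)
print(groups)
```

Each resulting group is a candidate haplotype from which a consensus sequence could then be built.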

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Java
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

O'Neil ST, Emrich SJ. Haplotype and minimum-chimerism consensus determination using short sequence data. BMC Genomics. 2012;13(Suppl 2):S4. doi:10.1186/1471-2164-13-s2-s4. PMID:22537299. PMCID:PMC3394418.
