HapZipper

HapZipper compresses HapMap single-nucleotide polymorphism (SNP) datasets losslessly to reduce storage and transmission requirements for large-scale genomic data.


Key Features:

  • Lossless Compression: Decompression restores the original SNP genotypes exactly, so downstream genomic analyses are unaffected.
  • Tailored Methodology: Leverages the specific file format and biological properties of HapMap data to achieve higher compression than generic compressors such as gzip, bzip2, and lzma.
  • Benchmark Performance: Reported to compress HapMap 3 population datasets to less than 5% of their original size (Chanda et al., 2012).

Scientific Applications:

  • Efficient Storage: Reduces disk footprint of HapMap and other SNP genotype datasets to facilitate large-scale data management.
  • Faster Data Transmission: Smaller files take less time to move across the network between collaborators.
  • Collaborative Data Sharing: Makes it practical to distribute complete SNP datasets within collaborative projects, with no loss of information.
  • Reduced Computational Costs: Decreases I/O and storage-related costs associated with processing and archiving large HapMap SNP datasets.

Methodology:

The algorithm implements a lossless compression scheme specifically optimized for the structure and characteristics of HapMap SNP genotype data, exploiting patterns within SNP datasets that generic compressors (gzip, bzip2, lzma) do not target.
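To illustrate why a format-aware encoding can outperform generic compressors, the following is a minimal sketch in Python. It is not HapZipper's algorithm, which is described in the publication listed below. The synthetic data, the text layout, and the helper names (make_toy_rows, to_text, pack, unpack) are all assumptions made for this example. The sketch relies on a single property of biallelic SNP data: each call is one of two alleles, so it can be stored as one bit instead of one text character.

```python
import bz2
import gzip
import lzma
import random


def make_toy_rows(n_snps=200, n_haps=64, seed=0):
    """Synthetic biallelic SNP rows: (rsid, allele_a, allele_b, calls)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_snps):
        allele_a, allele_b = rng.sample("ACGT", 2)
        freq_b = rng.random() * 0.3  # frequency of the rarer allele
        calls = "".join(
            allele_b if rng.random() < freq_b else allele_a
            for _ in range(n_haps)
        )
        rows.append((f"rs{i}", allele_a, allele_b, calls))
    return rows


def to_text(rows):
    """Toy text layout (an assumption): rsid, then space-separated alleles."""
    lines = [f"{rsid} {' '.join(calls)}" for rsid, _, _, calls in rows]
    return "\n".join(lines).encode("ascii")


def pack(rows, n_haps):
    """Store each SNP as rsid, tab, the two alleles, and a 1-bit-per-call mask."""
    n_bytes = (n_haps + 7) // 8
    out = bytearray()
    for rsid, allele_a, allele_b, calls in rows:
        bits = int("".join("1" if c == allele_b else "0" for c in calls), 2)
        out += rsid.encode("ascii") + b"\t" + (allele_a + allele_b).encode("ascii")
        out += bits.to_bytes(n_bytes, "big")
    return bytes(out)


def unpack(blob, n_haps):
    """Exact inverse of pack(): rebuild the original rows."""
    n_bytes = (n_haps + 7) // 8
    rows, pos = [], 0
    while pos < len(blob):
        tab = blob.index(b"\t", pos)  # rsid text never contains a tab
        rsid = blob[pos:tab].decode("ascii")
        allele_a, allele_b = chr(blob[tab + 1]), chr(blob[tab + 2])
        bits = int.from_bytes(blob[tab + 3:tab + 3 + n_bytes], "big")
        calls = "".join(
            allele_b if (bits >> (n_haps - 1 - j)) & 1 else allele_a
            for j in range(n_haps)
        )
        rows.append((rsid, allele_a, allele_b, calls))
        pos = tab + 3 + n_bytes
    return rows


rows = make_toy_rows()
raw = to_text(rows)
packed = pack(rows, 64)
assert unpack(packed, 64) == rows  # the toy encoding is lossless

sizes = {
    "raw text": len(raw),
    "gzip": len(gzip.compress(raw)),
    "bzip2": len(bz2.compress(raw)),
    "lzma": len(lzma.compress(raw)),
    "bit-packed": len(packed),
    "bit-packed + lzma": len(lzma.compress(packed)),
}
for name, size in sizes.items():
    print(f"{name:>18}: {size} bytes")
```

The printed sizes depend entirely on the synthetic data and say nothing about HapZipper's performance on real HapMap files. The point is only that knowledge of the data's structure enables encodings a general-purpose compressor has no way to discover.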

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows
Programming Languages: Java, C++
Added: 8/3/2017
Last Updated: 12/10/2018

Publications

Chanda P, et al. HapZipper: sharing HapMap populations just got easier. Nucleic Acids Res. 2012; 40:e159. doi: 10.1093/nar/gks709

PMID: 22844100
