QVZ

QVZ compresses per-base sequencing quality values using a lossy algorithm to reduce storage of FASTQ and SAM files while preserving genotyping fidelity.


Key Features:

  • Lossy compression: Compresses the per-base quality values of DNA sequencing data lossily, giving up exact reconstruction in exchange for smaller output.
  • File format targets: Operates on quality values embedded in FASTQ and SAM formats.
  • Storage reduction: Achieves higher compression ratios than traditional lossless methods. Quality values typically account for about half of the size of an uncompressed sequencing file, so compressing them targets a large share of total storage.
  • Rate–distortion performance: Delivers superior rate–distortion performance across multiple distortion metrics compared to previously proposed algorithms.
  • Quasi-convex distortion minimization: Allows minimization of arbitrary quasi-convex distortion functions for customized fidelity criteria.
  • Genotyping fidelity: At a given compression rate, produces compressed quality values whose genotyping results are closer to those obtained with the original quality values than the results obtained with other compression algorithms.
  • Implementation: Implemented in C.
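
The distortion metrics mentioned above can be made concrete with a small sketch. The code below is not taken from QVZ; the function names are hypothetical, the Phred+33 encoding is an assumption about the input FASTQ, and mean squared error and mean absolute error are used only as two familiar examples of convex (and therefore quasi-convex) distortion functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode one FASTQ quality character to a Phred score.
 * Assumption: the file uses the Phred+33 encoding. */
static int phred_from_char(char c)
{
    return (int)c - 33;
}

/* Mean squared error between an original quality string and its
 * lossy reconstruction. Both strings must have the same nonzero length. */
static double mse_distortion(const char *orig, const char *recon)
{
    size_t n = strlen(orig);
    assert(n > 0 && n == strlen(recon));

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = phred_from_char(orig[i]) - phred_from_char(recon[i]);
        sum += d * d;
    }
    return sum / (double)n;
}

/* Mean absolute error (L1) between the same two strings. */
static double l1_distortion(const char *orig, const char *recon)
{
    size_t n = strlen(orig);
    assert(n > 0 && n == strlen(recon));

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = phred_from_char(orig[i]) - phred_from_char(recon[i]);
        sum += (d < 0.0) ? -d : d;
    }
    return sum / (double)n;
}
```

Either function could serve as the fidelity criterion when comparing a reconstructed quality string against the original; which metric suits a given downstream analysis is a choice left to the user.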

Scientific Applications:

  • Genotyping: Improves fidelity of genotyping analyses when using compressed quality values compared with other compression algorithms.
  • Large-scale data storage and transmission: Reduces storage and transmission requirements for large-scale sequencing datasets by compressing quality values.
  • Custom downstream analysis: Enables tailoring of quality-value compression to specific downstream analysis requirements via arbitrary quasi-convex distortion functions.

Methodology:

Applies lossy compression driven by rate–distortion optimization: for a given compression rate, it minimizes a chosen quasi-convex distortion function between the original and reconstructed quality values.
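
The rate–distortion trade-off can be illustrated with a toy example. This listing does not describe how QVZ designs its quantizers, so the sketch below substitutes a plain uniform scalar quantizer with a fixed-length code; every function name is hypothetical, and the code shows only the general idea that fewer reconstruction levels cost fewer bits per value but increase distortion.

```c
#include <assert.h>
#include <stddef.h>

/* Bits per value needed by a fixed-length code with `levels` symbols:
 * the smallest b such that 2^b >= levels. */
static int fixed_rate_bits(int levels)
{
    assert(levels >= 1);
    int bits = 0;
    while ((1 << bits) < levels)
        bits++;
    return bits;
}

/* Map a Phred score q in [0, q_max] to the integer midpoint of one of
 * `levels` equal-width bins (a stand-in quantizer, not QVZ's). */
static int quantize_uniform(int q, int q_max, int levels)
{
    assert(levels >= 1 && q_max >= 0 && q >= 0 && q <= q_max);
    int width = (q_max + levels) / levels; /* ceil((q_max + 1) / levels) */
    int lo = (q / width) * width;          /* lower edge of the bin */
    int hi = lo + width - 1;               /* upper edge of the bin */
    if (hi > q_max)
        hi = q_max;
    return (lo + hi) / 2;
}

/* Mean squared error introduced by quantizing n scores to `levels` bins. */
static double quantization_mse(const int *q, size_t n, int q_max, int levels)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = q[i] - quantize_uniform(q[i], q_max, levels);
        sum += d * d;
    }
    return sum / (double)n;
}
```

Calling `quantization_mse` for a decreasing number of levels, alongside `fixed_rate_bits`, traces out one crude rate–distortion curve. An optimized scheme aims for lower distortion at the same rate than this uniform baseline.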

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: C
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Malysa G, Hernaez M, Ochoa I, Rao M, Ganesan K, Weissman T. QVZ: lossy compression of quality values. Bioinformatics. 2015;31(19):3122-3129. doi:10.1093/bioinformatics/btv330. PMID:26026138. PMCID:PMC5856090.

Funding: National Institutes of Health, U01 CA198943