QVZ
QVZ compresses per-base sequencing quality values using a lossy algorithm to reduce storage of FASTQ and SAM files while preserving genotyping fidelity.
Key Features:
- Lossy compression: Compresses the per-base quality values attached to DNA sequencing reads; the decompressed values may differ from the originals.
- File format targets: Operates on quality values embedded in FASTQ and SAM formats.
- Storage reduction: Achieves higher compression ratios than traditional lossless methods. Quality values typically account for about half of an uncompressed sequencing file, so compressing them addresses a large share of total storage.
- Rate–distortion performance: Delivers better rate–distortion performance than previously proposed algorithms across multiple distortion metrics.
- Quasi-convex distortion minimization: Allows minimization of arbitrary quasi-convex distortion functions for customized fidelity criteria.
- Genotyping fidelity: At a given compression rate, produces quality values whose genotyping results are closer to those obtained from the original quality values than the results of other compression algorithms.
- Implementation: Implemented in C.
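To make the distortion-related features above more concrete, the following is a minimal sketch in Python (not QVZ's own C code) of how a distortion between original and lossy quality values might be measured. It assumes Phred+33 encoded FASTQ quality strings. The helper names (`decode_quality`, `bin_quantize`, `average_distortion`) and the fixed-width binning step are hypothetical and chosen only for illustration; they are not part of QVZ. The three metrics shown (squared error, absolute error, and a logarithmic "Lorentzian" error) are examples of quasi-convex distortion functions.

```python
import math

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ FASTQ encoding


def decode_quality(qual_string):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual_string]


def mse(x, y):
    """Squared error between an original and a reconstructed score."""
    return (x - y) ** 2


def l1(x, y):
    """Absolute error between an original and a reconstructed score."""
    return abs(x - y)


def lorentzian(x, y):
    """Logarithmic error: quasi-convex, but not convex, in the difference."""
    return math.log2(1 + abs(x - y))


def average_distortion(original, reconstructed, metric):
    """Mean per-value distortion under the given metric."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have equal length")
    total = sum(metric(a, b) for a, b in zip(original, reconstructed))
    return total / len(original)


def bin_quantize(scores, step=10):
    """Toy lossy step (NOT QVZ's quantizer): map each score to the
    midpoint of its fixed-width bin."""
    return [(q // step) * step + step // 2 for q in scores]


original = decode_quality("IIIIHHG#5?")
lossy = bin_quantize(original)

for name, metric in (("MSE", mse), ("L1", l1), ("Lorentzian", lorentzian)):
    print(name, average_distortion(original, lossy, metric))
```

Swapping the `metric` argument is the only change needed to evaluate a different fidelity criterion, which is the kind of flexibility the quasi-convex distortion feature describes.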
Scientific Applications:
- Genotyping: Improves fidelity of genotyping analyses when using compressed quality values compared with other compression algorithms.
- Large-scale data storage and transmission: Reduces storage and transmission requirements for large-scale sequencing datasets by compressing quality values.
- Custom downstream analysis: Enables tailoring of quality-value compression to specific downstream analysis requirements via arbitrary quasi-convex distortion functions.
Methodology:
QVZ applies lossy compression with rate–distortion optimization, minimizing a chosen quasi-convex distortion function. It is implemented in C.
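The rate–distortion trade-off can be illustrated with a small Python sketch. This is a generic Lloyd iteration for a scalar quantizer under squared error, chosen here only for illustration; it does not reproduce QVZ's actual quantizer design or its C implementation. The function names and the sample scores are hypothetical. The rate is estimated as the empirical zero-order entropy of the quantized values, which assumes an ideal entropy coder.

```python
import math
from collections import Counter


def lloyd_quantizer(samples, levels, iterations=50):
    """Generic Lloyd iteration under squared error (illustrative only)."""
    lo, hi = min(samples), max(samples)
    # Start from evenly spaced reconstruction points.
    points = [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        cells = [[] for _ in points]
        for s in samples:
            nearest = min(range(len(points)),
                          key=lambda i: (s - points[i]) ** 2)
            cells[nearest].append(s)
        # Move each point to the mean of its cell; keep empty cells in place.
        new_points = [sum(c) / len(c) if c else p
                      for c, p in zip(cells, points)]
        if new_points == points:
            break
        points = new_points
    return points


def quantize(samples, points):
    """Map each sample to its nearest reconstruction point."""
    return [min(points, key=lambda p: (s - p) ** 2) for s in samples]


def entropy_bits(symbols):
    """Empirical zero-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def rate_distortion_point(samples, levels):
    """Return (estimated bits per value, mean squared error)."""
    points = lloyd_quantizer(samples, levels)
    recon = quantize(samples, points)
    rate = entropy_bits(recon)
    distortion = sum((s - r) ** 2 for s, r in zip(samples, recon)) / len(samples)
    return rate, distortion


# Hypothetical Phred scores, for demonstration only.
scores = [40, 40, 39, 38, 40, 2, 2, 20, 22, 30, 31, 40, 39, 12, 11]
for levels in (1, 2, 4, 8):
    rate, distortion = rate_distortion_point(scores, levels)
    print(f"{levels} levels: {rate:.2f} bits/value, MSE {distortion:.2f}")
```

With a single level the estimated rate is zero and the distortion equals the variance of the scores; allowing more levels spends more bits to reduce the distortion. A rate–distortion optimizer picks a point along this kind of curve.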
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Windows, Mac
- Programming Languages: C
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Malysa G, Hernaez M, Ochoa I, Rao M, Ganesan K, Weissman T. QVZ: lossy compression of quality values. Bioinformatics. 2015;31(19):3122-3129. doi:10.1093/bioinformatics/btv330. PMID:26026138. PMCID:PMC5856090.