G-SQZ

G-SQZ is a software tool that compresses high-throughput sequencing data without altering the relative order of the data, which makes selective access faster and easier. It uses a Huffman coding-based representation scheme that has achieved 65% to 81% compression on benchmark datasets. By shrinking large sequencing datasets, G-SQZ reduces the infrastructure and informatics costs of managing and analyzing them. It is free to download and use for academic/non-profit purposes; a license can be requested for for-profit use.
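The Huffman coding idea behind this description can be sketched as follows. This is a minimal illustration, not the G-SQZ implementation: it assumes each symbol is a (base, quality) pair, and the function names (`build_huffman_code`, `encode_read`, `decode_read`) and the toy reads are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(symbols):
    """Return {symbol: bitstring} built from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode_read(bases, quals, code):
    """Encode one read as a bitstring, one code word per (base, quality) pair."""
    return "".join(code[(b, q)] for b, q in zip(bases, quals))


def decode_read(bits, code):
    """Decode a bitstring back into (bases, quals) using the prefix-free code."""
    inverse = {v: k for k, v in code.items()}
    bases, quals, buf = [], [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            b, q = inverse[buf]
            bases.append(b)
            quals.append(q)
            buf = ""
    return "".join(bases), "".join(quals)


# Toy reads (hypothetical data): (bases, quality string) in their original order.
reads = [("ACGTACGT", "IIIIHHGG"), ("ACGTTTGA", "IIIIIHHG")]
pairs = [(b, q) for bases, quals in reads for b, q in zip(bases, quals)]
code = build_huffman_code(pairs)
encoded = [encode_read(b, q, code) for b, q in reads]  # order of reads is preserved
decoded = [decode_read(bits, code) for bits in encoded]
```

Frequent pairs receive shorter code words, which is where the size reduction comes from; because reads are encoded one by one in sequence, their relative order is unchanged.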

Topic

Data management;Bioinformatics;Applied mathematics

Detail

  • Operation: Optimisation and refinement;Formatting

  • Software interface: Web user interface

  • Language: Huffman coding-based, sequencing-read-specific representation scheme that compresses data without altering the relative order. It allows selective access without scanning and decoding from the start;Web application;Python

  • License: -

  • Cost: Free for academic/non-profit purposes

  • Version name: -

  • Credit: -

  • Input: -

  • Output: -

  • Contact: wtembe@tgen.org

  • Collection: -

  • Maturity: -
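The selective access mentioned in the Language field (reading a record without scanning and decoding from the start) can be illustrated with a block index. This is only a sketch under stated assumptions: the actual G-SQZ file layout is not described in this record, `zlib` stands in for the Huffman-based scheme, and `write_blocks` and `read_record` are hypothetical helpers.

```python
import io
import zlib


def write_blocks(records, block_size, out):
    """Compress records in their original order, block by block.

    Returns a list of (byte offset, byte length) entries, one per block.
    """
    index = []
    for start in range(0, len(records), block_size):
        text = "\n".join(records[start:start + block_size])
        payload = zlib.compress(text.encode("ascii"))  # stand-in compressor
        index.append((out.tell(), len(payload)))
        out.write(payload)
    return index


def read_record(i, block_size, index, src):
    """Fetch record i by decoding only the block that contains it."""
    offset, length = index[i // block_size]
    src.seek(offset)
    block = zlib.decompress(src.read(length)).decode("ascii").split("\n")
    return block[i % block_size]


# Hypothetical records: one "bases quals" string per read, without newlines.
records = [f"ACGT{'A' * (k % 5)} IIII{'H' * (k % 5)}" for k in range(20)]
buf = io.BytesIO()
index = write_blocks(records, block_size=4, out=buf)
seventh = read_record(6, block_size=4, index=index, src=buf)
```

Only one block is decompressed per lookup, so the cost of fetching a record depends on the block size rather than on the record's position in the file.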

Publications

  • G-SQZ: compact encoding of genomic sequence and quality data.
  • Tembe W, et al. G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics. 2010; 26:2192-4. doi: 10.1093/bioinformatics/btq346
  • https://doi.org/10.1093/bioinformatics/btq346
  • PMID: 20605925
  • PMC: -

Download and documentation

    Currently not available or not maintained.

