Boiler

Boiler compresses collections of RNA sequencing (RNA-seq) alignments into a compact representation. It keeps only a genomic coverage vector and several empirical distributions that summarize the alignments, which reduces storage while still supporting downstream analyses.


Key Features:

  • Lossy compression: Retains only a genomic coverage vector and several empirical distributions while discarding most per-read information, producing a substantially reduced storage footprint.
  • Per-read data recovery: Recovers the most relevant per-read information on decompression, with only a slight negative impact on downstream analyses such as isoform assembly and quantification.
  • Fast querying: Enables rapid queries of compressed alignment data without requiring full decompression.

Scientific Applications:

  • Large-scale RNA-seq data management: Storage-efficient management of extensive RNA-seq alignment collections for projects with high sequencing throughput.
  • Transcriptome profiling: Preserves coverage information used in transcriptome profiling workflows.
  • Differential gene expression studies: Supports differential gene expression analyses by retaining alignment summaries while minimizing storage requirements.
  • Isoform discovery and quantification: Enables isoform discovery, assembly, and quantification workflows with minimal impact from compression.

Methodology:

Boiler converts RNA-seq alignments into a compact, lossy representation: it retains a genomic coverage vector and empirical distributions that summarize the alignments, and discards most per-read details. The compressed file can then be queried without full decompression.
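The idea of keeping a coverage vector plus summary distributions can be illustrated with a toy sketch. This is not Boiler's code or file format: the function names, the half-open (start, end) interval representation, and the run-length encoding step are assumptions made for illustration only, and spliced reads are ignored.

```python
from collections import Counter
from itertools import groupby


def coverage_vector(alignments, genome_length):
    """Per-base coverage from half-open (start, end) alignment intervals."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def length_distribution(alignments):
    """Empirical distribution of aligned read lengths as {length: count}."""
    return dict(Counter(end - start for start, end in alignments))


# Toy input: three unspliced reads on a 10-base reference (hypothetical data).
reads = [(0, 4), (2, 6), (5, 9)]
cov = coverage_vector(reads, 10)
rle = run_length_encode(cov)
lengths = length_distribution(reads)
```

After this step only `rle` and `lengths` would need to be stored. The individual read boundaries are gone, which is what makes a scheme of this kind lossy: more than one set of reads can produce the same coverage vector and length distribution.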

Details

Tool Type: library
Operating Systems: Linux, Windows, Mac
Programming Languages: Python
Added: 5/10/2018
Last Updated: 12/11/2018

Publications

Pritt J, Langmead B. Boiler: lossy compression of RNA-seq alignments using coverage vectors. Nucleic Acids Research. 2016;44(16):e133. doi:10.1093/nar/gkw540. PMID:27298258. PMCID:PMC5027496.
