Boiler
Boiler compresses collections of RNA sequencing (RNA-seq) alignments into a compact, lossy representation. It keeps a genomic coverage vector and a few empirical distributions that summarize the alignments, which reduces storage while still supporting downstream analyses.
Key Features:
- Lossy compression: Retains only a genomic coverage vector and several empirical distributions while discarding most per-read information, producing a substantially reduced storage footprint.
- Per-read data recovery: The most relevant per-read information can be reconstructed on decompression, with only a slight negative impact on downstream analyses such as isoform assembly and quantification.
- Fast querying: Enables rapid queries of compressed alignment data without requiring full decompression.
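The two kinds of summary named above can be illustrated with a minimal sketch. It is not Boiler's API or file format: the function names, the half-open `(start, end)` interval convention, and the choice of read length as the empirical distribution are assumptions made for illustration only.

```python
from collections import Counter


def coverage_vector(alignments, genome_length):
    """Return per-base depth for 0-based, half-open (start, end) alignments.

    Hypothetical helper for illustration; not part of Boiler.
    """
    # Difference array: +1 where an alignment starts, -1 where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each base.
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def read_length_distribution(alignments):
    """Count how many alignments have each aligned length (assumed summary)."""
    return Counter(end - start for start, end in alignments)


# Toy input: four alignments on a 10-base reference.
alignments = [(0, 4), (2, 6), (2, 6), (5, 8)]
coverage = coverage_vector(alignments, 10)
lengths = read_length_distribution(alignments)

print(coverage)
print(dict(lengths))
```

After this step the individual alignments could be dropped: the coverage vector records how many reads cover each base, and the length counts record what the reads looked like in aggregate, but neither records which read started where. That is the sense in which the compression is lossy.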
Scientific Applications:
- Large-scale RNA-seq data management: Storage-efficient management of extensive RNA-seq alignment collections for projects with high sequencing throughput.
- Transcriptome profiling: Preserves coverage information used in transcriptome profiling workflows.
- Differential gene expression studies: Supports differential gene expression analyses by retaining alignment summaries while minimizing storage requirements.
- Isoform discovery and quantification: Supports isoform discovery, assembly, and quantification workflows, with only a slight negative impact from compression.
Methodology:
Boiler transforms RNA-seq alignment data into a compact representation in three steps: it retains a genomic coverage vector and a few empirical distributions that summarize the alignments, discards most per-read details, and stores the result in a compressed format that can be queried without full decompression.
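To show how a stored coverage vector might be queried without expanding it, here is a minimal sketch. Run-length encoding is assumed purely for illustration; this entry does not describe Boiler's on-disk format, and `run_length_encode` and `mean_coverage` are hypothetical helpers, not Boiler functions.

```python
def run_length_encode(coverage):
    """Collapse a per-base coverage list into (depth, run_length) pairs."""
    runs = []
    for depth in coverage:
        if runs and runs[-1][0] == depth:
            runs[-1][1] += 1
        else:
            runs.append([depth, 1])
    return [(depth, length) for depth, length in runs]


def mean_coverage(runs, start, end):
    """Mean depth over the 0-based, half-open interval [start, end).

    Walks the runs directly, so the per-base vector is never rebuilt.
    Positions past the end of the encoded vector count as depth 0.
    """
    if end <= start:
        raise ValueError("interval must be non-empty")
    total = 0
    pos = 0
    for depth, length in runs:
        run_end = pos + length
        # Number of bases this run shares with the query interval.
        overlap = min(run_end, end) - max(pos, start)
        if overlap > 0:
            total += depth * overlap
        pos = run_end
        if pos >= end:
            break  # Later runs cannot overlap the interval.
    return total / (end - start)


coverage = [1, 1, 3, 3, 2, 3, 1, 1, 0, 0]
runs = run_length_encode(coverage)
print(runs)
print(mean_coverage(runs, 2, 6))
```

The query touches only the runs that precede or overlap the requested interval, which is the general idea behind answering coverage questions on compressed data rather than on a fully decompressed file.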
Details
- Tool Type: library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Python
- Added: 5/10/2018
- Last Updated: 12/11/2018
Publications
Pritt J, Langmead B. Boiler: lossy compression of RNA-seq alignments using coverage vectors. Nucleic Acids Research. 2016;44(16):e133. doi:10.1093/nar/gkw540. PMID:27298258. PMCID:PMC5027496.