Centrifuger

Centrifuger is a command-line tool that assigns taxonomic labels to metagenomic sequencing reads by comparing them against comprehensive microbial genome databases such as RefSeq.


Key Features:

  • Taxonomic classification: Classifies metagenomic sequencing reads against microbial genome databases such as RefSeq.
  • Run-block compression (lossless): Losslessly compresses the Burrows-Wheeler transform (BWT) of the genome sequences using run-block compression, which achieves sublinear space complexity.
  • FM-index compaction: Applies additional strategies to compact the Ferragina-Manzini (FM) index, halving memory usage compared with other FM-index-based approaches.
  • Rapid rank queries: Answers rank queries quickly on the compressed index; these queries underlie sequence matching and lookup.
  • Unconstrained match length: Places no limit on match length, which improves the precision of taxonomic assignments, particularly at lower taxonomic levels.
  • Reduced memory footprint: Reduces memory requirements for processing microbial genomic data without sacrificing classification accuracy.
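
To make the roles of the BWT, rank queries, and unconstrained match length concrete, the following is a minimal Python sketch of a textbook FM-index. It is an illustration only and not Centrifuger's implementation: the names `bwt`, `ToyFMIndex`, `count`, and `longest_suffix_match` are hypothetical, the rank table is stored uncompressed, and the BWT is built naively by sorting rotations.

```python
from collections import Counter


def bwt(text):
    """Return the Burrows-Wheeler transform of text, with '$' as the sentinel."""
    text += "$"
    # Naive construction by sorting all rotations (only suitable for toy inputs).
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class ToyFMIndex:
    """Textbook FM-index with an uncompressed rank table (illustrative only)."""

    def __init__(self, text):
        self.bwt = bwt(text)
        counts = Counter(self.bwt)
        # first[c] = number of characters in the text that sort before c.
        self.first = {}
        total = 0
        for c in sorted(counts):
            self.first[c] = total
            total += counts[c]
        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in counts:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def rank(self, c, i):
        """Number of occurrences of c in bwt[:i]."""
        return self.occ[c][i] if c in self.occ else 0

    def count(self, pattern):
        """Count occurrences of pattern via backward search (two rank queries per character)."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.first:
                return 0
            lo = self.first[c] + self.rank(c, lo)
            hi = self.first[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo

    def longest_suffix_match(self, read):
        """Length of the longest suffix of read found in the text, with no length cap."""
        lo, hi = 0, len(self.bwt)
        matched = 0
        for c in reversed(read):
            if c not in self.first:
                break
            new_lo = self.first[c] + self.rank(c, lo)
            new_hi = self.first[c] + self.rank(c, hi)
            if new_lo >= new_hi:
                break
            lo, hi = new_lo, new_hi
            matched += 1
        return matched


index = ToyFMIndex("banana")
print(index.count("ana"))                    # occurrences of "ana" in "banana"
print(index.longest_suffix_match("xxana"))   # match keeps extending until it fails
```

In this sketch the match is extended one character at a time for as long as the search interval stays non-empty, which is one simple way to picture matching without a fixed length limit.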

Scientific Applications:

  • Metagenomic taxonomic profiling: Assigns taxonomy to reads in metagenomic studies using comparisons to microbial genome databases.
  • High-resolution classification: Provides improved classification accuracy at lower taxonomic levels where precision is critical.
  • Large-scale metagenomic analyses: Makes classification against large microbial databases practical by reducing storage and memory requirements.
  • Sequence classification tasks: Performs sequence-level classification for microbial genomic data.

Methodology:

Centrifuger compares sequencing reads to microbial genome databases (e.g., RefSeq). The Burrows-Wheeler transform (BWT) of the genome sequences is losslessly compressed using run-block compression, and the Ferragina-Manzini (FM) index is compacted to reduce memory. Rapid rank queries on the compressed index, together with unconstrained match length, support taxonomic assignment.
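
The idea behind block-based run compression can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Centrifuger's actual data structure: it assumes the sequence is split into fixed-size blocks, that a block consisting of a single repeated character is stored as one character, and that all other blocks are stored verbatim. The function names and the dictionary layout are hypothetical, and the rank query here scans blocks linearly, whereas a practical index would add auxiliary structures to answer it quickly.

```python
def run_block_compress(seq, block_size):
    """Split seq into fixed-size blocks and store single-character blocks as one character."""
    is_run = []     # one flag per block: True if the block is a full run of one character
    run_chars = []  # the repeated character of each run block, in order
    plain = []      # verbatim content of every non-run block, in order
    for start in range(0, len(seq), block_size):
        block = seq[start:start + block_size]
        if len(block) == block_size and len(set(block)) == 1:
            is_run.append(True)
            run_chars.append(block[0])
        else:
            # Mixed blocks and a short trailing block are kept as-is.
            is_run.append(False)
            plain.append(block)
    return {
        "block_size": block_size,
        "is_run": is_run,
        "run_chars": "".join(run_chars),
        "plain": plain,
    }


def run_block_decompress(comp):
    """Rebuild the original sequence exactly, showing the scheme is lossless."""
    runs = iter(comp["run_chars"])
    plains = iter(comp["plain"])
    out = []
    for flag in comp["is_run"]:
        out.append(next(runs) * comp["block_size"] if flag else next(plains))
    return "".join(out)


def rank(comp, ch, i):
    """Count occurrences of ch in the first i characters without decompressing."""
    b = comp["block_size"]
    total = 0
    r = p = 0  # positions in run_chars and plain
    for k, flag in enumerate(comp["is_run"]):
        start = k * b
        if start >= i:
            break
        take = min(b, i - start)  # how many characters of this block fall before i
        if flag:
            if comp["run_chars"][r] == ch:
                total += take
            r += 1
        else:
            total += comp["plain"][p][:take].count(ch)
            p += 1
    return total


seq = "AAAAAAAA" + "CG" + "T" * 10 + "AC"
comp = run_block_compress(seq, block_size=4)
print(comp["is_run"], comp["run_chars"], comp["plain"])
print(run_block_decompress(comp) == seq)
```

Because the BWT tends to group identical characters together, many blocks in a BWT sequence are likely to be runs, which is why a scheme of this kind can save space while still supporting rank queries.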

Details

License: MIT
Cost: Free of charge
Tool Type: command-line tool
Programming Languages: C++
Added: 6/18/2024
Last Updated: 11/24/2024

Publications

Song L, Langmead B. Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification. Genome Biology. 2024;25(1). doi:10.1186/s13059-024-03244-4. PMID:38664753. PMCID:PMC11046777.

Funding:

  • National Institute of General Medical Sciences: 3P20GM130454-05WS, P20GM130454, R35GM139602
  • National Human Genome Research Institute: R01HG011392