HULK

HULK (Histosketching Using Little K-mers) is a software tool that addresses the challenges posed by the exponential growth in microbiome data. It compactly represents microbiome sequencing data as similarity-preserving sketches of streaming k-mer spectra, known as histosketches. Histosketching acts as a form of dimensionality reduction, enabling efficient representation, dissimilarity estimation, rapid microbiome catalogue searching, and classification of microbiome samples in near real-time.

The core functionality of HULK is applying streaming histogram sketching to microbiome samples. This process compresses the k-mer spectrum of a microbiome sample into a histosketch, greatly reducing the size of the data while preserving the information needed to compare samples. Pairwise Jaccard similarity estimates allow histosketches to be clustered by sample type, and a locality-sensitive hashing indexing scheme supports rapid microbiome similarity searches.
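The sketch-and-compare idea described above can be illustrated with a minimal Python sketch. It is not HULK's implementation: it swaps in a plain, unweighted MinHash over the set of k-mers, whereas a histosketch also accounts for k-mer frequencies. The function names, the k-mer length and the sketch size are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def kmer_spectrum(seq, k=4):
    """Count every overlapping k-mer in a sequence (its k-mer spectrum)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def _seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer for a given seed."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(spectrum, size=64):
    """Illustrative stand-in for a histosketch: for each of `size` seeded
    hash functions, keep the k-mer with the smallest hash value.
    Note: k-mer counts are ignored here (unweighted MinHash)."""
    return [min(spectrum, key=lambda km: _seeded_hash(km, seed))
            for seed in range(size)]


def jaccard_estimate(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching slots."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


# Hypothetical toy sequences, not real microbiome reads.
sample_a = minhash_sketch(kmer_spectrum("ACGTACGTGGCATTAGC"))
sample_b = minhash_sketch(kmer_spectrum("ACGTACGTGGCATTAGG"))
similarity = jaccard_estimate(sample_a, sample_b)
```

Each sketch has a fixed length regardless of how much sequence was read, which is what makes pairwise comparison and indexing cheap.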

A key application demonstrated by HULK's developers involves using histosketches to train machine learning classifiers. For instance, a random forest classifier was trained using 108 novel microbiome samples from premature neonates. This classifier achieved high accuracy (97%) and precision (96%) in predicting antibiotic treatment among the neonates, and it could classify microbiome data streams in less than 3 seconds.
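For readers unfamiliar with the two metrics quoted above, the following minimal Python sketch shows how accuracy and precision are computed from a classifier's predictions. The function name and the toy labels are hypothetical and are not taken from the HULK study.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary labels.

    accuracy  = correct predictions / all predictions
    precision = true positives / all predicted positives
    """
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_pos = sum(t != positive and p == positive for t, p in pairs)
    accuracy = correct / len(pairs)
    predicted_pos = true_pos + false_pos
    # Precision is undefined with no positive predictions; report 0.0 here.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Hypothetical labels: 1 = antibiotic-treated, 0 = untreated.
truth = [1, 0, 0, 1]
predicted = [1, 0, 0, 0]
acc, prec = accuracy_and_precision(truth, predicted)
```

In this toy case one treated sample is missed, which lowers accuracy but leaves precision unaffected because no untreated sample was predicted as treated.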

Topic

Microbial ecology;Metagenomics;Metagenomic sequencing

Detail

  • Operation: Standardisation and normalisation;Read binning;k-mer counting

  • Software interface: Library

  • Language: Python

  • License: Other

  • Cost: Free

  • Version name: -

  • Credit: The STFC Hartree Centre's Innovation Return on Research programme, the Department for Business, Energy & Industrial Strategy, Wellcome Trust Investigator Award, the BBSRC Norwich Research Park Bioscience Doctoral Training Grant, Institute Strategic Programme grant for Gut Health and Food Safety, BBSRC Institute Strategic Programme Gut Microbes and Health.

  • Input: -

  • Output: -

  • Contact: Will Rowe will.rowe@stfc.ac.uk

  • Collection: -

  • Maturity: Stable

Publications

  • Streaming histogram sketching for rapid microbiome analytics.
  • Rowe WP, et al. Streaming histogram sketching for rapid microbiome analytics. Microbiome. 2019; 7:40. doi: 10.1186/s40168-019-0653-2
  • https://doi.org/10.1186/s40168-019-0653-2
  • PMID: 30878035
  • PMC: PMC6420756
