mbkmeans

"mbkmeans" is an R package for efficient, scalable clustering of single-cell RNA-sequencing (scRNA-seq) datasets. It handles large datasets, which may contain thousands to millions of cells, by implementing the mini-batch k-means algorithm. Unlike traditional clustering implementations, "mbkmeans" supports on-disk data representations such as HDF5, so it can analyze datasets that do not fit entirely in memory. The package's performance is demonstrated on large datasets, including one with 1.3 million cells, and its computing efficiency is compared with that of the standard k-means implementation.
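For readers unfamiliar with the mini-batch k-means idea mentioned above, the following is a minimal, language-neutral sketch of the general algorithm in plain Python. It is not the package's R/C++ implementation and does not use its API; the function names (`mini_batch_kmeans`, `nearest_center`, `squared_distance`) and the parameters `batch_size`, `n_iter`, and `seed` are hypothetical and chosen only for illustration. The sketch assumes random initialization from the data and a per-center learning rate of 1/count, which is one common formulation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size=10, n_iter=100, seed=0):
    """Illustrative mini-batch k-means (an assumption-laden sketch).

    Each iteration looks at only `batch_size` randomly sampled points,
    so the full dataset never has to be processed at once.
    """
    rng = random.Random(seed)
    # Assumed initialization: k distinct points drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # how many points each center has absorbed so far

    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centers.
        assignments = [nearest_center(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]

    # Final pass: label every point with its nearest center.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels
```

A call such as `mini_batch_kmeans(points, k=2, batch_size=8)` returns the fitted centers and one cluster label per input point. Because each update touches only a small batch, the same scheme can in principle read batches from an on-disk store rather than from memory, which is the property the description above attributes to the package.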

Topic

RNA-Seq;Transcriptomics;Cell biology

Detail

  • Operation: Expression profile clustering;Sequence clustering;Standardisation and normalisation

  • Software interface: Library

  • Language: R

  • License: The MIT License

  • Cost: Free

  • Version name: 1.18.0

  • Credit: The National Institutes of Health, the NIH BRAIN Initiative, the Chan Zuckerberg Initiative DAF (an advised fund of Silicon Valley Community Foundation), an ENS-CFM Data Science Chair, and the Programma per Giovani Ricercatori Rita Levi Montalcini granted by the Italian Ministry of Education, University, and Research.

  • Input: -

  • Output: -

  • Contact: Davide Risso risso.davide@gmail.com

  • Collection: -

  • Maturity: Stable

Publications

  • mbkmeans: fast clustering for single cell data using mini-batch k-means
  • Hicks SC, Liu R, Ni Y, Purdom E, Risso D. mbkmeans: Fast clustering for single cell data using mini-batch k-means. PLoS Comput Biol. 2021 Jan 26;17(1):e1008625. doi: 10.1371/journal.pcbi.1008625. PMID: 33497379; PMCID: PMC7864438.
  • https://doi.org/10.1371/journal.pcbi.1008625
  • PMID: 33497379
  • PMC: PMC7864438

Download and documentation

