CountClust

CountClust analyzes RNA sequencing (RNA-seq) gene expression count data with grade of membership (GoM) models, a family that includes admixture and topic models such as Latent Dirichlet Allocation. Each sample receives partial memberships in several clusters, which helps deconvolve overlapping biological processes.


Key Features:

  • Grade of Membership Models: Employs GoM models that allow each sample to have partial membership in multiple clusters, generalizing traditional clustering, in which each sample belongs to exactly one cluster.
  • Identification of Characteristic Genes: Identifies genes distinctively expressed within each cluster to facilitate biological interpretation of cluster-specific signals.
  • Visual Summaries: Produces visual summaries of cluster memberships to represent sample-level admixture and relationships among samples.
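
The partial-membership idea in the list above can be illustrated with a minimal sketch. It assumes the usual admixture formulation: each cluster has a profile of relative gene expression, and a sample's expected gene proportions are the membership-weighted average of those profiles. The function names, the toy values, and the use of plain Python are all assumptions made for illustration; none of this is the CountClust API.

```python
import random

def expected_proportions(omega_n, theta):
    """Mix K cluster profiles into one sample's expected gene proportions.

    omega_n: K membership proportions for one sample (they sum to 1).
    theta:   K rows, each a list of G relative expression levels (each sums to 1).
    """
    n_genes = len(theta[0])
    return [sum(w * profile[g] for w, profile in zip(omega_n, theta))
            for g in range(n_genes)]

def simulate_counts(omega_n, theta, depth, rng):
    """Draw `depth` reads for one sample from the mixed gene proportions."""
    p = expected_proportions(omega_n, theta)
    counts = [0] * len(p)
    for g in rng.choices(range(len(p)), weights=p, k=depth):
        counts[g] += 1
    return counts

# Hypothetical toy values: 2 clusters, 3 genes.
theta = [[0.7, 0.2, 0.1],
         [0.1, 0.2, 0.7]]
pure_sample = [1.0, 0.0]    # belongs entirely to cluster 0 (ordinary clustering)
mixed_sample = [0.5, 0.5]   # equal membership in both clusters

p_pure = expected_proportions(pure_sample, theta)
p_mixed = expected_proportions(mixed_sample, theta)
counts = simulate_counts(mixed_sample, theta, depth=1000, rng=random.Random(0))
```

A sample with all of its membership in one cluster reproduces that cluster's profile, which is the sense in which the model generalizes traditional clustering.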

Scientific Applications:

  • Human Tissue Analysis: Applied to GTEx data across 53 human tissues to highlight similarities among biologically related tissues and identify tissue-specific genes.
  • Single-cell RNA-seq in Developmental Biology: Applied to single-cell RNA-seq data from mouse preimplantation embryos to reveal both discrete and continuous variation across developmental stages. The analysis also identifies genes involved in germ cell formation, compaction, morula formation, and differentiation into inner cell mass and trophoblast at the blastocyst stage.

Methodology:

Fits grade of membership (admixture/topic) models such as Latent Dirichlet Allocation to the count data, then identifies cluster-specific genes and generates visual summaries of the cluster memberships.
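
As a rough illustration of how cluster-specific genes might be picked out once cluster profiles are available, the sketch below scores each gene by how far its level in one cluster diverges from its level in every other cluster, and keeps the smallest of those divergences. The Poisson Kullback-Leibler divergence used as the distance, the function names, and the toy profiles are assumptions made for this example; the page does not state which criterion the package applies, and this is not the CountClust API.

```python
import math

def kl_poisson(a, b):
    """KL divergence from Poisson(mean a) to Poisson(mean b), for a, b > 0."""
    return a * math.log(a / b) + b - a

def distinctiveness(theta, k, g):
    """Smallest divergence of gene g in cluster k from any other cluster."""
    return min(kl_poisson(theta[k][g], theta[l][g])
               for l in range(len(theta)) if l != k)

def top_genes(theta, k, n_top):
    """Indices of the n_top genes that best distinguish cluster k."""
    scores = [distinctiveness(theta, k, g) for g in range(len(theta[k]))]
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:n_top]

# Hypothetical toy profiles: 3 clusters, 4 genes (each row sums to 1).
theta = [[0.70, 0.10, 0.10, 0.10],
         [0.10, 0.70, 0.10, 0.10],
         [0.25, 0.25, 0.25, 0.25]]

best_for_cluster_0 = top_genes(theta, 0, 1)
best_for_cluster_1 = top_genes(theta, 1, 1)
```

Taking the minimum over the other clusters means a gene scores highly only if it separates the cluster from all of the others, not just from one of them.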

Details

License: GPL-2.0
Tool Type: command-line tool, library
Operating Systems: Linux, Windows, Mac
Programming Languages: R
Added: 1/17/2017
Last Updated: 12/10/2018

Publications

Dey KK, Hsiao CJ, Stephens M. Visualizing the Structure of RNA-seq Expression Data using Grade of Membership Models. bioRxiv. 2016. doi:10.1101/051631.
