ClusterMine

ClusterMine integrates annotated gene sets into clustering analysis to identify sample groups and prioritize gene sets that drive sample similarity in gene expression data.


Key Features:

  • Knowledge-Integrated Clustering: Partitions the gene expression data into gene-set-specific subsets ("subdata"), one per predefined annotated gene set, to assess how each function contributes to sample similarity.
  • Functional Interpretation: Highlights and prioritizes the gene sets that contribute most to the observed clusters, linking clusters to specific biological functions or pathways.
  • Comparison to Conventional Methods: Addresses limitations of hierarchical clustering (HC) and consensus clustering (CC) by incorporating gene-set-level signals rather than relying solely on holistic expression profiles.
  • Input Requirements: Operates on a list of gene sets and a gene expression data matrix with genes in rows and samples in columns.
  • Clustering Integration: Performs clustering on each gene-set-specific subdata and integrates per-gene-set clustering results into a final comprehensive clustering output.
  • Validation: Reported to improve clustering performance relative to conventional methods, and to prioritize biologically relevant gene sets, across nine real experimental datasets.
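
The input layout and the partitioning step described above can be illustrated in base R. This is a minimal sketch, not ClusterMine's own code: the object names (expr, gene_sets, subdata), the gene and pathway labels, and the simulated values are all hypothetical, and the tool's actual function names and argument formats are not given in this entry.

```r
# Toy expression matrix: genes in rows, samples in columns (simulated values).
set.seed(1)
expr <- matrix(rnorm(6 * 4), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Hypothetical gene sets: a named list of character vectors of gene IDs.
gene_sets <- list(
  pathway_A = c("gene1", "gene2", "gene3"),
  pathway_B = c("gene4", "gene5", "gene6", "gene99")  # gene99 is not in expr
)

# Partition the matrix into one sub-matrix ("subdata") per gene set,
# keeping only the genes that are actually present in the matrix.
subdata <- lapply(gene_sets, function(genes) {
  expr[intersect(genes, rownames(expr)), , drop = FALSE]
})

# Rows (genes) and columns (samples) of each sub-matrix.
sapply(subdata, dim)
```

Every sub-matrix keeps all samples as columns, so the per-gene-set clustering results remain comparable across gene sets.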

Scientific Applications:

  • Cell subpopulation identification: Detects cell subpopulations by grouping samples according to functional gene-set signals in expression data.
  • Disease subtype discovery: Identifies disease subtypes and links them to pathway- or function-level gene-set differences.
  • Biological interpretation of clusters: Facilitates assigning biological functions or pathways to clusters via prioritized gene sets.

Methodology:

ClusterMine takes a list of gene sets and a gene expression matrix as input. It partitions the expression matrix into one subdata per gene set, clusters the samples within each subdata, and integrates the per-gene-set clustering results into a final clustering output.
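
The three steps can be sketched end to end in base R. This entry does not state which clustering algorithm or integration rule ClusterMine uses, so the choices below are assumptions made only for illustration: average-linkage hierarchical clustering within each gene set, integration by averaging sample co-membership matrices, and ranking gene sets by their agreement with the final clustering. All names and the simulated data are hypothetical.

```r
# Simulated data: 24 genes x 12 samples forming two sample groups.
set.seed(42)
n_genes <- 24
n_samples <- 12
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("g", 1:n_genes), paste0("s", 1:n_samples)))
group <- rep(1:2, each = 6)
# Genes g1..g16 separate the two groups; g17..g24 are pure noise.
expr[1:16, group == 2] <- expr[1:16, group == 2] + 6

gene_sets <- list(signal_a = paste0("g", 1:8),
                  signal_b = paste0("g", 9:16),
                  noise    = paste0("g", 17:24))
k <- 2  # assumed number of clusters

# Step 1: cluster the samples separately within each gene-set subdata.
cluster_one <- function(genes) {
  sub <- expr[intersect(genes, rownames(expr)), , drop = FALSE]
  cutree(hclust(dist(t(sub)), method = "average"), k = k)
}
per_set <- lapply(gene_sets, cluster_one)

# Step 2: integrate by averaging 0/1 co-membership matrices across gene sets,
# then cluster the samples on the resulting dissimilarity (1 - consensus).
co_membership <- function(labels) outer(labels, labels, "==") * 1
consensus <- Reduce(`+`, lapply(per_set, co_membership)) / length(per_set)
final <- cutree(hclust(as.dist(1 - consensus), method = "average"), k = k)

# Step 3: rank gene sets by the fraction of sample pairs on which their
# clustering agrees with the final clustering.
pairs <- upper.tri(consensus)
agreement <- sapply(per_set, function(labels) {
  mean(co_membership(labels)[pairs] == co_membership(final)[pairs])
})
sort(agreement, decreasing = TRUE)
```

In this toy setup the two signal gene sets should rank above the noise set, which mirrors the idea of prioritizing the gene sets that drive sample similarity.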

Details

Programming Languages: R
Added: 1/18/2021
Last Updated: 11/24/2024

Publications

Li H, Xu Y, Zhu X, Liu Q, Omenn GS, Wang J. ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets. Journal of Bioinformatics and Computational Biology. 2020;18(03):2040009. doi:10.1142/s0219720020400090. PMID:32698720. PMCID:PMC8864677.