CLUMPAK

CLUMPAK (Cluster Markov Packager Across K) postprocesses the output of model-based clustering programs such as STRUCTURE, run on multilocus genotype data. It identifies consensus clustering solutions among replicate runs and aligns inferred clusters across values of K.


Key Features:

  • Automated postprocessing: Handles multiple independent runs for each fixed value of K produced by model-based clustering programs such as STRUCTURE.
  • Identification of similar runs: Identifies sets of highly similar replicate runs and separates distinct modes by applying a Markov clustering algorithm to the similarity matrix between runs computed by CLUMPP.
  • Consensus solution generation: Generates consensus solutions for each identified mode in the solution space.
  • Alignment across K values: Aligns inferred clusters across values of K so that clustering results obtained at different K can be compared directly.
  • Selection of optimal K: Implements methods for selecting an optimal number of clusters (K).
  • Comparison across programs, models, and data subsets: Compares solutions derived from different programs, models, or data subsets.
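
The mode-identification feature listed above can be illustrated with a minimal Markov clustering sketch in Python. It is an illustration only, not CLUMPAK's implementation (the tool itself is written in Perl). The function name mcl_modes, the threshold, inflation, max_iter, and eps parameters, and the example similarity scores are all hypothetical, and the similarity matrix is assumed to have been computed beforehand (for example by CLUMPP).

```python
def mcl_modes(similarity, threshold=0.8, inflation=2.0, max_iter=100, eps=1e-6):
    """Group replicate runs into modes with a basic Markov clustering loop.

    `similarity` is a symmetric n x n list of lists of pairwise run
    similarities in [0, 1]. All parameter names and defaults are
    illustrative assumptions, not CLUMPAK settings.
    """
    n = len(similarity)
    # Keep only edges at or above the threshold; force self-loops on the diagonal.
    m = [[1.0 if i == j
          else (float(similarity[i][j]) if similarity[i][j] >= threshold else 0.0)
          for j in range(n)] for i in range(n)]

    def normalize_columns(mat):
        # Rescale every column to sum to 1 (column-stochastic matrix).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power, then renormalize the columns.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < 1e-9:
            break

    # Read out modes as connected components of the remaining nonzero entries.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)

    groups = {}
    for j in range(n):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


# Hypothetical similarity scores for five replicate runs at one value of K.
scores = [
    [1.00, 0.97, 0.95, 0.41, 0.38],
    [0.97, 1.00, 0.96, 0.40, 0.39],
    [0.95, 0.96, 1.00, 0.42, 0.40],
    [0.41, 0.40, 0.42, 1.00, 0.94],
    [0.38, 0.39, 0.40, 0.94, 1.00],
]
print(mcl_modes(scores))  # [[0, 1, 2], [3, 4]]
```

In this toy input, runs 0 to 2 form one mode and runs 3 and 4 form a second. Raising or lowering the assumed threshold changes how readily runs are merged into the same mode.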

Scientific Applications:

  • Population genetics: Interpretation and comparison of population genetic structure from multilocus genotype data.
  • Molecular ecology: Investigation of population-level genetic structure in ecological and conservation studies.
  • Multilocus genotype analysis: Summarization and comparison of multilocus genotype clustering results across runs and K values.

Methodology:

CLUMPAK first obtains a similarity matrix between replicate runs from CLUMPP. It then applies a Markov clustering algorithm to that matrix to separate the runs into distinct modes, generates a consensus solution for each mode, and finally aligns the inferred clusters across values of K.
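
The consensus step can be sketched as follows for runs within a single mode at a fixed K. Cluster labels are arbitrary in each run, so columns must be matched before averaging. This is a simplified illustration in Python, not the procedure used by CLUMPAK or CLUMPP: it searches all column permutations by brute force (practical only for small K) and scores them by squared difference, which stands in for the similarity measure used by those tools. The function names and the example membership matrices are hypothetical.

```python
from itertools import permutations


def align_to_reference(reference, run):
    """Return `run` with its cluster columns permuted to best match `reference`.

    Both arguments are membership matrices: one row per individual,
    one column per cluster, with each row summing to 1.
    """
    k = len(reference[0])

    def distance(perm):
        # Sum of squared differences after relabeling clusters with `perm`.
        return sum((ref_row[c] - row[perm[c]]) ** 2
                   for ref_row, row in zip(reference, run)
                   for c in range(k))

    best = min(permutations(range(k)), key=distance)
    return [[row[best[c]] for c in range(k)] for row in run]


def consensus(runs):
    """Align every run to the first one, then average membership coefficients."""
    reference = runs[0]
    aligned = [reference] + [align_to_reference(reference, r) for r in runs[1:]]
    n, k = len(reference), len(reference[0])
    return [[sum(a[i][c] for a in aligned) / len(aligned) for c in range(k)]
            for i in range(n)]


# Hypothetical runs for two individuals at K = 2; run_b has its labels switched.
run_a = [[0.9, 0.1], [0.2, 0.8]]
run_b = [[0.1, 0.9], [0.8, 0.2]]
run_c = [[0.8, 0.2], [0.3, 0.7]]

print(align_to_reference(run_a, run_b))  # [[0.9, 0.1], [0.2, 0.8]]
print(consensus([run_a, run_b, run_c]))
```

Aligning clusters across different values of K is a harder problem, because the matrices being compared have different numbers of columns. The sketch above does not attempt it.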

Details

License: Not licensed
Tool Type: web application
Operating Systems: Linux, Windows, Mac
Programming Languages: Perl
Added: 8/20/2017
Last Updated: 11/25/2024

Publications

Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Molecular Ecology Resources. 2015;15(5):1179-1191. doi:10.1111/1755-0998.12387. PMID:25684545. PMCID:PMC4534335.

Funding: Israel Science Foundation (1265/12); NIH (R01 HG005855)
