HierCC

HierCC assigns genome assemblies to hierarchical population clusters using core genome Multi-Locus Sequence Typing (cgMLST), supporting population assignment and genomic surveillance of bacterial pathogens.


Key Features:

  • Hierarchical clustering of core genome MLST: Groups genomes into multi-level clusters based on core genome Multi-Locus Sequence Types to represent population structure.
  • Scalable multi-level assignments: Employs a scalable clustering scheme that supports both incremental and static multi-level cluster assignments across large datasets.
  • Core genome MLST-based operation: Operates directly on core genome MLST data derived from genome assemblies.
  • HCCeval integration: Integrates with HCCeval to determine optimal thresholds for assigning genomes to cohesive clusters within the hierarchy.
  • Real-time and large-scale analysis: Enables real-time genomic analysis and is applicable to large-scale whole-genome sequencing databases.
  • Empirical application to bacterial pathogens: Has been applied to the analysis of over 400,000 genomes from Salmonella, Escherichia, Yersinia, and Clostridioides.
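
As an illustration of the incremental mode listed above, the following minimal Python sketch places one new genome into an existing clustering at a single level. It is not HierCC's code: the function names, the use of 0 as a missing-allele marker, the nearest-neighbour rule, and the example threshold are all assumptions made for illustration. As a simplification, existing assignments are never merged or changed when a new genome arrives.

```python
from typing import Dict, List, Sequence

MISSING = 0  # assumption: 0 marks a missing allele call in a cgMLST profile


def allelic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count loci where both profiles have a call and the alleles differ."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)


def assign_incrementally(
    profiles: Dict[str, List[int]],
    clusters: Dict[str, str],
    new_id: str,
    new_profile: Sequence[int],
    threshold: int,
) -> str:
    """Assign one new genome to an existing cluster or open a new one.

    The new genome joins the cluster of its nearest already-assigned genome
    if that genome is within `threshold` allelic differences; otherwise it
    founds a new cluster labelled with its own identifier.
    """
    best_id, best_dist = None, None
    for genome_id, profile in profiles.items():
        d = allelic_distance(profile, new_profile)
        if best_dist is None or d < best_dist:
            best_id, best_dist = genome_id, d

    if best_id is not None and best_dist <= threshold:
        cluster = clusters[best_id]
    else:
        cluster = new_id

    # Record the new genome so later assignments can link to it.
    profiles[new_id] = list(new_profile)
    clusters[new_id] = cluster
    return cluster


if __name__ == "__main__":
    profiles = {"g1": [1, 1, 1, 1], "g2": [1, 1, 1, 2]}
    clusters = {"g1": "g1", "g2": "g1"}
    print(assign_incrementally(profiles, clusters, "g3", [1, 1, 2, 2], threshold=1))
    print(assign_incrementally(profiles, clusters, "g4", [5, 5, 5, 5], threshold=1))
```

Because earlier genomes keep their labels, cluster designations stay stable as the database grows, which is the property an incremental scheme needs for ongoing surveillance.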

Scientific Applications:

  • Infectious disease surveillance: Supports real-time genomic surveillance of bacterial pathogens using hierarchical cgMLST clusters.
  • Pathogen genotyping: Provides cgMLST-based genotype assignments for public health and epidemiological analyses.
  • Population assignment and structure delineation: Delineates population structure and assigns genomes to hierarchical population groups in large WGS databases.
  • Comparative analyses of specific taxa: Facilitates comparative genomic analyses of taxa such as Salmonella, Escherichia, Yersinia, and Clostridioides.

Methodology:

HierCC assigns genomes to hierarchical clusters from their core genome MLST profiles. Its clustering scheme is scalable and supports both incremental and static multi-level assignments. HCCeval is used to determine the optimal clustering thresholds.
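
A static multi-level assignment of this kind can be sketched in a few lines of Python. The sketch below is a toy under stated assumptions, not the tool's implementation: it assumes single-linkage clustering on pairwise allelic distances, treats 0 as a missing allele call, and uses hypothetical names (`allelic_distance`, `multilevel_clusters`) and arbitrary example thresholds. Each genome receives one cluster label per threshold, and clusters at a looser threshold always contain the clusters found at tighter ones.

```python
from itertools import combinations
from typing import Dict, List, Sequence

MISSING = 0  # assumption: 0 marks a missing allele call


def allelic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count loci where both profiles have a call and the alleles differ."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)


def multilevel_clusters(
    profiles: List[Sequence[int]], thresholds: Sequence[int]
) -> Dict[int, List[int]]:
    """Single-linkage cluster labels for every threshold.

    Returns {threshold: labels}, where labels[i] is the smallest genome
    index in the cluster that contains genome i at that threshold.
    """
    n = len(profiles)
    # All pairwise distances, sorted so edges can be added in one pass.
    edges = sorted(
        (allelic_distance(profiles[i], profiles[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    result: Dict[int, List[int]] = {}
    pos = 0
    for t in sorted(thresholds):
        # Link every pair whose distance does not exceed this threshold.
        while pos < len(edges) and edges[pos][0] <= t:
            _, i, j = edges[pos]
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as root so labels stay stable.
                parent[max(ri, rj)] = min(ri, rj)
            pos += 1
        result[t] = [find(i) for i in range(n)]
    return result


if __name__ == "__main__":
    demo = [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 2, 2], [5, 5, 5, 5]]
    for level, labels in multilevel_clusters(demo, [0, 1, 4]).items():
        print(level, labels)
```

Processing thresholds in ascending order over one sorted edge list means the union-find structure is built once and reused, so the nesting of levels falls out of the algorithm rather than being enforced separately.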

Details

License: GPL-3.0
Tool Type: command-line tool
Programming Languages: Python
Added: 1/18/2021
Last Updated: 3/18/2021

Publications

Zhou Z, Charlesworth J, Achtman M. HierCC: A multi-level clustering scheme for population assignments based on core genome MLST. bioRxiv. 2020. doi:10.1101/2020.11.25.397539.
