GPHMM

GPHMM models chromosomal aberrations and genotyping signal biases in whole genome SNP array data from tumor samples to detect copy number alterations and loss of heterozygosity (LOH).


Key Features:

  • Global parameter integration: Incorporates global parameters into a Hidden Markov Model to account for sample-wide effects on SNP array signals.
  • Hidden Markov Model framework: Uses an HMM to segment genomic states relevant to copy number and LOH.
  • Baseline shift modeling: Quantitatively models signal baseline shifts arising from aneuploidy.
  • Normal cell contamination modeling: Explicitly models contamination from normal cells (tumor purity effects) in genotyping signals.
  • GC content bias correction: Accounts for GC content bias that distorts SNP array signal intensities.
  • Expectation-Maximization estimation: Employs an Expectation-Maximization (EM) algorithm for parameter estimation.
  • Low-purity sensitivity: Capable of identifying chromosomal rearrangements in samples with tumor cell content as low as 10%.
  • SNP array compatibility: Operates on whole genome SNP array genotyping data for genome-wide analysis.
  • Quality-control outputs: Produces global parameter estimates useful for data quality control and outlier detection in cohorts.
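
The baseline shift and normal cell contamination features above can be illustrated with a simple mixture calculation. This is a minimal sketch under idealized assumptions, not the GPHMM implementation: it assumes the log R ratio (LRR) follows log2 of the average copy number relative to two copies, that the SNP is heterozygous (AB) in normal cells, and that a baseline shift acts as an additive LRR offset. The function name `expected_signals` and its parameter names are hypothetical.

```python
import math

def expected_signals(tumor_copies, tumor_b_copies, normal_fraction, baseline_shift=0.0):
    """Idealized expected (LRR, BAF) at a SNP that is AB in normal cells.

    tumor_copies:    total copy number in tumor cells
    tumor_b_copies:  number of B-allele copies in tumor cells
    normal_fraction: proportion of normal cells in the sample (0 to 1)
    baseline_shift:  additive LRR offset (assumed form of an aneuploidy shift)
    """
    w = normal_fraction
    # Average total copies per cell: normal cells carry 2, tumor cells carry tumor_copies.
    mean_copies = w * 2 + (1 - w) * tumor_copies
    if mean_copies <= 0:
        raise ValueError("no DNA expected at this locus (homozygous deletion in a pure tumor)")
    # Average B-allele copies per cell: normal AB cells contribute one B allele.
    mean_b = w * 1 + (1 - w) * tumor_b_copies
    lrr = math.log2(mean_copies / 2) + baseline_shift
    baf = mean_b / mean_copies
    return lrr, baf

# Hemizygous deletion (one A copy left) at 100%, 50% and 10% tumor content.
for normal_fraction in (0.0, 0.5, 0.9):
    lrr, baf = expected_signals(1, 0, normal_fraction)
    print(f"normal fraction {normal_fraction:.1f}: LRR {lrr:+.3f}, BAF {baf:.3f}")
```

As the normal fraction grows, both signals move toward the values of a normal diploid genome (LRR 0, BAF 0.5), which is why low-purity samples are hard to analyze without modeling contamination explicitly.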

Scientific Applications:

  • Chromosomal aberration detection: Detection and characterization of chromosomal rearrangements and copy number alterations from SNP arrays.
  • LOH analysis: Identification and mapping of loss of heterozygosity regions in tumor genomes.
  • Tumor purity and aneuploidy assessment: Estimation of tumor cell content and aneuploidy-related baseline shifts from genotyping signals.
  • Cohort data quality control: Quality-control assessment and outlier detection in SNP array cohort studies using global parameter estimates.
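
One way the cohort quality-control use case could look in practice is sketched below. The outlier rule (a modified z-score based on the median absolute deviation) is a generic statistical technique chosen for illustration; the source does not state which rule GPHMM users apply. The sample IDs and the per-sample normal-cell fraction estimates are made-up values, and `flag_outliers` is a hypothetical helper.

```python
from statistics import median

def flag_outliers(estimates, threshold=3.5):
    """Return IDs of samples whose estimate has a modified z-score above threshold.

    estimates: dict mapping sample ID to one global parameter estimate.
    """
    values = list(estimates.values())
    center = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    return [sample for sample, v in estimates.items()
            if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical normal-cell fraction estimates for five samples.
estimates = {"S1": 0.31, "S2": 0.28, "S3": 0.35, "S4": 0.30, "S5": 0.95}
print(flag_outliers(estimates))
```

A median-based rule is used here because a single extreme sample has little influence on the median, whereas it would inflate a mean and standard deviation.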

Methodology:

GPHMM integrates global parameters into a Hidden Markov Model and estimates them with an Expectation-Maximization (EM) algorithm. The model quantitatively accounts for baseline shifts caused by aneuploidy, normal-cell contamination, and GC content bias, and uses the resulting state assignments to identify chromosomal rearrangements from whole genome SNP array genotyping signals.
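
To make the HMM segmentation idea concrete, here is a minimal sketch of Viterbi decoding over three copy number states. It is not the GPHMM model: it assumes fixed Gaussian emissions on the log R ratio only, with made-up means, standard deviation, and transition probabilities, whereas the method described above estimates its parameters with EM and also models contamination and GC bias. All names in the block are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(observations, states, log_start, log_trans, means, sd):
    """Return the most likely state sequence for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at position t.
    best = [{s: log_start[s] + gaussian_logpdf(observations[0], means[s], sd)
             for s in states}]
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for x in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = (best[-1][prev] + log_trans[prev][s]
                         + gaussian_logpdf(x, means[s], sd))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Assumed toy parameters: states tend to persist between adjacent SNPs.
states = ["loss", "neutral", "gain"]
means = {"loss": -0.5, "neutral": 0.0, "gain": 0.4}
log_start = {s: math.log(1 / 3) for s in states}
log_trans = {p: {s: math.log(0.98 if p == s else 0.01) for s in states}
             for p in states}

lrr = [0.02, -0.05, 0.03, -0.48, -0.55, -0.50, 0.01, 0.04]
path = viterbi(lrr, states, log_start, log_trans, means, sd=0.15)
print(path)
```

With these toy values, the three markers near -0.5 are assigned to the loss state and the rest to the neutral state. The high self-transition probability is what keeps isolated noisy markers from creating spurious segments.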

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Java, MATLAB
Added: 12/18/2017
Last Updated: 11/25/2024

Publications

Li A, Liu Z, Lezon-Geyda K, Sarkar S, Lannin D, Schulz V, Krop I, Winer E, Harris L, Tuck D. GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays. Nucleic Acids Research. 2011;39(12):4928-4941. doi:10.1093/nar/gkr014. PMID:21398628. PMCID:PMC3130254.
