ProteinProphet

ProteinProphet computes protein-level probabilities to validate protein identifications made from tandem mass spectrometry (MS/MS) database search results, based on the peptides assigned to MS/MS spectra of proteolytic digests.


Key Features:

  • Probabilistic Framework: Calculates probabilities of protein presence by analyzing peptide assignments to MS/MS spectra.
  • Handling Peptide Ambiguity: Apportions peptides that map to multiple proteins in the sequence database among corresponding protein entries.
  • Expectation-Maximization Algorithm: Iteratively derives a minimal list of proteins sufficient to explain the observed peptide assignments.
  • Validation Across Complex Samples: Validated on spectra from 18 purified proteins and complex biological samples including H. influenzae and Halobacterium.
  • Discrimination Power: Produces probabilities that discriminate between correct and incorrect protein identifications to reduce false positives.
  • Predictable Sensitivity and Error Rates: Allows large proteomic datasets to be filtered at a chosen probability threshold with predictable sensitivity and false positive identification error rates.
  • Standardization and Comparability: Generates consistent results that facilitate publication and comparison of large-scale protein identification datasets.
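
The claims about discrimination power and predictable error rates rest on the protein probabilities being accurate: if they are, the expected number of correct identifications among proteins above a threshold is simply the sum of their probabilities, and the expected number of incorrect ones is the sum of (1 - P). The following is a minimal sketch under that assumption. It takes already computed protein probabilities as a plain dict, and `filter_at_threshold` is a hypothetical helper for illustration, not part of ProteinProphet's interface.

```python
def filter_at_threshold(protein_probs, threshold):
    """Keep proteins with probability >= threshold and estimate sensitivity
    and false positive error rate, assuming the probabilities are accurate."""
    kept = {prot: p for prot, p in protein_probs.items() if p >= threshold}
    # Expected number of correct identifications in the whole dataset
    total_correct = sum(protein_probs.values())
    # Expected correct and incorrect identifications among the kept proteins
    kept_correct = sum(kept.values())
    kept_incorrect = sum(1.0 - p for p in kept.values())
    sensitivity = kept_correct / total_correct if total_correct else 0.0
    error_rate = kept_incorrect / len(kept) if kept else 0.0
    return kept, sensitivity, error_rate


# Illustrative (made-up) protein probabilities
probs = {"P1": 0.99, "P2": 0.95, "P3": 0.6, "P4": 0.1}
kept, sens, err = filter_at_threshold(probs, 0.9)
print(sorted(kept), round(sens, 3), round(err, 3))
```

Raising the threshold trades sensitivity for a lower estimated error rate; because both numbers come from the probabilities themselves, the trade-off can be read off before choosing a cutoff.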

Scientific Applications:

  • Large-Scale Proteomic Studies: Assessing and validating protein identifications in high-throughput proteomics datasets.
  • Complex Biological Samples: Analyzing the proteomes of complex samples from organisms such as H. influenzae and Halobacterium.
  • Data Standardization: Harmonizing reporting and comparison of protein identification results across experiments.

Methodology:

Computes protein presence probabilities from peptides assigned to MS/MS spectra obtained from proteolytic digests, apportions peptides shared among multiple proteins in the sequence database, and applies an expectation-maximization algorithm to derive a minimal protein list.
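
The iterative apportionment of shared peptides can be illustrated with a simplified version of the model described in the publication. A protein's probability is computed as P = 1 - ∏(1 - w·p) over its peptides, where p is a peptide's probability and w is the fraction of that peptide assigned to the protein. Each shared peptide's weights are then reset in proportion to the current probabilities of the proteins that contain it, and the two steps repeat until the values stop changing. This is a sketch under stated assumptions, not ProteinProphet's implementation. Each distinct peptide is assumed to carry a single probability (for example, the highest among its spectra). The published model's further adjustments, such as the correction for the number of sibling peptides (NSP), are omitted. The function and argument names are hypothetical.

```python
from math import prod


def protein_probabilities(peptide_probs, peptide_to_proteins, max_iter=100, tol=1e-6):
    """Simplified iterative apportionment of shared peptides (sketch only).

    peptide_probs: dict peptide -> probability its assignment is correct
    peptide_to_proteins: dict peptide -> list of proteins containing it
    Returns dict protein -> probability.
    """
    proteins = sorted({p for prots in peptide_to_proteins.values() for p in prots})
    # Start by splitting each shared peptide evenly among its proteins
    weights = {(pep, prot): 1.0 / len(prots)
               for pep, prots in peptide_to_proteins.items() for prot in prots}
    probs = {}
    for _ in range(max_iter):
        # Protein probability from its (weighted) peptide evidence
        new_probs = {
            prot: 1.0 - prod(1.0 - weights[(pep, prot)] * peptide_probs[pep]
                             for pep, prots in peptide_to_proteins.items()
                             if prot in prots)
            for prot in proteins
        }
        # Reapportion each peptide in proportion to current protein probabilities
        for pep, prots in peptide_to_proteins.items():
            total = sum(new_probs[p] for p in prots)
            for prot in prots:
                weights[(pep, prot)] = (new_probs[prot] / total if total > 0
                                        else 1.0 / len(prots))
        converged = bool(probs) and all(abs(new_probs[p] - probs[p]) < tol
                                        for p in proteins)
        probs = new_probs
        if converged:
            break
    return probs


# Illustrative input: PEPB is shared by proteins A and B
peptides = {"PEPA": 0.95, "PEPB": 0.9, "PEPC": 0.3}
mapping = {"PEPA": ["A"], "PEPB": ["A", "B"], "PEPC": ["B"]}
result = protein_probabilities(peptides, mapping)
print({k: round(v, 3) for k, v in result.items()})
```

In this example, protein A is already well supported by its unique peptide PEPA, so over the iterations it absorbs most of the shared peptide PEPB. Protein B's probability falls toward the level supported by its own weak evidence. This is the mechanism by which the method favors a minimal set of proteins that explains the observed peptides.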

Details

Tool Type:
command-line tool
Operating Systems:
Linux, Mac, Windows
Programming Languages:
Perl, C
Added:
1/17/2017
Last Updated:
6/11/2025

Operations

Data filtering

Publications

Nesvizhskii AI, et al. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003; 75:4646-58. doi: 10.1021/ac0341261

PMID: 14632076

Links

Software catalogue
http://ms-utils.org