ProteinProphet
ProteinProphet computes protein-level probabilities to validate protein identifications derived from tandem mass spectrometry (MS/MS) database search results, based on the peptides assigned to spectra acquired from proteolytic digests.
Key Features:
- Probabilistic Framework: Calculates probabilities of protein presence by analyzing peptide assignments to MS/MS spectra.
- Handling Peptide Ambiguity: Apportions peptides that map to multiple proteins in the sequence database among corresponding protein entries.
- Expectation-Maximization Algorithm: Uses the expectation-maximization algorithm to derive a minimal list of proteins sufficient to explain observed peptide assignments.
- Validation Across Complex Samples: Validated on MS/MS spectra from a sample of 18 purified proteins and from complex H. influenzae and Halobacterium samples.
- Discrimination Power: Produces probabilities that discriminate between correct and incorrect protein identifications to reduce false positives.
- Predictable Sensitivity and Error Rates: Allows large proteomic datasets to be filtered with predictable sensitivity and false-positive identification error rates.
- Standardization and Comparability: Generates consistent results that facilitate publication and comparison of large-scale protein identification datasets.
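As a rough illustration of how protein probabilities support filtering with predictable error rates, the sketch below keeps proteins at or above a probability threshold and estimates the error rate and sensitivity of the filtered list from the probabilities themselves. It assumes the probabilities are accurate; the function name, threshold, and protein names are hypothetical and are not part of ProteinProphet.

```python
def filter_by_probability(protein_probs, min_prob=0.9):
    """Keep proteins with probability >= min_prob and estimate list quality.

    protein_probs: dict mapping a protein name to its probability of presence.
    Returns (accepted, estimated_error_rate, estimated_sensitivity).
    """
    accepted = {name: p for name, p in protein_probs.items() if p >= min_prob}

    # Expected number of incorrect entries among the accepted proteins.
    expected_false = sum(1.0 - p for p in accepted.values())
    error_rate = expected_false / len(accepted) if accepted else 0.0

    # Expected number of correct proteins in the whole dataset.
    expected_correct = sum(protein_probs.values())
    sensitivity = (
        sum(accepted.values()) / expected_correct if expected_correct > 0 else 0.0
    )
    return accepted, error_rate, sensitivity


# Hypothetical probabilities, for illustration only.
example = {"P1": 1.0, "P2": 0.9, "P3": 0.5, "P4": 0.1}
kept, err, sens = filter_by_probability(example, min_prob=0.9)
print(sorted(kept), round(err, 3), round(sens, 3))
```

Raising the threshold lowers the estimated error rate at the cost of sensitivity, which is the trade-off a user would tune when filtering a large dataset.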
Scientific Applications:
- Large-Scale Proteomic Studies: Statistically validating protein identifications in high-throughput proteomics datasets.
- Complex Biological Samples: Analyzing proteomes from complex samples of organisms such as H. influenzae and Halobacterium.
- Data Standardization: Harmonizing reporting and comparison of protein identification results across experiments.
Methodology:
ProteinProphet computes the probability that each protein is present from the peptides assigned to MS/MS spectra obtained from proteolytic digests. Peptides shared among multiple proteins in the sequence database are apportioned among those proteins, and an expectation-maximization algorithm derives a minimal protein list sufficient to explain the observed peptide assignments.
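The apportioning step can be sketched as a simplified iteration. This is a minimal illustration under stated assumptions, not the tool's actual implementation. It assumes each peptide already has a probability of being correctly assigned, treats a protein as present if at least one of its weighted peptides is correct, and re-splits each shared peptide in proportion to the current protein probabilities. The function name and example data are hypothetical, and refinements in the published model (such as adjusting peptide probabilities by the number of sibling peptides) are omitted.

```python
from math import prod


def protein_probabilities(peptides, n_iter=50):
    """Estimate protein probabilities from peptide assignments.

    peptides: list of (peptide_probability, [protein names]) tuples.
    Returns a dict mapping each protein name to its estimated probability.
    """
    proteins = sorted({prot for _, prots in peptides for prot in prots})

    # Start by splitting every shared peptide evenly among its proteins.
    weights = [{prot: 1.0 / len(prots) for prot in prots} for _, prots in peptides]

    probs = {}
    for _ in range(n_iter):
        # A protein is present if at least one of its weighted peptides is correct.
        probs = {
            prot: 1.0
            - prod(
                1.0 - w[prot] * p
                for (p, _), w in zip(peptides, weights)
                if prot in w
            )
            for prot in proteins
        }
        # Re-apportion each peptide in proportion to current protein probabilities.
        for (_, prots), w in zip(peptides, weights):
            total = sum(probs[prot] for prot in prots)
            for prot in prots:
                w[prot] = probs[prot] / total if total > 0 else 1.0 / len(prots)
    return probs


# Hypothetical data: protein B is supported only by a peptide it shares with A.
example = [(0.9, ["A"]), (0.8, ["A", "B"]), (0.7, ["A"])]
print(protein_probabilities(example))
```

In this example the shared peptide is progressively assigned to protein A, which has independent evidence, so protein B drops out of the list. This mirrors the idea of a minimal protein list that still explains the observed peptides.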
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac, Windows
- Programming Languages: Perl, C
- Added: 1/17/2017
- Last Updated: 6/11/2025
Operations
- Data filtering
Publications
Nesvizhskii AI, et al. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003; 75:4646-58. doi: 10.1021/ac0341261
PMID: 14632076
Links
- Software catalogue: http://ms-utils.org