PROSPECT

PROSPECT predicts proteome-wide histidine phosphorylation substrates and sites from protein sequence to support analysis of cellular signaling pathways and metabolic regulation.


Key Features:

  • Hybrid ensemble architecture: Integrates the outputs of two convolutional neural network (CNN)-based classifiers and one random forest classifier.
  • One-of-K Coding: Encodes categorical amino acid information as sparse binary vectors for model input.
  • Enhanced Grouped Amino Acids Content (EGAAC): Represents grouped amino acid composition to capture local sequence properties.
  • Composition of K-Spaced Amino Acid Group Pairs (CKSAAGP): Encodes k-spaced amino acid group pair composition to capture spatial arrangement within sequences.
  • Classifier-feature mapping: Feeds each of the three feature sets to its own classifier (two CNNs and one random forest) so that the classifiers produce complementary predictions.
  • Sequence-based proteome-wide prediction: Predicts both histidine phosphorylation substrates and specific phosphorylation sites from primary sequence data.
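The three encodings listed above can be illustrated with a minimal Python sketch. It is not PROSPECT's own code. The five amino acid groups, the sub-window length of 5, the gap range, and the function names one_of_k, egaac, and cksaagp are assumptions chosen for illustration (the grouping follows a convention commonly used for these descriptors); the settings PROSPECT actually uses are not stated here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed physicochemical grouping (not confirmed for PROSPECT).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
GROUP_NAMES = list(GROUPS)
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def one_of_k(window):
    """Flattened 20-dimensional binary vector per residue.

    Residues outside the 20 standard amino acids become all zeros.
    """
    vec = []
    for aa in window:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec


def egaac(window, sub=5):
    """Group frequencies inside each sliding sub-window of length `sub`."""
    vec = []
    for start in range(len(window) - sub + 1):
        chunk = window[start:start + sub]
        for name in GROUP_NAMES:
            vec.append(sum(AA_TO_GROUP.get(aa) == name for aa in chunk) / sub)
    return vec


def cksaagp(window, max_gap=2):
    """Frequencies of group pairs separated by 0..max_gap residues."""
    pairs = list(product(GROUP_NAMES, repeat=2))
    vec = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(window) - gap - 1):
            g1 = AA_TO_GROUP.get(window[i])
            g2 = AA_TO_GROUP.get(window[i + gap + 1])
            if g1 is not None and g2 is not None:
                counts[(g1, g2)] += 1
                total += 1
        vec.extend(counts[p] / total if total else 0.0 for p in pairs)
    return vec


if __name__ == "__main__":
    # Hypothetical 7-residue window centered on a histidine.
    window = "ACDHKLM"
    print(len(one_of_k(window)), len(egaac(window)), len(cksaagp(window)))
```

For a window of length n, one_of_k returns 20n values, egaac returns 5(n - sub + 1) values, and cksaagp returns 25(max_gap + 1) values, so the three feature sets differ in size as well as in what they capture.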

Scientific Applications:

  • Cellular signaling pathways: Enables identification of histidine phosphorylation sites relevant to signal transduction studies.
  • Metabolic processes: Supports investigation of histidine phosphorylation roles in metabolic regulation.
  • Cross-organism analysis: Applies to prokaryotic proteomes and may offer insight into analogous mechanisms in mammalian cells.
  • Protein function and regulation: Contributes sequence-based evidence for studies of protein regulation mediated by histidine phosphorylation.

Methodology:

Applies One-of-K Coding, EGAAC, and CKSAAGP feature encodings as inputs to two CNN-based classifiers and one random forest classifier, and integrates their outputs via a hybrid ensemble.
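How the three classifier outputs are integrated is not specified here. As one plausible reading, the sketch below combines three per-site probabilities by weighted averaging (soft voting). The function name ensemble_score, the equal default weights, and the example probabilities are all hypothetical; PROSPECT's actual integration rule may differ.

```python
def ensemble_score(p_cnn_a, p_cnn_b, p_rf, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three per-site probabilities (assumed rule)."""
    probs = (p_cnn_a, p_cnn_b, p_rf)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, probs)) / total


if __name__ == "__main__":
    # Hypothetical outputs of the two CNNs and the random forest for one site.
    print(ensemble_score(0.9, 0.6, 0.3))
```

Averaging probabilities rather than hard labels lets a confident classifier outweigh two uncertain ones, which is one reason to pair classifiers trained on different encodings.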

Details

Added: 1/18/2021
Last Updated: 1/28/2021

Publications

Chen Z, Zhao P, Li F, Leier A, Marquez-Lago TT, Webb GI, Baggag A, Bensmail H, Song J. PROSPECT: A web server for predicting protein histidine phosphorylation sites. Journal of Bioinformatics and Computational Biology. 2020;18(04):2050018. doi:10.1142/s0219720020500183. PMID:32501138.

Funding:

  • National Health and Medical Research Council of Australia: 1144652 and 1127948
  • Young Scientists Fund of the National Natural Science Foundation of China: 31701142
  • Australian Research Council: DP120104460, LP110200333