PROSPECT
PROSPECT predicts proteome-wide histidine phosphorylation substrates and sites from protein sequence to support analysis of cellular signaling pathways and metabolic regulation.
Key Features:
- Hybrid ensemble architecture: Integrates outputs from two convolutional neural network (CNN)-based classifiers with a random forest classifier.
- One-of-K Coding: Encodes categorical amino acid information as sparse binary vectors for model input.
- Enhanced Grouped Amino Acids Content (EGAAC): Represents grouped amino acid composition to capture local sequence properties.
- Composition of K-Spaced Amino Acid Group Pairs (CKSAAGP): Encodes k-spaced amino acid group pair composition to capture spatial arrangement within sequences.
- Classifier-feature mapping: Feeds each of the three feature sets to its own classifier (two CNNs and one random forest) so that the models produce complementary predictions.
- Sequence-based proteome-wide prediction: Predicts both histidine phosphorylation substrates and specific phosphorylation sites from primary sequence data.
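The three encodings listed above can be illustrated with a minimal Python sketch. It is not PROSPECT's implementation. The five-class amino acid grouping, the sub-window length (`sub_len`), the maximum gap (`max_gap`), and the example window are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed five-class physicochemical grouping; PROSPECT's exact grouping may differ.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
GROUP_NAMES = list(GROUPS)
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def one_of_k(window):
    """One binary vector of length 20 per residue; unknown residues stay all-zero."""
    vec = []
    for aa in window:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec


def egaac(window, sub_len=5):
    """Group frequencies inside each sliding sub-window of length sub_len."""
    vec = []
    for start in range(len(window) - sub_len + 1):
        sub = window[start:start + sub_len]
        for name in GROUP_NAMES:
            count = sum(1 for aa in sub if AA_TO_GROUP.get(aa) == name)
            vec.append(count / sub_len)
    return vec


def cksaagp(window, max_gap=2):
    """Normalised counts of group pairs separated by 0..max_gap residues."""
    pairs = list(product(GROUP_NAMES, repeat=2))
    vec = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(window) - gap - 1):
            g1 = AA_TO_GROUP.get(window[i])
            g2 = AA_TO_GROUP.get(window[i + gap + 1])
            if g1 is not None and g2 is not None:
                counts[(g1, g2)] += 1
                total += 1
        vec.extend(counts[p] / total if total else 0.0 for p in pairs)
    return vec


window = "AKDLHGEFW"  # hypothetical 9-residue window centred on a histidine
print(len(one_of_k(window)), len(egaac(window)), len(cksaagp(window)))
# prints: 180 25 75
```

One-of-K preserves residue identity at each position, while EGAAC and CKSAAGP summarise the window at the level of residue groups, which is one reason the three feature sets can yield complementary predictions.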
Scientific Applications:
- Cellular signaling pathways: Enables identification of histidine phosphorylation sites relevant to signal transduction studies.
- Metabolic processes: Supports investigation of histidine phosphorylation roles in metabolic regulation.
- Cross-organism analysis: Applies to prokaryotic proteomes and may offer insight into analogous mechanisms in mammalian cells.
- Protein function and regulation: Contributes sequence-based evidence for studies of protein regulation mediated by histidine phosphorylation.
Methodology:
Applies One-of-K Coding, EGAAC, and CKSAAGP feature encodings as inputs to two CNN-based classifiers and one random forest classifier, and integrates their outputs via a hybrid ensemble.
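The integration step can be sketched as a weighted mean of the three classifiers' per-site probabilities. This entry does not state how PROSPECT combines the outputs, so the averaging rule, the equal default weights, the 0.5 threshold, and the example scores below are all assumptions, and `ensemble_score` is a hypothetical helper.

```python
def ensemble_score(scores, weights=None):
    """Weighted mean of per-classifier probabilities for one candidate site."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight per classifier score is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


# Hypothetical probabilities for one histidine from two CNNs and one random forest.
site_scores = [0.75, 0.5, 0.25]
combined = ensemble_score(site_scores)
is_predicted_site = combined >= 0.5  # illustrative threshold, not PROSPECT's
print(combined, is_predicted_site)
# prints: 0.5 True
```

Unequal weights could be passed to favour a classifier that performs better on validation data. Whether PROSPECT does so is not stated here.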
Details
- Added: 1/18/2021
- Last Updated: 1/28/2021
Publications
Chen Z, Zhao P, Li F, Leier A, Marquez-Lago TT, Webb GI, Baggag A, Bensmail H, Song J. PROSPECT: A web server for predicting protein histidine phosphorylation sites. Journal of Bioinformatics and Computational Biology. 2020;18(04):2050018. doi:10.1142/s0219720020500183. PMID:32501138.
Funding:
- National Health and Medical Research Council of Australia: 1144652 and 1127948
- Young Scientists Fund of the National Natural Science Foundation of China: 31701142
- Australian Research Council: DP120104460, LP110200333