FCCCSR_Glu
FCCCSR_Glu predicts glutarylation sites on lysine residues using the FCCCSR semi-supervised algorithm to identify reliable negative samples and improve peptide-level classification.
Key Features:
- Semi-Supervised Learning: Employs the FCCCSR (Finding Core Objects and Clustering based on Structure of Reverse nearest neighbor) approach to address noisy data and class imbalance by identifying reliable non-glutarylation lysine sites from unlabeled samples.
- Core Object Identification: Detects core objects within positive samples using reverse nearest neighbor information to distinguish glutarylated from non-glutarylated sites.
- Natural Neighbor Clustering: Clusters core objects based on natural neighbor structure to select reliable negative samples for model training.
- Multi-View Feature Extraction: Extracts and fuses peptide features including amino acid composition, BLOSUM62 matrix scores, amino acid factors, and the composition of k-spaced amino acid pairs.
- Advanced Classification: Uses XGBoost as the classifier with hyperparameter optimization via a differential evolution algorithm to enhance predictive performance.
- Performance: On an independent test set, the authors report a sensitivity of 85.18%, a specificity of 98.36%, an accuracy of 94.31%, and a Matthews correlation coefficient (MCC) of 0.8651.
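For readers unfamiliar with the four metrics in the performance line, the sketch below shows how each is computed from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the FCCCSR_Glu code or its test set.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts.

    tp/fn count true glutarylation sites predicted correctly/incorrectly;
    tn/fp count non-glutarylation sites predicted correctly/incorrectly.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is empty.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fn + tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, fn=2, tn=18, fp=2)
print(example)
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which makes it a more informative summary when positive and negative sites are imbalanced.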
Scientific Applications:
- Protein modification studies: Supports research on post-translational modifications by predicting glutarylation sites, which helps in studying their roles in cellular processes.
- Substrate identification: Aids identification of potential glutarylation substrates to explore biological roles of glutarylation.
Methodology:
Core objects are found among the positive samples using reverse nearest neighbor information, and unlabeled samples are clustered by natural neighbor structure to select reliable negative lysine sites. Multi-view peptide features (amino acid composition, BLOSUM62, amino acid factors, composition of k-spaced amino acid pairs) are then extracted and fused. Finally, the positive and reliable negative samples are combined to train an XGBoost classifier whose hyperparameters are optimized by differential evolution.
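Two of the building blocks named above can be sketched in a few lines of plain Python: fusing amino acid composition with the composition of k-spaced amino acid pairs, and counting reverse nearest neighbors. This is a minimal illustration under stated assumptions, not the FCCCSR_Glu implementation. All function names, the `max_k` default, and the example peptides are hypothetical. The BLOSUM62 and amino acid factor views are omitted, and the exact rule FCCCSR uses to turn reverse-neighbor information into core objects is not reproduced here.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def aac(peptide):
    """Amino acid composition: fraction of each of the 20 standard residues.

    Non-standard characters (for example a padding 'X') are ignored.
    """
    residues = [r for r in peptide if r in AMINO_ACIDS]
    total = len(residues) or 1
    return [residues.count(a) / total for a in AMINO_ACIDS]


def cksaap(peptide, k):
    """Composition of residue pairs separated by exactly k other residues."""
    counts = dict.fromkeys(PAIRS, 0)
    n_windows = max(len(peptide) - k - 1, 0)
    for i in range(n_windows):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    total = n_windows or 1
    return [counts[p] / total for p in PAIRS]


def fuse(peptide, max_k=1):
    """Concatenate AAC with CKSAAP for k = 0..max_k (max_k is an assumption)."""
    features = aac(peptide)
    for k in range(max_k + 1):
        features.extend(cksaap(peptide, k))
    return features


def reverse_neighbor_counts(vectors, k):
    """For each point, count how many other points list it among their k nearest."""
    n = len(vectors)
    counts = [0] * n
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(vectors[i], vectors[j]),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


# Hypothetical lysine-centred peptides, for illustration only.
peptides = ["AAKGG", "AGKGA", "WWKYY"]
vectors = [fuse(p) for p in peptides]
print(reverse_neighbor_counts(vectors, k=1))
```

Points with many reverse neighbors sit in dense regions of feature space, which is the intuition behind using reverse nearest neighbor information to pick core objects.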
Details
- License: Not licensed
- Tool Type: command-line tool
- Operating Systems: Mac, Linux, Windows
- Programming Languages: Python
- Added: 11/3/2022
- Last Updated: 11/24/2024
Publications
Ning Q, Qi Z, Wang Y, Deng A, Chen C. FCCCSR_Glu: a semi-supervised learning model based on FCCCSR algorithm for prediction of glutarylation sites. Briefings in Bioinformatics. 2022;23(6). doi:10.1093/bib/bbac421. PMID:36168700.