PaRSnIP
PaRSnIP is a software tool that predicts protein solubility using a gradient boosting machine algorithm together with an approximation of the sequence and structural features of the protein of interest. Protein solubility is a crucial factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought. The need for such tools arises because protein solubility affects the efficiency of protein expression, purification, and characterization.
In the accompanying study, the authors present PaRSnIP as a novel approach for predicting protein solubility. On an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods in both accuracy and Matthews correlation coefficient (MCC), reaching an overall accuracy of 74% and an MCC of 0.48. PaRSnIP also provides importance scores for all features used in training. The analysis further revealed that higher fractions of exposed residues associate positively with protein solubility, while tripeptide stretches with multiple histidines associate negatively with solubility.
The improved prediction accuracy of PaRSnIP makes its solubility predictions more reliable and supports screening for sequence variants with enhanced manufacturability.
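The two metrics reported above, accuracy and Matthews correlation coefficient, can both be derived from a binary confusion matrix (soluble vs. insoluble). The following is a minimal Python sketch of the standard formulas, not code from PaRSnIP itself (which is written in R). The function names and the example counts are hypothetical and are not taken from the publication.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT the PaRSnIP test set):
    # tp = soluble predicted soluble, tn = insoluble predicted insoluble, etc.
    tp, tn, fp, fn = 40, 35, 15, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for solubility predictors whose classes may be imbalanced.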
Topic
Protein structure analysis;Protein properties;Protein sites, features and motifs
Detail
Operation: Protein solubility prediction
Software interface: Command-line user interface;Library
Language: R
License: MIT License
Cost: Free
Version name: -
Credit: The Intramural Research Program (National Institute of Allergy and Infectious Diseases, National Institutes of Health, USA).
Input: -
Output: -
Contact: Reda Rawi, reda.rawi@nih.gov, gwo-yu.chuang@nih.gov
Collection: -
Maturity: -
Publications
- PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine.
- Rawi R, et al. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics. 2018; 34:1092-1098. doi: 10.1093/bioinformatics/btx662
- https://doi.org/10.1093/bioinformatics/btx662
- PMID: 29069295
- PMC: PMC6031027
Download and documentation
Home page: https://github.com/RedaRawi/PaRSnIP