PaRSnIP

PaRSnIP is a software tool that predicts protein solubility using a gradient boosting machine algorithm together with approximated sequence and structural features of the protein of interest. Protein solubility is a crucial factor in both research and production efficiency, because it affects the efficiency of protein expression, purification, and characterization. Accurate in silico, sequence-based solubility predictors are therefore highly sought after.

In the accompanying study, the authors present PaRSnIP as a novel approach to predicting protein solubility. On an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods in both accuracy and Matthews correlation coefficient (MCC), reaching an overall accuracy of 74% and an MCC of 0.48. PaRSnIP also provides importance scores for all features used in training. The analysis further revealed that higher fractions of exposed residues associate positively with protein solubility, whereas tripeptide stretches containing multiple histidines associate negatively with it.
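The two metrics quoted above can both be derived from a binary confusion matrix (soluble vs. insoluble). The following is a minimal sketch in Python of how accuracy and MCC are computed. The function names are illustrative, and the counts are hypothetical: they were chosen only so that the results land near the reported 74% and 0.48, and they are not the actual test-set counts from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (NOT taken from the PaRSnIP study).
tp, tn, fp, fn = 70, 78, 22, 30

acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {acc:.2f}, MCC = {mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why the two are commonly reported together for binary classifiers.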

The improved prediction accuracy of PaRSnIP should make solubility predictions more reliable and support screening for sequence variants with enhanced manufacturability.
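When comparing sequence variants, the histidine observation reported above can be turned into a simple diagnostic. The sketch below, in Python, computes the fraction of overlapping tripeptides that contain at least two histidines. It is an illustration of the idea only and is not PaRSnIP's own feature pipeline; the function names, the `min_his` threshold parameter, and both example sequences are assumptions made for this example.

```python
def tripeptides(seq):
    """Return all overlapping 3-residue windows of a protein sequence."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def his_rich_tripeptide_fraction(seq, min_his=2):
    """Fraction of tripeptide windows with at least `min_his` histidines."""
    windows = tripeptides(seq)
    if not windows:
        # Sequences shorter than three residues have no tripeptides.
        return 0.0
    rich = sum(1 for w in windows if w.count("H") >= min_his)
    return rich / len(windows)


# Hypothetical variants for illustration: one with a His-rich N-terminus, one without.
his_variant = "MHHHHHHSSGLVPRGS"
plain_variant = "MSSGLVPRGS"

for name, seq in [("his_variant", his_variant), ("plain_variant", plain_variant)]:
    print(name, round(his_rich_tripeptide_fraction(seq), 3))
```

A higher value only flags a variant for closer inspection. An actual solubility call would come from running the trained PaRSnIP model on the full feature set.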

Topic

Protein structure analysis;Protein properties;Protein sites, features and motifs

Detail

  • Operation: Protein solubility prediction

  • Software interface: Command-line user interface;Library

  • Language: R

  • License: MIT License

  • Cost: Free

  • Version name: -

  • Credit: The Intramural Research Program (National Institute of Allergy and Infectious Diseases, National Institutes of Health, USA).

  • Input: -

  • Output: -

  • Contact: Reda Rawi, reda.rawi@nih.gov, gwo-yu.chuang@nih.gov

  • Collection: -

  • Maturity: -

Publications

  • PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine.
  • Rawi R, et al. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics. 2018; 34:1092-1098. doi: 10.1093/bioinformatics/btx662
  • https://doi.org/10.1093/bioinformatics/btx662
  • PMID: 29069295
  • PMC: PMC6031027
