NetSolP

NetSolP predicts protein solubility and usability for Escherichia coli expression systems directly from amino acid sequences.


Key Features:

  • Deep Learning Architecture: Uses transformer-based protein language models to learn sequence-derived representations for solubility and usability prediction.
  • Performance: Reported to achieve state-of-the-art predictive accuracy relative to existing sequence-based solubility predictors.
  • Bias Mitigation in Datasets: Partitions the curated data by strict sequence identity, so that closely related sequences do not end up in both training and evaluation sets and inflate measured performance.
  • Extrapolation Across Datasets: Generalizes better than earlier methods when applied to datasets distinct from the training data.

Scientific Applications:

  • Optimizing Protein Expression: Enables selection or engineering of sequences with higher predicted solubility for Escherichia coli expression.
  • Reducing Experimental Failures: Identifies sequences likely to produce insoluble or unusable protein to prioritize experimental efforts.
  • Facilitating Protein Engineering: Guides sequence modification strategies aimed at improving solubility and usability.
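
As an illustration of the prioritization workflow described above, the following is a minimal sketch of how predicted scores might be used to triage candidate sequences. It does not call NetSolP; the `Prediction` record, its field names, the example scores, and the 0.5 cutoffs are all hypothetical placeholders for whatever the tool actually reports.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    seq_id: str
    solubility: float  # hypothetical predicted score in [0, 1]
    usability: float   # hypothetical predicted score in [0, 1]


def triage(predictions, min_solubility=0.5, min_usability=0.5):
    """Split candidates into (keep, deprioritize).

    A candidate is kept only if both scores meet their cutoffs.
    Kept candidates are sorted by solubility, highest first.
    The cutoffs are arbitrary illustration values, not recommendations.
    """
    keep, deprioritize = [], []
    for p in predictions:
        if p.solubility >= min_solubility and p.usability >= min_usability:
            keep.append(p)
        else:
            deprioritize.append(p)
    keep.sort(key=lambda p: p.solubility, reverse=True)
    return keep, deprioritize


# Made-up scores for four hypothetical variants.
candidates = [
    Prediction("variant_A", 0.82, 0.71),
    Prediction("variant_B", 0.35, 0.60),
    Prediction("variant_C", 0.91, 0.44),
    Prediction("variant_D", 0.67, 0.58),
]
keep, deprioritize = triage(candidates)
print([p.seq_id for p in keep])
print([p.seq_id for p in deprioritize])
```

In practice the cutoffs would likely need tuning against experimental outcomes for the expression system in question.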

Methodology:

NetSolP takes amino acid sequences as input and predicts solubility and usability with transformer-based protein language models. The models are trained on curated datasets that are partitioned by strict sequence identity.
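
The idea behind sequence-identity partitioning can be sketched as follows: similar sequences are first grouped into clusters, and whole clusters are then assigned to data splits so that no cluster is divided between them. This is a toy illustration, not NetSolP's curation pipeline. It assumes `difflib.SequenceMatcher` as a rough similarity proxy in place of alignment-based identity, and the function names, the example sequences, and the 0.8 threshold are all hypothetical. Real workflows typically rely on dedicated clustering tools such as MMseqs2 or CD-HIT.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough similarity proxy in [0, 1]; NOT an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_similarity(seqs, threshold):
    """Greedy clustering over a dict of {id: sequence}.

    Each sequence joins the first cluster whose representative (its
    first member) it resembles at or above the threshold; otherwise it
    starts a new cluster.
    """
    clusters = []
    for sid, seq in seqs.items():
        for cluster in clusters:
            if similarity(seq, seqs[cluster[0]]) >= threshold:
                cluster.append(sid)
                break
        else:
            clusters.append([sid])
    return clusters


def partition(clusters, n_folds):
    """Assign whole clusters to folds, largest first, filling the smallest fold."""
    folds = [[] for _ in range(n_folds)]
    for cluster in sorted(clusters, key=len, reverse=True):
        min(folds, key=len).extend(cluster)
    return folds


# Made-up sequences: p1/p2 and p3/p4 each differ by one residue.
seqs = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRA",
    "p3": "GSSHHHHHHSSGLVPRGSHMAS",
    "p4": "GSSHHHHHHSSGLVPRGSHMAT",
}
clusters = cluster_by_similarity(seqs, threshold=0.8)  # arbitrary toy threshold
folds = partition(clusters, n_folds=2)
print(folds)
```

Because clusters are never split, near-duplicate sequences cannot appear on both sides of a train/evaluation boundary, which is the source of bias the partitioning is meant to remove.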

Details

Cost: Free of charge
Tool Type: web application
Operating Systems: Mac, Linux, Windows
Added: 5/19/2022
Last Updated: 11/24/2024

Publications

Thumuluri V, Martiny H, Almagro Armenteros JJ, Salomon J, Nielsen H, Johansen AR. NetSolP: predicting protein solubility in Escherichia coli using language models. Bioinformatics. 2021;38(4):941-946. doi:10.1093/bioinformatics/btab801. PMID:35088833.