NetSolP
NetSolP predicts protein solubility and usability for Escherichia coli expression systems directly from amino acid sequences.
Key Features:
- Deep Learning Architecture: Uses transformer-based protein language models to learn sequence-derived representations for solubility and usability prediction.
- Performance: Reported predictive accuracy is high relative to existing sequence-based methods.
- Bias Mitigation in Datasets: Employs strict sequence-identity partitioning during dataset curation to reduce dataset bias.
- Extrapolation Across Datasets: Shows improved extrapolation capabilities when applied to datasets distinct from the training data.
Scientific Applications:
- Optimizing Protein Expression: Enables selection or engineering of sequences with higher predicted solubility for Escherichia coli expression.
- Reducing Experimental Failures: Identifies sequences likely to produce insoluble or unusable protein to prioritize experimental efforts.
- Facilitating Protein Engineering: Guides sequence modification strategies aimed at improving solubility and usability.
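As an illustration of the prioritization use case above, the following minimal sketch filters and ranks candidate sequences by their predicted scores. The `Prediction` record, the `prioritise` helper, the 0.5 cut-offs, and the assumption that both scores are probabilities in [0, 1] are hypothetical choices made for this example, not part of NetSolP's documented output format; the numbers are placeholders, not real predictions.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One candidate's scores (assumed to be probabilities in [0, 1])."""
    name: str
    solubility: float
    usability: float


def prioritise(preds, min_solubility=0.5, min_usability=0.5):
    """Drop candidates below either cut-off, then rank the rest.

    Ranking is by solubility first and usability second, highest first.
    The cut-offs are illustrative defaults, not recommended values.
    """
    kept = [
        p for p in preds
        if p.solubility >= min_solubility and p.usability >= min_usability
    ]
    return sorted(kept, key=lambda p: (p.solubility, p.usability), reverse=True)


# Placeholder scores for three hypothetical candidates.
candidates = [
    Prediction("cand_A", 0.82, 0.60),
    Prediction("cand_B", 0.65, 0.71),
    Prediction("cand_C", 0.30, 0.90),
]
shortlist = [p.name for p in prioritise(candidates)]
```

In practice the cut-offs would be tuned to how many constructs can be tested experimentally.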
Methodology:
NetSolP passes amino acid sequences through transformer-based protein language models to predict solubility and usability. The models are trained on curated datasets that are partitioned by strict sequence identity.
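The idea behind sequence-identity partitioning can be sketched as follows: sequences that are more similar than a chosen threshold are grouped into clusters, and each cluster is assigned whole to a single partition, so that close homologs never end up on both sides of a train/test split. This is a toy illustration, not the NetSolP curation pipeline. It uses `difflib.SequenceMatcher` as a crude stand-in for alignment-based identity, and the function names and the threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1]; NOT a real alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_identity(seqs: dict, threshold: float) -> list:
    """Single-linkage clustering: any pair at or above the threshold is linked."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def partition(seqs: dict, threshold: float, n_parts: int) -> list:
    """Assign whole clusters, largest first, to the currently smallest partition."""
    parts = [set() for _ in range(n_parts)]
    clusters = cluster_by_identity(seqs, threshold)
    for cluster in sorted(clusters, key=len, reverse=True):
        min(parts, key=len).update(cluster)
    return parts


# Toy sequences: two near-duplicate pairs that share almost no residues.
toy = {
    "a": "MKTAYIAKQRQISFVKSHFS",
    "a2": "MKTAYIAKQRQISFVKSHFE",
    "b": "GGGGPPPPGGGGPPPPGGGG",
    "b2": "GGGGPPPPGGGGPPPPGGGA",
}
parts = partition(toy, threshold=0.5, n_parts=2)
```

The all-against-all comparison is quadratic in the number of sequences, so a real dataset would need a dedicated clustering tool and a proper alignment-based identity measure.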
Details
- Cost: Free of charge
- Tool Type: Web application
- Operating Systems: Mac, Linux, Windows
- Added: 5/19/2022
- Last Updated: 11/24/2024
Publications
Thumuluri V, Martiny H, Almagro Armenteros JJ, Salomon J, Nielsen H, Johansen AR. NetSolP: predicting protein solubility in Escherichia coli using language models. Bioinformatics. 2021;38(4):941-946. doi:10.1093/bioinformatics/btab801. PMID: 35088833.