ToxDL
ToxDL predicts protein toxicity from amino acid sequences, using a convolutional neural network and protein domain embeddings to distinguish toxic from non-toxic proteins.
Key Features:
- Deep Learning Architecture: A convolutional neural network (CNN) designed for variable-length input extracts predictive features directly from protein sequences.
- Domain Embeddings (domain2vec): A domain2vec module generates embeddings for protein domains to capture structural and functional characteristics.
- Toxicity Classification: An output module integrates CNN outputs and domain2vec embeddings to classify proteins as toxic or non-toxic.
- Benchmark Performance: In independent tests on animal proteins and in cross-species transferability tests on bacterial proteins, ToxDL outperformed homology-based methods and other machine-learning approaches.
- Saliency Maps: Saliency maps highlight the sequence features the network has learned, including known toxic motifs, and can guide in silico sequence modifications that alter predicted toxicity.
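The saliency idea in the last feature can be illustrated with a small sketch. It is not ToxDL's code: the scoring function `toy_toxicity_score` is a hypothetical stand-in for a trained model, and the saliency here is computed by occlusion (masking one residue at a time), a simple substitute chosen for illustration rather than the method the tool itself uses. All function names are assumptions.

```python
import math


def toy_toxicity_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: rewards cysteine count."""
    cysteines = seq.count("C")
    return 1.0 / (1.0 + math.exp(-(cysteines - 2)))


def occlusion_saliency(seq: str, score_fn, mask: str = "X") -> list:
    """Per-residue saliency: drop in score when that residue is masked."""
    base = score_fn(seq)
    saliency = []
    for i in range(len(seq)):
        masked = seq[:i] + mask + seq[i + 1:]
        saliency.append(base - score_fn(masked))
    return saliency


def substitute_most_salient(seq: str, score_fn, replacement: str = "A"):
    """Replace the most salient residue (first one on ties) in silico."""
    saliency = occlusion_saliency(seq, score_fn)
    pos = max(range(len(seq)), key=saliency.__getitem__)
    return pos, seq[:pos] + replacement + seq[pos + 1:]


sequence = "MKCAACGC"  # made-up example sequence
saliency = occlusion_saliency(sequence, toy_toxicity_score)
position, modified = substitute_most_salient(sequence, toy_toxicity_score)
print(position, modified)
print(toy_toxicity_score(sequence), toy_toxicity_score(modified))
```

With a real model, `score_fn` would wrap the trained network's prediction, and the residues with the highest saliency would be the candidates for directed modification.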
Scientific Applications:
- Crop safety assessment: Predicts potential toxicity of proteins introduced into genetically engineered food crops to inform safety evaluations.
- Therapeutic protein evaluation: Assesses toxicity risk of candidate therapeutic proteins during drug development.
- Regulatory and research support: Provides predictive evidence for GMO development and safety regulation decision-making.
Methodology:
The model is trained on protein sequences and has three components: a CNN that extracts sequence features, a domain2vec module that embeds protein domain information, and an output module that integrates both representations for toxic/non-toxic classification. Saliency maps are used to visualize the motifs the trained network has learned.
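The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the published architecture: the filter count and width, the embedding size, averaging the domain vectors, the random (untrained) weights, and the domain identifiers `DOMAIN_A` and `DOMAIN_B` are all hypothetical choices made for the example.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as a (length, 20) matrix; unknown residues stay zero."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        if aa in AA_INDEX:
            x[pos, AA_INDEX[aa]] = 1.0
    return x


def conv_global_max(x, filters):
    """1D convolution + ReLU + global max pooling.

    Global pooling yields a fixed-size vector for any sequence length.
    """
    n_filters, width, _ = filters.shape
    if x.shape[0] < width:  # zero-pad sequences shorter than the filter
        x = np.pad(x, ((0, width - x.shape[0]), (0, 0)))
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    activations = np.einsum("nwc,fwc->nf", windows, filters)
    return np.maximum(activations, 0.0).max(axis=0)


def domain_vector(domains, table, dim):
    """Average the embeddings of known domains (assumed pooling choice)."""
    vecs = [table[d] for d in domains if d in table]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def predict_toxicity(seq, domains, params):
    """Concatenate sequence and domain features, then apply a logistic output."""
    seq_feat = conv_global_max(one_hot(seq), params["filters"])
    dom_feat = domain_vector(domains, params["domain_table"], params["domain_dim"])
    features = np.concatenate([seq_feat, dom_feat])
    logit = features @ params["w"] + params["b"]
    return float(1.0 / (1.0 + np.exp(-logit)))


# Random, untrained parameters: the output is meaningless as a prediction.
rng = np.random.default_rng(0)
params = {
    "filters": rng.normal(size=(8, 5, 20)),
    "domain_dim": 4,
    "domain_table": {
        "DOMAIN_A": rng.normal(size=4),  # hypothetical domain identifiers
        "DOMAIN_B": rng.normal(size=4),
    },
    "w": 0.1 * rng.normal(size=8 + 4),
    "b": 0.0,
}

probability = predict_toxicity("MKCAACGCLLV", ["DOMAIN_A"], params)
print(probability)
```

In a trained model, the filters, domain embeddings, and output weights would be learned from labeled toxic and non-toxic proteins rather than drawn at random.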
Details
- License: Apache-2.0
- Programming Languages: Python
- Added: 1/18/2021
- Last Updated: 3/2/2021
Publications
Pan X, Zuallaert J, Wang X, Shen H, Campos EP, Marushchak DO, De Neve W. ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics. 2020;36(21):5159-5168. doi:10.1093/bioinformatics/btaa656. PMID:32692832.