ToxDL

ToxDL predicts protein toxicity from amino acid sequences. It combines a convolutional neural network with protein domain embeddings to classify proteins as toxic or non-toxic.


Key Features:

  • Deep Learning Architecture: A convolutional neural network (CNN) extracts predictive features directly from protein sequences and is designed to handle inputs of variable length.
  • Domain Embeddings (domain2vec): A domain2vec module generates embeddings for protein domains to capture structural and functional characteristics.
  • Toxicity Classification: An output module integrates CNN outputs and domain2vec embeddings to classify proteins as toxic or non-toxic.
  • Performance Superiority: In independent tests on animal proteins and in cross-species transferability assessments on bacterial proteins, ToxDL outperformed homology-based methods and other machine-learning techniques.
  • Saliency Maps: Saliency maps visualize the features learned by the network, highlight known toxic motifs, and support directed in silico sequence modifications that alter predicted toxicity.
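The three trainable parts listed above (a CNN over the sequence, a domain embedding, and an output module that merges them) can be illustrated with a small standard-library sketch. It is not ToxDL's code and uses random, untrained weights. Global max pooling is assumed here as one common way to turn a variable-length sequence into a fixed-size feature vector. Averaging the embeddings of a protein's domains and using a single logistic output unit are also assumptions. The functions `conv_maxpool`, `domain_vector`, `init_params`, `predict_toxicity` and the domain IDs `DOMAIN_A`/`DOMAIN_B` are hypothetical names.

```python
import math
import random

# Illustrative sketch only, not ToxDL's implementation: filter sizes, max pooling,
# domain-embedding averaging and the logistic output are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """One 20-dim one-hot row per residue; non-standard residues become all zeros."""
    return [[1.0 if aa == c else 0.0 for c in AMINO_ACIDS] for aa in seq.upper()]


def conv_maxpool(x, filters, conv_bias):
    """1D convolution (no padding) + ReLU + global max pooling: one value per filter, any length."""
    pooled = []
    for w, b in zip(filters, conv_bias):
        k = len(w)
        best = 0.0  # max(ReLU(a_i)) == max(0, max a_i); stays 0.0 if len(x) < k
        for start in range(len(x) - k + 1):
            act = b + sum(w[j][a] * x[start + j][a]
                          for j in range(k) for a in range(len(AMINO_ACIDS)))
            best = max(best, act)
        pooled.append(best)
    return pooled


def domain_vector(domains, table, dim):
    """Average the embeddings of a protein's known domains (hypothetical stand-in for domain2vec)."""
    known = [table[d] for d in domains if d in table]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def init_params(n_filters=4, width=3, domain_dim=5, seed=0):
    """Random, untrained parameters; a real model would learn these from labeled proteins."""
    rng = random.Random(seed)
    return {
        "filters": [[[rng.gauss(0, 0.5) for _ in AMINO_ACIDS] for _ in range(width)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        # Hypothetical domain IDs with random vectors in place of learned domain2vec embeddings.
        "domain_table": {d: [rng.gauss(0, 1) for _ in range(domain_dim)]
                         for d in ("DOMAIN_A", "DOMAIN_B")},
        "domain_dim": domain_dim,
        "w_seq": [rng.gauss(0, 0.5) for _ in range(n_filters)],
        "w_dom": [rng.gauss(0, 0.5) for _ in range(domain_dim)],
        "out_bias": 0.0,
    }


def predict_toxicity(seq, domains, params):
    """Combine sequence and domain features in one logistic output unit (toxic probability)."""
    h = conv_maxpool(one_hot(seq), params["filters"], params["conv_bias"])
    d = domain_vector(domains, params["domain_table"], params["domain_dim"])
    z = params["out_bias"]
    z += sum(v * f for v, f in zip(params["w_seq"], h))
    z += sum(u * e for u, e in zip(params["w_dom"], d))
    return 1.0 / (1.0 + math.exp(-z))


# Usage: sequences of different lengths yield same-size features and a probability in (0, 1).
params = init_params()
for seq in ("MKTLLLTLVVVTIVCLDLGYT", "GCCSDPRCAWRC"):
    print(seq, round(predict_toxicity(seq, ["DOMAIN_A"], params), 3))
```

Because the pooling step keeps only the strongest response of each filter, the output size depends on the number of filters rather than on the sequence length. The probabilities printed here carry no biological meaning, because the weights are random.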

Scientific Applications:

  • Crop safety assessment: Predicts potential toxicity of proteins introduced into genetically engineered food crops to inform safety evaluations.
  • Therapeutic protein evaluation: Assesses toxicity risk of candidate therapeutic proteins during drug development.
  • Regulatory and research support: Provides predictive evidence for GMO development and safety regulation decision-making.

Methodology:

The model is trained on protein sequences. A CNN extracts sequence features, domain2vec embeds domain-specific information, and an output module integrates the two representations for toxic/non-toxic classification. Saliency maps then visualize the learned motifs.
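A common way to build a saliency map is to take the gradient of the model's output score with respect to its one-hot input. The standard-library sketch below computes this gradient by hand for a single convolution + ReLU + global-max-pooling layer that feeds a linear score. A domain-embedding term added to the same score would not depend on the sequence, so it drops out of the gradient and is left out here. Several choices are illustrative assumptions rather than ToxDL's published saliency procedure: scoring each position by the gradient at its observed residue (gradient × input for one-hot rows), the first-order substitution heuristic, and the names `saliency` and `best_substitution`.

```python
# Illustrative sketch only: a hand-derived input gradient for one conv/ReLU/max-pool layer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """One 20-dim one-hot row per residue; non-standard residues become all zeros."""
    return [[1.0 if aa == c else 0.0 for c in AMINO_ACIDS] for aa in seq.upper()]


def saliency(seq, filters, conv_bias, w_seq):
    """Gradient of sum_f w_seq[f] * maxpool(ReLU(conv_f(x))) w.r.t. the one-hot input x.

    Returns (grad, sal): grad[i][a] for every position/residue, and sal[i], the absolute
    gradient at the residue actually present at position i (gradient x input).
    """
    x = one_hot(seq)
    n, m = len(x), len(AMINO_ACIDS)
    grad = [[0.0] * m for _ in range(n)]
    for w, b, v in zip(filters, conv_bias, w_seq):
        k = len(w)
        best_act, best_start = None, None
        for start in range(n - k + 1):
            act = b + sum(w[j][a] * x[start + j][a] for j in range(k) for a in range(m))
            if best_act is None or act > best_act:  # first window wins ties
                best_act, best_start = act, start
        # Max pooling routes the gradient only to the winning window;
        # ReLU blocks it entirely when that window's activation is not positive.
        if best_act is not None and best_act > 0:
            for j in range(k):
                for a in range(m):
                    grad[best_start + j][a] += v * w[j][a]
    sal = [abs(sum(g * xi for g, xi in zip(grad[i], x[i]))) for i in range(n)]
    return grad, sal


def best_substitution(seq, grad):
    """First-order guess at the single substitution that most lowers the score.

    Returns (estimated_change, position, old_residue, new_residue); the estimate is
    exact only while the same windows still win the pooling and stay active.
    """
    best = (0.0, None, None, None)
    for i, aa in enumerate(seq.upper()):
        if aa not in AMINO_ACIDS:
            continue
        cur = grad[i][AMINO_ACIDS.index(aa)]
        for a, new in enumerate(AMINO_ACIDS):
            delta = grad[i][a] - cur
            if delta < best[0]:
                best = (delta, i, aa, new)
    return best


# Usage: one width-1 filter that responds only to cysteine, weighted by 2.0 in the score.
c_filter = [[[1.0 if c == "C" else 0.0 for c in AMINO_ACIDS]]]
grad, sal = saliency("AACAA", c_filter, [0.0], [2.0])
print(sal)                               # [0.0, 0.0, 2.0, 0.0, 0.0]
print(best_substitution("AACAA", grad))  # (-2.0, 2, 'C', 'A')
```

In this toy case the only salient position is the cysteine the filter detects, and replacing that cysteine is the edit predicted to lower the score most. Applied to a trained model, the same idea would highlight sequence motifs that drive a toxic prediction.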

Details

License: Apache-2.0
Programming Languages: Python
Added: 1/18/2021
Last Updated: 3/2/2021

Publications

Pan X, Zuallaert J, Wang X, Shen H, Campos EP, Marushchak DO, De Neve W. ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics. 2020;36(21):5159-5168. doi:10.1093/bioinformatics/btaa656. PMID:32692832.

PMID: 32692832
Funding:

  • National Natural Science Foundation of China: 61671288, 61725302, 61903248
  • Science and Technology Commission of Shanghai Municipality: 17JC1403500