SPACE

SPACE generates network-based cross-species protein embeddings from STRING protein interaction data to support protein function and subcellular localization prediction.


Key Features:

  • Network-Based Embeddings: Utilizes the STRING database containing protein interaction data for 1322 eukaryotes to produce embeddings that capture network relationships beyond sequence information.
  • Cross-Species Alignment: Creates species-specific network embeddings and aligns them using orthology relations to enable direct cross-species comparisons.
  • Complementarity to Sequence Embeddings: Network embeddings capture information that complements sequence-based embeddings; the cross-species alignment itself relies on sequence-based orthology.
  • Validation and Performance: Validated on subcellular localization and protein function prediction by training logistic regression classifiers on aligned network and sequence embeddings, achieving performance comparable to state-of-the-art deep-learning methods.
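
The idea of aligning species-specific embeddings through orthologs can be illustrated with a small sketch. This is not SPACE's actual alignment procedure: it substitutes orthogonal Procrustes as a simple stand-in, and the embeddings, dimensions, and names (`species_a`, `species_b`, `procrustes_align`) are synthetic and hypothetical.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix R that minimizes ||source @ R - target||_F.

    Rows of `source` and `target` are embeddings of ortholog pairs:
    row i in one species corresponds to row i in the other.
    """
    # The solution is U @ Vt, where U, Vt come from the SVD of source.T @ target.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
dim = 8            # hypothetical embedding dimension
n_orthologs = 50   # hypothetical number of ortholog pairs

# Synthetic embeddings for species A.
species_a = rng.normal(size=(n_orthologs, dim))

# Species B is a rotated, slightly noisy copy of A, mimicking two
# independently trained embedding spaces with different coordinates.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
species_b = species_a @ true_rotation + 0.01 * rng.normal(size=(n_orthologs, dim))

# Learn the mapping from ortholog pairs and move A into B's space.
rotation = procrustes_align(species_a, species_b)
aligned_a = species_a @ rotation

# Relative error between aligned A and B; small when alignment succeeds.
alignment_error = np.linalg.norm(aligned_a - species_b) / np.linalg.norm(species_b)
```

After alignment, proteins from both species live in one coordinate system, so distances between them become meaningful; that is the property cross-species comparison depends on.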

Scientific Applications:

  • Protein Function Prediction: Supports cross-species prediction of protein function by integrating network-derived and sequence-derived embeddings.
  • Subcellular Localization Prediction: Improves cross-species prediction of protein subcellular localization using aligned network and sequence embeddings.

Methodology:

SPACE integrates protein interaction networks from STRING with orthology information. It generates species-specific network embeddings, aligns them across species using orthology relations, and, for validation, trains logistic regression classifiers on the aligned network embeddings together with sequence embeddings.
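
The validation step can be sketched as follows, assuming the two embedding types are combined by simple concatenation and the task is reduced to a single binary label. All data here is synthetic, the dimensions and function names are hypothetical, and the classifier is a plain NumPy gradient-descent implementation written for illustration; in practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import numpy as np


def train_logistic_regression(features, labels, learning_rate=0.1, n_steps=500):
    """Fit binary logistic regression by batch gradient descent."""
    n_samples, n_features = features.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(n_steps):
        logits = features @ weights + bias
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - labels  # gradient of the log loss w.r.t. the logits
        weights -= learning_rate * (features.T @ error) / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias


def predict(features, weights, bias):
    """Predict class 1 wherever the logit is positive."""
    return (features @ weights + bias > 0).astype(int)


rng = np.random.default_rng(0)
n_proteins = 400
labels = rng.integers(0, 2, size=n_proteins)
signal = 1.5 * (2 * labels - 1)  # +1.5 for class 1, -1.5 for class 0

# Synthetic stand-ins for the two embedding types; each carries
# part of the label signal in its first dimension.
network_emb = rng.normal(size=(n_proteins, 16))
network_emb[:, 0] += signal
sequence_emb = rng.normal(size=(n_proteins, 32))
sequence_emb[:, 0] += signal

# Concatenate per protein so the classifier sees both views.
combined = np.hstack([network_emb, sequence_emb])

# Simple train/test split: first 300 proteins for training.
train, test = slice(0, 300), slice(300, None)
weights, bias = train_logistic_regression(combined[train], labels[train])
accuracy = (predict(combined[test], weights, bias) == labels[test]).mean()
```

A linear classifier keeps the comparison focused on what the embeddings encode rather than on model capacity.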

Details

License: MIT
Programming Languages: Python
Added: 10/27/2025
Last Updated: 10/27/2025

Publications

Hu D, Szklarczyk D, von Mering C, Jensen LJ. SPACE: STRING proteins as complementary embeddings. Bioinformatics. 2025;41(9). doi:10.1093/bioinformatics/btaf496. PMID:40924541. PMCID:PMC12453690.

Funding: Novo Nordisk Foundation (NNF14CC0001, NNF20SA0035590)