SPACE
SPACE generates network-based cross-species protein embeddings from STRING protein interaction data to support protein function and subcellular localization prediction.
Key Features:
- Network-Based Embeddings: Uses the STRING database, which contains protein interaction data for 1322 eukaryotes, to produce embeddings that capture network relationships beyond sequence information.
- Cross-Species Alignment: Creates species-specific network embeddings and aligns them using orthology relations to enable direct cross-species comparisons.
- Complementarity to Sequence Embeddings: Network embeddings are complementary to sequence-based embeddings, with alignment informed by sequence-based orthology.
- Validation and Performance: Validated on subcellular localization and protein function prediction by training logistic regression classifiers on aligned network and sequence embeddings, achieving performance comparable to state-of-the-art deep-learning methods.
Scientific Applications:
- Protein Function Prediction: Supports cross-species prediction of protein function by integrating network-derived and sequence-derived embeddings.
- Subcellular Localization Prediction: Improves cross-species prediction of protein subcellular localization using aligned network and sequence embeddings.
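The prediction setup described above, a logistic regression classifier trained on network and sequence embeddings together, can be sketched as follows. This is a minimal illustration and not the SPACE code: the embeddings and labels are synthetic, combining the two views by concatenation is an assumption, and the function names `train_logistic_regression` and `predict` are hypothetical. Only NumPy is used so the example stays self-contained.

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit a binary logistic regression by batch gradient descent (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Predict class 1 when the logit is positive (hypothetical helper)."""
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)  # synthetic binary label, e.g. one localization class

# Synthetic stand-ins for the two embedding types; each carries label signal in one dimension.
network_emb = rng.normal(size=(n, 8))
network_emb[:, 0] += 3.0 * labels
sequence_emb = rng.normal(size=(n, 8))
sequence_emb[:, 0] += 3.0 * labels

# Assumption: the two views are combined by simple concatenation.
combined = np.hstack([network_emb, sequence_emb])

train, test = slice(0, 150), slice(150, n)
w, b = train_logistic_regression(combined[train], labels[train])
accuracy = (predict(combined[test], w, b) == labels[test]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

In a cross-species setting, the training rows would come from one species and the test rows from another, which is only meaningful once the network embeddings of both species live in a shared space.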
Methodology:
SPACE integrates protein interaction networks from STRING with orthology information in three steps: it generates species-specific network embeddings, aligns them across species using orthology relations, and, for validation, trains logistic regression classifiers on the aligned network embeddings together with sequence embeddings.
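To make the alignment step concrete, the sketch below maps one species' embedding space onto another using a set of orthologous protein pairs as anchors. It uses orthogonal Procrustes as a simple stand-in technique; this is not necessarily the alignment method SPACE uses. The data are synthetic, and the function name `procrustes_rotation` and the choice of 40 anchor pairs are hypothetical.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Return the orthogonal matrix R that minimizes ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)
dim = 16

# Synthetic stand-in for species A's network embeddings.
emb_a = rng.normal(size=(100, dim))

# Toy species B: the same geometry under a hidden rotation plus small noise,
# so row i of emb_b corresponds to row i of emb_a.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_b = emb_a @ hidden_rotation + 0.01 * rng.normal(size=(100, dim))

# Assumption: the first 40 rows are known orthologous pairs used as anchors.
ortho_idx = np.arange(40)
R = procrustes_rotation(emb_b[ortho_idx], emb_a[ortho_idx])
aligned_b = emb_b @ R

# Evaluate on the 60 rows that were not used to fit the rotation.
error_before = np.linalg.norm(emb_b[40:] - emb_a[40:])
error_after = np.linalg.norm(aligned_b[40:] - emb_a[40:])
print(f"mismatch before: {error_before:.2f}, after: {error_after:.2f}")
```

Once embeddings share a space in this way, a classifier trained on proteins from one species can be applied directly to proteins from another.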
Details
- License: MIT
- Programming Languages: Python
- Added: 10/27/2025
- Last Updated: 10/27/2025
Publications
Hu D, Szklarczyk D, von Mering C, Jensen LJ. SPACE: STRING proteins as complementary embeddings. Bioinformatics. 2025;41(9). doi:10.1093/bioinformatics/btaf496. PMID:40924541. PMCID:PMC12453690.