OpenFold

OpenFold is a fast, memory-efficient, trainable reimplementation of AlphaFold2 for predicting protein structures, retraining models, and analyzing how the model learns.


Key Features:

  • AlphaFold2 reimplementation: Provides a codebase that reproduces AlphaFold2's architecture and inference behavior.
  • Fast and memory-efficient: Offers performance optimizations targeting reduced runtime and memory usage during model execution.
  • Trainable framework: Includes the training code and workflows needed to train models from scratch.
  • Accessible training data and code: Fills the gap left by the lack of publicly available AlphaFold2 training code and training data, both of which are needed to develop new models.
  • Comparable accuracy: Achieves accuracy comparable to that reported for AlphaFold2 in protein structure prediction.
  • Robust generalization: Generalizes well even when trained on deliberately limited datasets, including training sets from which entire classes of secondary structure elements are omitted.
  • Hierarchical learning insights: Exposes intermediate structures generated during training, making it possible to analyze how the model progressively learns complex folds.
  • Supports novel tasks and evaluation: Enables exploration of tasks such as protein-ligand complex prediction and evaluation of generalization across uncharted regions of fold space.
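
The "Fast and memory-efficient" feature is described only in general terms above. As an illustration of one widely used memory-saving technique (not necessarily the exact optimization OpenFold applies), the sketch below computes attention over chunks of queries, so that only a chunk_size x N block of logits exists at any time instead of the full N x N matrix. It uses plain NumPy, and the function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def full_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Standard attention; materializes the full (N, N) logit matrix."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def chunked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                      chunk_size: int = 64) -> np.ndarray:
    """Same result, but only a (chunk_size, N) block of logits exists at once."""
    out = np.empty((q.shape[0], v.shape[-1]), dtype=np.result_type(q, v))
    for start in range(0, q.shape[0], chunk_size):
        stop = start + chunk_size  # slicing past the end is safe in NumPy
        logits = q[start:stop] @ k.T / np.sqrt(q.shape[-1])
        out[start:stop] = softmax(logits) @ v
    return out


# Toy check on random data: both versions should agree to floating-point tolerance.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(300, 32)) for _ in range(3))
assert np.allclose(full_attention(q, k, v), chunked_attention(q, k, v, chunk_size=50))
```

Each query row's softmax depends only on that row's logits, so splitting along the query axis is exact. The trade-off is more, smaller matrix multiplications in exchange for a lower peak memory footprint.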

Scientific Applications:

  • Protein structure prediction: Predicts tertiary structures of proteins using an AlphaFold2-derived model implementation.
  • Retraining for novel tasks: Facilitates retraining to adapt models for tasks such as protein-ligand complex structure prediction.
  • Model learning analysis: Allows study of hierarchical and intermediate learning behavior by inspecting structures generated during training.
  • Generalization assessment: Enables evaluation of model generalization across fold space and under limited or biased training data.
  • Investigating secondary structure effects: Supports experiments that omit entire classes of secondary structure elements to assess their impact on learning and prediction.
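
The secondary-structure elision experiments described above can be approximated with a simple filter over the training set. The sketch below is hypothetical and is not OpenFold's data pipeline: it assumes each training chain comes with a DSSP-style secondary-structure string and drops chains whose helix (or strand) fraction exceeds a chosen threshold. The chain IDs, function names, and threshold are illustrative, and the criterion used in the OpenFold study may differ.

```python
from typing import Dict

# Hypothetical grouping of DSSP 8-state codes into three classes.
HELIX_CODES = set("HGI")   # alpha, 3-10, and pi helices
STRAND_CODES = set("EB")   # extended strand and isolated beta bridge


def ss_fractions(dssp: str) -> Dict[str, float]:
    """Fraction of residues assigned to helix, strand, and everything else."""
    n = len(dssp)
    if n == 0:
        return {"helix": 0.0, "strand": 0.0, "other": 0.0}
    helix = sum(c in HELIX_CODES for c in dssp)
    strand = sum(c in STRAND_CODES for c in dssp)
    return {"helix": helix / n, "strand": strand / n,
            "other": (n - helix - strand) / n}


def elide_by_ss(chains: Dict[str, str], drop: str, max_frac: float) -> Dict[str, str]:
    """Keep only chains whose fraction of class `drop` is at most `max_frac`."""
    return {cid: ss for cid, ss in chains.items()
            if ss_fractions(ss)[drop] <= max_frac}


# Toy example with made-up chain IDs and secondary-structure strings.
chains = {"chain_a": "HHHHHHHHCC", "chain_b": "CEEEECCEEE", "chain_c": "CCHHCCEECC"}
helix_depleted = elide_by_ss(chains, "helix", max_frac=0.25)  # keeps chain_b and chain_c
```

Training on `helix_depleted` versus the full set is the kind of comparison that tests whether a model can still predict helical structure it has rarely seen.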

Methodology:

Reimplementation of the AlphaFold2 architecture within a trainable framework, enabling models to be trained from scratch and the intermediate structures produced during training to be analyzed.
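
To analyze intermediate structures produced during training, one option is to score each checkpoint's prediction against the experimental structure. The sketch below implements a simplified, superposition-free lDDT-Cα in NumPy that pools all residue pairs lying within 15 Å of each other in the reference. This pooled variant and the function name `lddt_ca` are assumptions for illustration, not OpenFold's API.

```python
from typing import Sequence

import numpy as np


def lddt_ca(pred: np.ndarray, ref: np.ndarray, cutoff: float = 15.0,
            thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified global lDDT on C-alpha coordinates of shape (N, 3), in angstroms."""
    # Pairwise distance matrices for reference and prediction.
    d_ref = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    d_pred = np.linalg.norm(pred[:, None, :] - pred[None, :, :], axis=-1)
    n = ref.shape[0]
    # Score only pairs within `cutoff` in the reference, excluding self-pairs.
    mask = (d_ref < cutoff) & ~np.eye(n, dtype=bool)
    if not mask.any():
        return float("nan")
    diff = np.abs(d_ref - d_pred)[mask]
    # Fraction of preserved distances at each tolerance, averaged over tolerances.
    return float(np.mean([(diff < t).mean() for t in thresholds]))
```

Because lDDT compares internal distances, it needs no structural alignment; computing it for the prediction saved at each checkpoint yields an accuracy-versus-training-step curve for a given protein.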

Details

License: Apache-2.0
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Python
Added: 6/19/2024
Last Updated: 11/24/2024

Publications

Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, O’Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, Nivon L, Weitzner B, Ban YA, Chen S, Zhang M, Li C, Song SL, He Y, Sorger PK, Mostaque E, Zhang Z, Bonneau R, AlQuraishi M. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods. 2024;21(8):1514-1524. doi:10.1038/s41592-024-02272-z. PMID:38744917. PMCID:PMC11645889.

Funding:

  • U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences: R35GM150546
  • U.S. Department of Health & Human Services | NIH | National Cancer Institute: U54-CA225088
  • National Science Foundation: OAC-2106661, OAC-2112606

Documentation