GeneGPT

GeneGPT integrates NCBI Web APIs with large language models to enable accurate retrieval and interpretation of genomics information.


Key Features:

  • NCBI Web API integration: Uses NCBI Web APIs to access authoritative genomics databases for answers.
  • In-context learning with Codex: Teaches Codex (a GPT-3 variant) to use the NCBI Web APIs through in-context learning alone, without task-specific fine-tuning.
  • Augmented decoding and API call execution: Implements an augmented decoding algorithm that detects and executes API calls during generation.
  • API demonstrations for generalization: Uses demonstrations of API calls in the prompt, which generalize well across tasks and are more useful for in-context learning than API documentation.
  • Multi-hop API chaining: Supports longer chains of API calls to answer complex multi-hop genomics questions, validated on the GeneHop dataset.
  • Benchmark performance: Achieved an average score of 0.83 across eight GeneTuring tasks, outperforming the retrieval-augmented new Bing (0.44), the biomedical LLMs BioMedLM (0.08) and BioGPT (0.04), and the general-purpose GPT-3 (0.16) and ChatGPT (0.12).
  • Error analysis: Identifies task-specific error types to inform future model improvements.
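To make the API-demonstration idea concrete, the sketch below builds NCBI E-utilities request URLs (the base URL and the `db`, `term`, and `retmax` parameters are the real public E-utilities interface) and assembles a few-shot prompt from demonstrations. The `[url]->[result]` call layout, the helper names `eutils_url` and `build_prompt`, and the demonstration content are illustrative assumptions, not GeneGPT's exact prompt.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities (a real public endpoint).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility: str, **params: str) -> str:
    """Build an E-utilities request URL, e.g. esearch.fcgi?db=gene&term=LMP10."""
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"


def build_prompt(demos: list[tuple[str, str, str, str]], question: str) -> str:
    """Assemble a few-shot prompt from (question, url, api_result, answer) demos.

    Hypothetical format: the '[url]->[result]' call syntax is an illustrative
    assumption, not necessarily the layout GeneGPT uses.
    """
    blocks = [
        f"Question: {q}\n[{url}]->[{result}]\nAnswer: {answer}"
        for q, url, result, answer in demos
    ]
    # Leave the new question open so the model continues with its own API call.
    blocks.append(f"Question: {question}\n")
    return "\n\n".join(blocks)


# Illustrative demonstration; the API result is a placeholder, not real output.
demo = (
    "What is the official gene symbol of LMP10?",
    eutils_url("esearch", db="gene", term="LMP10", retmax="5"),
    "<hypothetical esearch result>",
    "PSMB10",
)
prompt = build_prompt([demo], "What is the official gene symbol of MECL1?")
```

Showing the model a worked call, rather than describing the API in prose, matches the listing's point that demonstrations generalize better than documentation.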

Scientific Applications:

  • Genomics question answering: Provides precise answers to genomics queries by retrieving data from NCBI resources.
  • Complex multi-hop queries: Synthesizes information across multiple API calls to resolve multi-step biomedical questions.
  • LLM reliability in biomedical contexts: Reduces hallucination risk by grounding responses in authoritative NCBI data.
  • Evaluation and benchmarking: Serves as a framework for assessing LLM performance on genomics benchmarks such as GeneTuring and GeneHop.
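For benchmarking, per-task scores are combined into a single average. The sketch below assumes a simple exact-match metric after case-folding and an unweighted mean over tasks; both are simplifying assumptions (GeneTuring tasks may use task-specific scoring), and the function names are hypothetical.

```python
from statistics import mean


def normalize(answer: str) -> str:
    """Hypothetical normalization: trim whitespace and upper-case gene symbols."""
    return answer.strip().upper()


def task_score(preds: list[str], golds: list[str]) -> float:
    """Fraction of predictions that exactly match the gold answer after normalization."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and the same length")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)


def average_score(per_task: dict[str, float]) -> float:
    """Unweighted mean of per-task scores (an assumption about how tasks are combined)."""
    return mean(per_task.values())
```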

Methodology:

Prompts Codex (a GPT-3 variant) with demonstrations of NCBI Web API calls, so the model learns to use the APIs in context rather than through training. An augmented decoding algorithm detects API calls as they are generated, executes them, and feeds the results back into the model; because generation then resumes, calls can be chained to answer multi-hop questions.
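The detect-execute-resume loop described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical call syntax in which the model emits `[url]->` and stops, and a `generate` callable that returns a continuation ending either at `->` or at the final answer; the names `augmented_decode`, `generate`, and `call_api` are hypothetical stand-ins for the LLM and an HTTP client.

```python
import re
from typing import Callable

# Hypothetical call syntax: a pending call is "[<url>]->" at the very end of the text.
PENDING_CALL = re.compile(r"\[(https?://[^\]\s]+)\]->\Z")


def augmented_decode(
    generate: Callable[[str], str],
    call_api: Callable[[str], str],
    prompt: str,
    max_calls: int = 10,
) -> str:
    """Alternate between LLM generation and API execution until no call is pending.

    `generate(text)` is assumed to return the model's continuation of `text`,
    stopping right after "->" or at the end of the answer.
    """
    text = prompt
    calls = 0
    while True:
        text += generate(text)
        match = PENDING_CALL.search(text)
        if match is None:
            return text  # no pending API call: the answer is complete
        if calls == max_calls:
            raise RuntimeError(f"exceeded {max_calls} API calls")
        # Execute the call and splice its result back in; the model then
        # continues, possibly issuing another call (multi-hop chaining).
        text += f"[{call_api(match.group(1))}]"
        calls += 1
```

The `max_calls` cap is a design choice for the sketch: it bounds runaway chains if the model keeps emitting calls. In practice `call_api` would wrap an HTTP GET to the NCBI endpoint.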

Details

Cost:
Free of charge
Tool Type:
command-line tool
Programming Languages:
Python
Added:
5/24/2024
Last Updated:
5/24/2024

Operations

Data Inputs & Outputs

Alignment

Publications

Jin Q, Yang Y, Chen Q, Lu Z. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics. 2024;40(2). doi:10.1093/bioinformatics/btae075. PMID:38341654. PMCID:PMC10904143.