GeneGPT
GeneGPT integrates NCBI Web APIs with large language models to enable accurate retrieval and interpretation of genomics information.
Key Features:
- NCBI Web API integration: Uses NCBI Web APIs to access authoritative genomics databases for answers.
- In-context learning with Codex: Teaches Codex (a GPT-3 variant) to invoke APIs via in-context learning paired with an augmented decoding algorithm.
- Augmented decoding and API call execution: Implements an augmented decoding algorithm that detects and executes API calls during generation.
- API demonstrations for generalization: Employs API call demonstrations that improve task generalization compared with traditional documentation-based in-context learning.
- Multi-hop API chaining: Supports longer chains of API calls to answer complex multi-hop genomics questions, validated on the GeneHop dataset.
- Benchmark performance: Achieves an average score of 0.83 on the GeneTuring benchmark, outperforming the retrieval-augmented Bing (0.44), the biomedical LLMs BioMedLM (0.08) and BioGPT (0.04), and the general-purpose GPT-3 (0.16) and ChatGPT (0.12).
- Error analysis: Identifies task-specific error types to inform future model improvements.
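As a rough illustration of the NCBI Web API integration listed above, the sketch below builds an NCBI E-utilities request URL. The helper name `build_eutils_url` and the example search term are assumptions made for illustration; they are not taken from the GeneGPT code base. No network request is made.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_eutils_url(endpoint: str, **params: str) -> str:
    """Return a full E-utilities URL for an endpoint such as 'esearch' or 'efetch'.

    Hypothetical helper: it only assembles the URL and does not send a request.
    """
    return f"{EUTILS_BASE}/{endpoint}.fcgi?{urlencode(params)}"


# Example: search the Gene database for an illustrative gene alias.
url = build_eutils_url("esearch", db="gene", term="LMP10", retmode="json")
print(url)
```

A URL of this kind is what a language model would need to emit for the surrounding system to fetch the corresponding record from NCBI.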
Scientific Applications:
- Genomics question answering: Provides precise answers to genomics queries by retrieving data from NCBI resources.
- Complex multi-hop queries: Synthesizes information across multiple API calls to resolve multi-step biomedical questions.
- LLM reliability in biomedical contexts: Reduces hallucination risk by grounding responses in authoritative NCBI data.
- Evaluation and benchmarking: Serves as a framework for assessing LLM performance on genomics benchmarks such as GeneTuring and GeneHop.
Methodology:
Prompts Codex (a GPT-3 variant) to invoke NCBI Web APIs through in-context learning rather than fine-tuning. An augmented decoding algorithm detects API calls during generation and executes them, API call demonstrations in the prompt support generalization across tasks, and calls are chained for multi-hop question answering.
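The detect-and-execute loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the published implementation: the `[URL]->[result]` call syntax, the function names, and the scripted stand-in for the language model are all assumed here for illustration, and the API executor is a stub that never touches the network.

```python
import re
from typing import Callable, Iterable

# Assumed call syntax: the model writes "[URL]->" and then pauses for the result.
CALL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]->$")


def augmented_decode(
    prompt: str,
    generate: Callable[[str], str],
    execute_call: Callable[[str], str],
    max_calls: int = 5,
) -> str:
    """Alternate between text generation and API execution.

    `generate` is expected to stop either right after "->" or at the end of
    the answer. Both callables are hypothetical stand-ins.
    """
    text = prompt
    for _ in range(max_calls):
        text += generate(text)
        match = CALL_PATTERN.search(text)
        if match is None:
            break  # no pending API call, so the answer is complete
        result = execute_call(match.group(1))
        text += f"[{result}]\n"  # feed the result back for the next step
    return text[len(prompt):]


def scripted_model(chunks: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for an LLM that replays fixed chunks, one per call."""
    iterator = iter(chunks)
    return lambda _text: next(iterator, "")


calls = []


def fake_api(url: str) -> str:
    """Stub executor: records the URL and returns a canned result."""
    calls.append(url)
    return "id=42"


output = augmented_decode(
    "Question: ...\n",
    scripted_model(["[https://example.org/api?gene=X]->", "Answer: 42"]),
    fake_api,
)
print(output)
```

Because each result is appended to the running text before generation resumes, a later call can depend on an earlier result, which is how a loop of this shape would support multi-hop chains.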
Details
- Cost: Free of charge
- Tool Type: command-line tool
- Programming Languages: Python
- Added: 5/24/2024
- Last Updated: 5/24/2024
Publications
Jin Q, Yang Y, Chen Q, Lu Z. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics. 2024;40(2). doi:10.1093/bioinformatics/btae075. PMID:38341654. PMCID:PMC10904143.