BioBERT
BioBERT is a pre-trained language representation model specialized for biomedical text mining, improving the extraction of named entities, relations, and answers from the biomedical literature.
Key Features:
- BERT-based architecture: Built upon BERT (Bidirectional Encoder Representations from Transformers) to leverage bidirectional transformer representations.
- Domain-specific pre-training: Pre-trained on large-scale biomedical corpora to address word distribution shifts between general and biomedical text.
- Pre-trained language representation model: Supplies representations optimized for biomedical vocabulary and semantics.
- Fine-tuning for downstream tasks: Supports fine-tuning for task-specific models such as named entity recognition, relation extraction, and question answering.
- Empirical performance gains: Reports a 0.62% F1 increase in named entity recognition, a 2.80% F1 increase in relation extraction, and a 12.24% improvement in mean reciprocal rank (MRR) for question answering compared with BERT and the previous state-of-the-art models.
- Consistent architecture across tasks: Uses the same underlying model architecture without task-specific structural modifications.
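Since BioBERT follows the standard BERT architecture, its pre-trained representations can be loaded like any BERT checkpoint. A minimal sketch using the Hugging Face transformers library, assuming the dmis-lab/biobert-base-cased-v1.1 Hub identifier (not named in this entry):

```python
# Sketch: load BioBERT's pre-trained representations via Hugging Face
# transformers (assumes the dmis-lab/biobert-base-cased-v1.1 checkpoint).
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "dmis-lab/biobert-base-cased-v1.1"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

text = "The BRCA1 gene is associated with breast cancer."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per WordPiece token; BERT-base uses 768 dims.
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```

These token-level embeddings are what the downstream task heads (NER, relation extraction, QA) are fine-tuned on.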
Scientific Applications:
- Biomedical named entity recognition: Identifies biomedical entities in text with improved F1 performance (+0.62% reported).
- Relation extraction: Extracts relations between biomedical entities with improved F1 performance (+2.80% reported).
- Question answering: Retrieves and ranks answers to biomedical questions with enhanced mean reciprocal rank (+12.24% MRR reported).
Methodology:
Pre-training on large-scale biomedical corpora using the BERT architecture followed by task-specific fine-tuning.
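The fine-tuning step can be sketched as attaching a task-specific head to the shared encoder; below, a token-classification (NER) head via transformers, with an assumed checkpoint name and an illustrative toy label set:

```python
# Sketch: attach a token-classification (NER) head to BioBERT for
# fine-tuning (checkpoint name is an assumption; labels are illustrative).
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "dmis-lab/biobert-base-cased-v1.1"  # assumed Hub identifier
labels = ["O", "B-Disease", "I-Disease"]       # toy BIO tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(labels)
)

# One forward pass with dummy labels yields the cross-entropy loss that
# task-specific fine-tuning would minimize over an annotated corpus.
inputs = tokenizer("Mutations in BRCA1 cause breast cancer.",
                   return_tensors="pt")
dummy_labels = torch.zeros(inputs["input_ids"].shape, dtype=torch.long)
out = model(**inputs, labels=dummy_labels)
print(float(out.loss))
```

Relation extraction and question answering follow the same pattern with a sequence-classification or span-prediction head, respectively, with no change to the underlying architecture.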
Topics
Details
- Programming Languages:
- Python
- Added:
- 11/14/2019
- Last Updated:
- 11/24/2024
Publications
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234-1240. doi:10.1093/bioinformatics/btz682. PMID:31501885. PMCID:PMC7703786.
Funding: National Research Foundation of Korea (NRF) grants funded by the Korean government: NRF-2014M3C9A3063541, NRF-2017M3C4A7065887, NRF-2017R1A2A1A17069645
Links
Repository
https://github.com/dmis-lab/biobert