BioBERT
BioBERT is a domain-specific language representation model based on BERT (Bidirectional Encoder Representations from Transformers) and pre-trained on large-scale biomedical corpora. It addresses a key obstacle in biomedical text mining: NLP models trained on general-domain corpora often perform poorly on biomedical text because the word distributions of the two domains differ substantially. By pre-training on biomedical corpora, BioBERT better captures complex biomedical language and significantly outperforms BERT and previous state-of-the-art models on biomedical text mining tasks such as named entity recognition, relation extraction, and question answering, while requiring minimal task-specific architecture modifications.
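As an illustration, the pre-trained weights can be used as a biomedical text encoder. The sketch below uses the Hugging Face `transformers` library; the model ID `dmis-lab/biobert-v1.1` is an assumption (a Hub-hosted conversion of the v1.1-pubmed checkpoint), since the release linked below ships TensorFlow checkpoints.

```python
# Minimal sketch: encode a biomedical sentence with BioBERT.
# Assumes the Hub model ID "dmis-lab/biobert-v1.1" (not part of this record).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "dmis-lab/biobert-v1.1"  # assumed conversion of v1.1-pubmed
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

text = "The BRCA1 mutation increases the risk of breast cancer."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings, one vector per WordPiece token: (1, seq_len, 768)
print(outputs.last_hidden_state.shape)
```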
Topic
Natural language processing;Medicine;Ontology and terminology
Detail
Operation: Relation extraction; Named-entity and concept recognition (see the fine-tuning sketch after this list)
Software interface: Command-line user interface
Language: Python
License: Not stated
Cost: Free of charge
Version name: v1.1-pubmed
Credit: National Research Foundation of Korea
Input: -
Output: -
Contact: Jaewoo Kang kangj@korea.ac.kr
Collection: -
Maturity: -
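As a hedged illustration of how the operations above are realized, the sketch below attaches a fresh token-classification head to the pre-trained encoder, which is the standard starting point for fine-tuning on a labeled NER dataset. The model ID and the three-label BIO tag set are assumptions for illustration, not part of this record.

```python
# Sketch: adapt BioBERT to named-entity recognition by adding a
# token-classification head (randomly initialized, to be fine-tuned).
# Model ID and tag set are assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "dmis-lab/biobert-v1.1"      # assumed checkpoint ID
labels = ["O", "B-Entity", "I-Entity"]  # assumed BIO tag set

model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# The model is now ready for standard fine-tuning, e.g. with
# transformers.Trainer on token-labeled biomedical text.
```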
Publications
- Lee J, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682
- https://doi.org/10.1093/bioinformatics/btz682
- PMID: 31501885
- PMC: PMC7703786
Download and documentation
Source: https://github.com/naver/biobert-pretrained/releases/tag/v1.1-pubmed
Documentation: https://github.com/naver/biobert-pretrained/blob/master/README.md
Home page: https://github.com/naver/biobert-pretrained