BioBERT

BioBERT is a domain-specific language representation model based on BERT (Bidirectional Encoder Representations from Transformers) and pre-trained on large-scale biomedical corpora. Applying general-domain natural language processing (NLP) models directly to biomedical text mining often yields unsatisfactory results, because word distributions differ between general-domain and biomedical corpora. Pre-training on biomedical corpora lets BioBERT better understand complex biomedical texts. With almost the same architecture across tasks, it outperforms BERT and previous state-of-the-art models on biomedical text mining tasks such as named entity recognition, relation extraction, and question answering.
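
The word-distribution argument in the description can be made concrete with a toy sketch. This is not part of BioBERT or its training pipeline: the two sentences are made up, the helper names `word_distribution` and `jensen_shannon_divergence` are hypothetical, and Jensen-Shannon divergence is only one assumed way to quantify how far apart two vocabularies are.

```python
import math
from collections import Counter


def word_distribution(text):
    """Return relative frequencies of lowercase, whitespace-separated tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two word distributions.

    0.0 means the distributions are identical; 1.0 means they share no words.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


# Toy, made-up sentences; not drawn from any real corpus.
general_text = "the cat sat on the mat and the dog slept by the door"
biomedical_text = "the brca1 mutation increases the risk of breast carcinoma in the patient"

general = word_distribution(general_text)
biomedical = word_distribution(biomedical_text)

same = jensen_shannon_divergence(general, general)
shift = jensen_shannon_divergence(general, biomedical)
print(f"same corpus: {same:.3f}, general vs biomedical: {shift:.3f}")
```

Comparing a text with itself gives a divergence of zero, while the general and biomedical sentences overlap only on function words and therefore diverge. A model trained only on general-domain text sees little of the biomedical vocabulary, which is the gap that domain-specific pre-training is meant to close.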

Topic

Natural language processing;Medicine;Ontology and terminology

Detail

Operation: Relation extraction;Named-entity and concept recognition
Software interface: Command-line user interface
Language: Python
License: Not stated
Cost: Free of charge
Version name: v1.1-pubmed
Credit: National Research Foundation of Korea.
Input: -
Output: -
Contact: Jaewoo Kang kangj@korea.ac.kr
Collection: -
Maturity: -

Publications

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Lee J, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020; 36:1234-1240. doi: 10.1093/bioinformatics/btz682
https://doi.org/10.1093/bioinformatics/btz682
PMID: 31501885
PMC: PMC7703786

Download and documentation

Source: https://github.com/naver/biobert-pretrained/releases/tag/v1.1-pubmed
Documentation: https://github.com/naver/biobert-pretrained/blob/master/README.md
Home page: https://github.com/naver/biobert-pretrained
