BioBERT

BioBERT is a domain-specific language representation model based on BERT (Bidirectional Encoder Representations from Transformers) and pre-trained on large-scale biomedical corpora. It addresses a central difficulty in biomedical text mining: NLP models trained on general-domain corpora often perform poorly on biomedical text because the word distributions of the two domains differ. By pre-training on biomedical corpora, BioBERT better captures complex biomedical text and significantly outperforms BERT and previous state-of-the-art models on biomedical text mining tasks such as named entity recognition, relation extraction, and question answering, while keeping almost the same architecture across tasks.
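
As a minimal sketch of how the released weights are commonly consumed (assuming the Hugging Face transformers library and the dmis-lab/biobert-base-cased-v1.1 checkpoint, neither of which is named in this entry), the pre-trained encoder can be loaded in Python and used to produce contextual embeddings of biomedical text:

    from transformers import AutoTokenizer, AutoModel
    import torch

    # Checkpoint name is an assumption; the weights are also distributed
    # as TensorFlow checkpoints (e.g. v1.1-pubmed).
    tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
    model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

    text = "Aspirin inhibits cyclooxygenase and reduces prostaglandin synthesis."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Contextual token embeddings from the final layer, one vector per word piece.
    token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, 768)
    print(token_embeddings.shape)

Task-specific layers for named entity recognition, relation extraction, or question answering are placed on top of these embeddings and fine-tuned together with the encoder.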

Topic

Natural language processing; Medicine; Ontology and terminology

Detail

  • Operation: Relation extraction; Named-entity and concept recognition (see the fine-tuning sketch after this list)

  • Software interface: Command-line user interface

  • Language: Python

  • License: Not stated

  • Cost: Free of charge

  • Version name: v1.1-pubmed

  • Credit: National Research Foundation of Korea

  • Input: -

  • Output: -

  • Contact: Jaewoo Kang (kangj@korea.ac.kr)

  • Collection: -

  • Maturity: -
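
As an illustration of the fine-tuning workflow behind the operations listed above, the following sketch attaches a token-classification head to the pre-trained encoder for named-entity recognition; the checkpoint name and the label set are assumptions for illustration, not part of this entry:

    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical label set for a biomedical NER task (e.g. disease/chemical
    # tags); the real labels depend on the training corpus.
    labels = ["O", "B-Disease", "I-Disease", "B-Chemical", "I-Chemical"]

    tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
    # The classification head is randomly initialized; it is trained during
    # fine-tuning while the BioBERT encoder weights are updated jointly.
    model = AutoModelForTokenClassification.from_pretrained(
        "dmis-lab/biobert-base-cased-v1.1",
        num_labels=len(labels),
        id2label=dict(enumerate(labels)),
        label2id={label: i for i, label in enumerate(labels)},
    )
    # model can now be fine-tuned on token-labelled biomedical text with a
    # standard training loop or the transformers Trainer API.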

Publications

  • BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
  • Lee J, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682
  • https://doi.org/10.1093/bioinformatics/btz682
  • PMID: 31501885
  • PMC: PMC7703786

Download and documentation

  • Source code for fine-tuning: https://github.com/dmis-lab/biobert
  • Pre-trained weights: https://github.com/naver/biobert-pretrained