bioCADDIE

bioCADDIE is a data indexing project that aggregates publicly accessible biomedical datasets into a single portal to facilitate their reuse and accelerate scientific discovery. As the number of indexed datasets grows, retrieving the datasets relevant to a researcher's query becomes increasingly challenging.

To address this, the authors propose an information retrieval (IR) system that combines state-of-the-art IR models, medical named entity extraction, query expansion with deep learning-based word embeddings, and a re-ranking strategy.

The system leverages the unstructured text data associated with each dataset, including titles and descriptions.
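
As a rough illustration of query expansion with word embeddings over titles and descriptions, the following minimal Python sketch expands a query with its nearest embedding neighbours and scores datasets by weighted term counts. It is not the authors' implementation: the toy vectors, the sample datasets, the function names (expand_query, rank) and the parameters (top_k, min_sim) are all assumptions made for this example, and the named entity extraction and re-ranking steps are not shown.

```python
import math

# Toy word vectors invented for illustration; a real system would load
# embeddings trained on biomedical text.
TOY_EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "mouse": [0.0, 0.9, 0.1],
    "murine": [0.05, 0.85, 0.15],
    "protein": [0.1, 0.1, 0.9],
}

# Hypothetical dataset records holding only unstructured text fields.
DATASETS = [
    {"id": "d1", "title": "Murine tumor expression profiles",
     "description": "RNA sequencing of tumor tissue"},
    {"id": "d2", "title": "Mouse protein atlas",
     "description": "Protein abundance in healthy mouse organs"},
    {"id": "d3", "title": "Plant genome assembly",
     "description": "Reference genome for rice"},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, top_k=2, min_sim=0.95):
    """Map each term to a weight: 1.0 for original terms, similarity for neighbours."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        if t not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            ((cosine(embeddings[t], vec), w) for w, vec in embeddings.items() if w != t),
            reverse=True,
        )[:top_k]
        for sim, w in neighbours:
            if sim >= min_sim and w not in weights:
                weights[w] = sim
    return weights


def rank(datasets, weighted_terms):
    """Rank dataset ids by weighted term counts in title plus description."""
    scored = []
    for ds in datasets:
        tokens = (ds["title"] + " " + ds["description"]).lower().split()
        score = sum(w * tokens.count(t) for t, w in weighted_terms.items())
        if score > 0:
            scored.append((score, ds["id"]))
    # Highest score first; ties broken by id for a stable order.
    return [ds_id for _, ds_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    query = ["cancer", "mouse"]
    print("without expansion:", rank(DATASETS, {t: 1.0 for t in query}))
    print("with expansion:   ", rank(DATASETS, expand_query(query, TOY_EMBEDDINGS)))
```

In this toy setting, expansion lets a dataset that mentions only "murine" and "tumor" match a query for "cancer" and "mouse", which is the vocabulary-mismatch problem that embedding-based expansion is meant to reduce. The simple term-count score stands in for a proper IR model.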

Topic

Medicine; Data management; Literature and language

Detail

  • Operation: Database search; Data retrieval; Named-entity and concept recognition

  • Software interface: Command-line user interface

  • Language: Perl

  • License: Not stated

  • Cost: Free of charge

  • Version name: -

  • Credit: National Institutes of Health (NIH).

  • Input: -

  • Output: -

  • Contact: Hongfang Liu (liu.hongfang@mayo.edu)

  • Collection: -

  • Maturity: -

Publications

  • Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts.
  • Wang Y, et al. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts. Database (Oxford). 2017;2017. doi: 10.1093/database/bax091
  • https://doi.org/10.1093/DATABASE/BAX091
  • PMID: 31725862
  • PMC: PMC7243926
