bioCADDIE

bioCADDIE is a data indexing project aggregating publicly accessible biomedical datasets into a single portal to facilitate their reuse and accelerate scientific discoveries. As indexed datasets grow, retrieving relevant datasets based on researchers' queries becomes increasingly challenging.

To address this issue, the authors propose an information retrieval (IR) system that combines state-of-the-art IR models, medical named entity extraction, query expansion with deep learning-based word embeddings, and a re-ranking strategy.

The system leverages the unstructured text data associated with each dataset, including titles and descriptions.
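As a rough illustration of the query-expansion idea applied to titles and descriptions, the following Python sketch expands a query with its nearest neighbours in an embedding space and ranks datasets by weighted term counts. It is not the authors' implementation: the three-dimensional vectors, the sample datasets, the function names (expand_query, score, rank), and the top_k and min_sim parameters are all hypothetical, and the simple term-count scoring stands in for a real IR model.

```python
import math
from collections import Counter

# Toy 3-dimensional vectors, invented for illustration only.
# A real system would load vectors trained on biomedical text.
TOY_EMBEDDINGS = {
    "cancer": (0.9, 0.1, 0.0),
    "tumor": (0.85, 0.15, 0.05),
    "carcinoma": (0.8, 0.2, 0.1),
    "gene": (0.1, 0.9, 0.0),
    "expression": (0.15, 0.85, 0.1),
    "mouse": (0.0, 0.1, 0.9),
}

# Hypothetical dataset records with only the unstructured text fields.
DATASETS = [
    {"id": "A", "title": "Mouse gene expression atlas",
     "description": "Expression profiles across mouse tissues"},
    {"id": "B", "title": "Tumor sequencing cohort",
     "description": "Whole-exome data from carcinoma samples"},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, top_k=2, min_sim=0.95):
    """Map each term to a weight: 1.0 for original terms,
    the cosine similarity for added neighbour terms."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        if t not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            ((cosine(embeddings[t], vec), other)
             for other, vec in embeddings.items() if other != t),
            reverse=True,
        )[:top_k]
        for sim, other in neighbours:
            if sim >= min_sim and other not in weights:
                weights[other] = sim
    return weights


def score(dataset, weights):
    """Weighted count of query terms in the title and description."""
    text = f"{dataset['title']} {dataset['description']}".lower()
    counts = Counter(text.split())
    return sum(w * counts[t] for t, w in weights.items())


def rank(datasets, query_terms, embeddings=TOY_EMBEDDINGS):
    """Return datasets sorted by descending expanded-query score."""
    weights = expand_query(query_terms, embeddings)
    return sorted(datasets, key=lambda d: score(d, weights), reverse=True)


if __name__ == "__main__":
    print([d["id"] for d in rank(DATASETS, ["cancer"])])
```

With these toy vectors, the query "cancer" is expanded with "tumor" and "carcinoma", so dataset B ranks first even though its text never contains the word "cancer". Without expansion, both datasets would score zero for that query.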

Topic

Medicine;Data management;Literature and language

Detail

Operation: Database search;Data retrieval;Named-entity and concept recognition
Software interface: Command-line user interface
Language: Perl
License: Not stated
Cost: Free of charge
Version name: -
Credit: National Institutes of Health (NIH).
Input: -
Output: -
Contact: Hongfang Liu liu.hongfang@mayo.edu
Collection: -
Maturity: -

Publications

Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts.
Wang Y, et al. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts. Database (Oxford). 2017; 2017:(unknown pages). doi: 10.1093/database/bax091
https://doi.org/10.1093/DATABASE/BAX091
PMID: 31725862
PMC: PMC7243926

Download and documentation

Source: https://github.com/yanshanwang/biocaddie2016mayodata
Documentation: -
Home page: https://github.com/yanshanwang/biocaddie2016mayodata
