bioCADDIE
bioCADDIE is a data indexing project that aggregates publicly accessible biomedical datasets into a single portal to facilitate their reuse and accelerate scientific discovery. As the number of indexed datasets grows, retrieving the datasets relevant to a researcher's query becomes increasingly challenging.
To address this, the authors propose an information retrieval (IR) system that combines state-of-the-art IR models, medical named entity extraction, query expansion with deep learning-based word embeddings, and a re-ranking strategy.
The system leverages the unstructured text data associated with each dataset, including titles and descriptions.
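The query expansion idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation (the entry lists the tool's language as Perl), and everything in it is an assumption made for illustration: the toy three-dimensional vectors stand in for trained word embeddings, the function names `expand_query` and `rank_datasets` are hypothetical, and the scoring is a plain weighted term match over dataset text rather than one of the IR models used in the paper. The medical entity extraction and re-ranking steps are not covered.

```python
import math

# Toy vectors, purely illustrative; a real system would load trained embeddings.
TOY_EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "gene": [0.1, 0.9, 0.1],
    "mouse": [0.0, 0.1, 0.95],
}

# Hypothetical dataset records: identifier -> unstructured text (title/description).
TOY_DATASETS = {
    "D1": "gene expression in mouse liver",
    "D2": "tumor samples from patients",
    "D3": "cancer tumor atlas",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, threshold=0.9):
    """Append up to k embedding neighbours per query term (assumed thresholds)."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        neighbours = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word not in expanded
            ),
            reverse=True,
        )
        for similarity, word in neighbours[:k]:
            if similarity >= threshold:
                expanded.append(word)
    return expanded


def rank_datasets(terms, datasets, embeddings, expansion_weight=0.5):
    """Score datasets by weighted term matches; expansion terms count less."""
    original = set(terms)
    expanded = expand_query(terms, embeddings)
    scored = []
    for dataset_id, text in datasets.items():
        tokens = text.lower().split()
        score = sum(
            (1.0 if term in original else expansion_weight) * tokens.count(term)
            for term in expanded
        )
        scored.append((dataset_id, score))
    # Highest score first; ties broken by identifier for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranking = rank_datasets(["cancer"], TOY_DATASETS, TOY_EMBEDDINGS)
```

In this toy setup the query "cancer" is expanded with "tumor" and "carcinoma", so the dataset D2, which never mentions "cancer", still receives a non-zero score. Down-weighting expansion terms is one simple design choice for keeping exact matches ranked first.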
Topic
Medicine;Data management;Literature and language
Detail
Operation: Database search;Data retrieval;Named-entity and concept recognition
Software interface: Command-line user interface
Language: Perl
License: Not stated
Cost: Free of charge
Version name: -
Credit: National Institutes of Health (NIH).
Input: -
Output: -
Contact: Hongfang Liu liu.hongfang@mayo.edu
Collection: -
Maturity: -
Publications
- Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts.
- Wang Y, et al. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts. Database (Oxford). 2017; 2017. doi: 10.1093/database/bax091
- https://doi.org/10.1093/DATABASE/BAX091
- PMID: 31725862
- PMC: PMC7243926
Download and documentation
Source: https://github.com/yanshanwang/biocaddie2016mayodata
Documentation: -
Home page: https://github.com/yanshanwang/biocaddie2016mayodata