PolySearch
PolySearch performs associative text-mining queries across biomedical literature and bioinformatic databases to identify and rank associations among entities such as diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs, and metabolites for genomics, proteomics, and metabolomics research.
Key Features:
- Associative query formulation: Supports "Given X, find all Y" queries where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs, or metabolites.
- Query breadth: Supports over 50 different classes of queries across nearly a dozen types of text sources, including scientific abstracts and bioinformatic databases.
- Text mining and information retrieval: Employs advanced text-mining and information-retrieval techniques to identify, highlight, and rank relevant abstracts, paragraphs, and sentences.
- Output granularity: Identifies and ranks information at the abstract, paragraph, and sentence levels.
- Performance benchmarking: Evaluated on gene synonym identification, protein-protein interaction identification, and disease gene identification using manually assembled gold-standard text corpora, achieving F-measures of 88%, 81%, and 79%, respectively, with reported improvements of 5%–50% over other published tools.
- Omics applicability: Applicable to genomics, proteomics, and metabolomics research contexts.
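The benchmark scores in the feature list are F-measures, which combine precision and recall into a single number. The short Python sketch below shows the standard computation from true-positive, false-positive, and false-negative counts. It assumes the balanced F-measure (β = 1) because this entry does not say which variant was used. The counts in the example are hypothetical and are not PolySearch's benchmark data.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive and false-negative counts.

    beta=1.0 gives the balanced F-measure (harmonic mean of precision and recall).
    """
    if tp == 0:
        # No correct hits: precision and recall are both 0 (or undefined), so score 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision = 80/90, recall = 80/100.
print(round(f_measure(tp=80, fp=10, fn=20), 3))  # → 0.842
```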
Scientific Applications:
- Associative discovery: Uncovers associations among genes, proteins, metabolites, diseases, drugs, SNPs, and mutations from literature and databases.
- Gene synonym identification: Identifies gene synonyms in text corpora.
- Protein-protein interaction identification: Extracts mentions of protein-protein interactions from scientific text.
- Disease gene identification: Identifies gene-disease associations from literature evidence.
- Cross-entity relationship mining: Enables discovery of relationships such as drug–gene, metabolite–disease, and SNP–phenotype associations.
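Gene synonym identification comes down to recognizing the different names a gene is published under and mapping them to one canonical identifier. The sketch below is a minimal dictionary-based version of that idea, not PolySearch's own implementation. The `SYNONYMS` table is a hypothetical toy example, and `build_patterns` and `normalize_mentions` are illustrative names. Real synonym lists would come from curated databases and would need much more care with short, ambiguous aliases.

```python
import re

# Hypothetical toy synonym table: canonical gene symbol -> known aliases.
SYNONYMS = {
    "TP53": ["TP53", "p53", "tumor protein p53"],
    "BRCA1": ["BRCA1", "breast cancer 1"],
}


def build_patterns(synonyms):
    """Compile one case-insensitive, word-bounded regex per canonical gene."""
    patterns = {}
    for canonical, aliases in synonyms.items():
        # Try longer aliases first so multi-word names win over their substrings.
        alternation = "|".join(
            re.escape(a) for a in sorted(aliases, key=len, reverse=True)
        )
        patterns[canonical] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def normalize_mentions(text, patterns):
    """Return {canonical symbol: [matched surface forms]} for each gene found in text."""
    found = {}
    for canonical, pattern in patterns.items():
        hits = [m.group(0) for m in pattern.finditer(text)]
        if hits:
            found[canonical] = hits
    return found


patterns = build_patterns(SYNONYMS)
print(normalize_mentions("Mutations in tumor protein p53 and BRCA1 were observed.", patterns))
# → {'TP53': ['tumor protein p53'], 'BRCA1': ['BRCA1']}
```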
Methodology:
Combines associative query formulation with advanced text-mining and information-retrieval methods to identify, highlight, and rank relevant abstracts, paragraphs, and sentences. Performance was benchmarked against manually assembled gold-standard text corpora and reported as F-measures.
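To make the "Given X, find all Y" idea concrete, the following sketch ranks candidate Y entities by how many sentences mention them together with the query entity X, and keeps those sentences as highlighted evidence. This is a simplified stand-in. The entry does not detail PolySearch's actual relevance scoring, so plain sentence-level co-occurrence counting is used here instead. All function names and the example abstracts are hypothetical.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive sentence splitter: breaks after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence, terms):
    """True if any term appears in the sentence as a whole word (case-insensitive)."""
    return any(
        re.search(rf"\b{re.escape(t)}\b", sentence, re.IGNORECASE) for t in terms
    )


def rank_associations(abstracts, query_terms, candidates):
    """Given query X (a list of its names), rank candidate Ys by co-occurrence.

    candidates maps each Y to its list of names. A Y scores one point per
    sentence that mentions both X and Y; those sentences are kept as evidence.
    """
    evidence = defaultdict(list)
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            if not mentions(sentence, query_terms):
                continue
            for y, names in candidates.items():
                if mentions(sentence, names):
                    evidence[y].append(sentence)
    # Highest co-occurrence count first; ties broken alphabetically for stable output.
    ranked = sorted(evidence.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(y, len(sents), sents) for y, sents in ranked]


# Hypothetical toy abstracts for the query "Given a disease, find genes".
abstracts = [
    "APOE variants raise the risk of Alzheimer disease. PSEN1 was also studied.",
    "PSEN1 mutations cause early-onset Alzheimer disease. APOE is common.",
    "Alzheimer disease patients carried APOE e4 alleles.",
]
genes = {"APOE": ["APOE"], "PSEN1": ["PSEN1", "presenilin 1"]}
for y, score, _ in rank_associations(abstracts, ["Alzheimer disease"], genes):
    print(y, score)
# → APOE 2
# → PSEN1 1
```

Counting co-occurrences within a sentence rather than across a whole abstract tends to favor precision over recall: two entities named in the same sentence are more likely to be related than two that merely appear in the same abstract.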
Details
- Tool Type: web application
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Perl
- Added: 3/24/2017
- Last Updated: 12/10/2018
Publications
Cheng D, et al. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res. 2008; 36:W399-405. doi: 10.1093/nar/gkn296