PolySearch

PolySearch performs associative text-mining queries across biomedical literature and bioinformatic databases to identify and rank associations among entities such as diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs, and metabolites for genomics, proteomics, and metabolomics research.


Key Features:

  • Associative query formulation: Supports "Given X, find all Y" queries where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs, or metabolites.
  • Query breadth: Supports over 50 different classes of queries across nearly a dozen types of text sources, including scientific abstracts and bioinformatic databases.
  • Text mining and information retrieval: Applies text-mining and information-retrieval techniques to identify, highlight, and rank the abstracts, paragraphs, and sentences relevant to a query.
  • Output granularity: Identifies and ranks information at the abstract, paragraph, and sentence levels.
  • Performance benchmarking: Evaluated on gene synonym identification, protein-protein interaction identification, and disease gene identification against manually assembled gold-standard text corpora, achieving F-measures of 88%, 81%, and 79%, respectively, a reported improvement of 5%–50% over other published tools.
  • Omics applicability: Applicable to genomics, proteomics, and metabolomics research contexts.
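The "Given X, find all Y" query style listed above can be illustrated with a toy sentence-level co-occurrence count. This is only a minimal sketch, not PolySearch's actual implementation: the function name `find_associations`, the naive sentence splitter, the substring matching, and the sample abstracts are all hypothetical, and the scoring here is a plain count rather than the tool's own relevance ranking.

```python
import re
from collections import Counter


def find_associations(query_term, candidate_terms, abstracts):
    """Rank candidate terms by the number of sentences that mention
    them together with query_term (case-insensitive substring match)."""
    counts = Counter()
    query = query_term.lower()
    for abstract in abstracts:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            lowered = sentence.lower()
            if query not in lowered:
                continue
            for term in candidate_terms:
                if term.lower() in lowered:
                    counts[term] += 1
    # Highest co-occurrence count first; terms never seen are omitted.
    return counts.most_common()


# Made-up sample text, for illustration only.
abstracts = [
    "BRCA1 mutations are linked to breast cancer. "
    "TP53 is also altered in breast cancer.",
    "BRCA1 loss increases breast cancer risk. "
    "EGFR is studied in lung cancer.",
]

# "Given a disease, find all genes" over the toy abstracts.
print(find_associations("breast cancer", ["BRCA1", "TP53", "EGFR"], abstracts))
# prints [('BRCA1', 2), ('TP53', 1)]
```

A real system would need entity dictionaries with synonyms, proper sentence segmentation, and a relevance score that weighs more than raw co-occurrence; the sketch only shows the shape of the query.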

Scientific Applications:

  • Associative discovery: Uncovers associations among genes, proteins, metabolites, diseases, drugs, SNPs, and mutations from literature and databases.
  • Gene synonym identification: Identifies gene synonyms within text corpora.
  • Protein-protein interaction identification: Extracts mentions of protein-protein interactions from scientific text.
  • Disease gene identification: Identifies gene-disease associations from literature evidence.
  • Cross-entity relationship mining: Enables discovery of relationships such as drug–gene, metabolite–disease, and SNP–phenotype associations.

Methodology:

Combines associative query formulation with text-mining and information-retrieval methods to identify, highlight, and rank relevant abstracts, paragraphs, and sentences. Performance was benchmarked against manually assembled gold-standard text corpora and reported as F-measures.
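For readers unfamiliar with the metric used in the benchmarks, the following sketch shows how an F-measure can be computed from counts against a gold standard. It assumes the balanced F1 form (the harmonic mean of precision and recall), which this entry does not state explicitly; the function name and the example counts are hypothetical and are not taken from the PolySearch evaluation.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives   # everything the tool reported
    actual = true_positives + false_negatives      # everything in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct hits, 2 spurious hits, 2 misses.
print(round(f_measure(8, 2, 2), 3))  # prints 0.8
```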

Details

Tool Type: web application
Operating Systems: Linux, Windows, Mac
Programming Languages: Perl
Added: 3/24/2017
Last Updated: 12/10/2018

Publications

Cheng D, et al. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res. 2008; 36:W399-405. doi: 10.1093/nar/gkn296. PMID: 18487273
