eiR

The eiR software tool implements an algorithm for accelerated similarity searching and clustering of chemical compounds by structural similarity, addressing the speed and scalability limits of existing algorithms. It uses embedding and indexing (EI) techniques to handle databases containing millions of compounds.

The eiR framework consists of two core components: EI-Search and EI-Clustering. EI-Search is designed for ultra-fast similarity searching: it embeds compounds in a high-dimensional Euclidean space and uses locality-sensitive hashing (LSH) for efficient nearest-neighbor searches. It runs 40-200 times faster than sequential search methods while maintaining high recall rates.
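The idea of hashing embedded vectors so that nearby points tend to share a bucket can be sketched in a few lines of Python. This is a minimal illustration of Euclidean LSH using quantized random projections, not eiR's implementation or API. The names `make_hash` and `LSHIndex` and the parameters `n_tables`, `n_proj`, and `width` are hypothetical, and the input vectors are assumed to have been embedded already.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, n_proj, width, rng):
    """Build one LSH function from n_proj quantized Gaussian projections."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_proj)]
    offsets = [rng.uniform(0.0, width) for _ in range(n_proj)]

    def h(vec):
        # Project onto each random direction, shift, and cut into bins.
        return tuple(
            math.floor((sum(a * x for a, x in zip(p, vec)) + b) / width)
            for p, b in zip(planes, offsets)
        )

    return h


class LSHIndex:
    """Hypothetical multi-table LSH index over already-embedded vectors."""

    def __init__(self, dim, n_tables=8, n_proj=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, n_proj, width, rng) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    def add(self, vec):
        """Store a vector in every hash table and return its index."""
        idx = len(self.vectors)
        self.vectors.append(vec)
        for h, table in zip(self.hashes, self.tables):
            table[h(vec)].append(idx)
        return idx

    def query(self, vec, k=3):
        """Return up to k candidate indices, nearest first."""
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(vec), ()))
        # Only the bucket candidates are ranked, not the whole database.
        ranked = sorted(candidates, key=lambda i: math.dist(vec, self.vectors[i]))
        return ranked[:k]
```

The speed-up comes from ranking only the vectors that share a bucket with the query instead of scanning every compound. The cost is that a true neighbor can be missed, which is why recall is reported alongside speed.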

EI-Clustering combines EI-Search with the Jarvis-Patrick clustering method to cluster very large compound libraries efficiently. This approach reduces the computational time required for clustering large datasets from several months to just a few days without sacrificing accuracy.
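Jarvis-Patrick clustering works from nearest-neighbor lists alone, which is why a fast neighbor search makes it practical at scale. The sketch below uses one common formulation of the method: two items are joined when each appears in the other's neighbor list and the two lists share at least `min_shared` members. It is an illustration under those assumptions, not eiR's code. The function name `jarvis_patrick` and its parameters are hypothetical, and the neighbor lists are assumed to be precomputed.

```python
def jarvis_patrick(neighbors, min_shared):
    """Cluster items given each item's nearest-neighbor list.

    neighbors[i] lists the nearest neighbors of item i (excluding i).
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(neighbors)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = [set(nb) for nb in neighbors]
    for i in range(n):
        for j in neighbors[i]:
            mutual = j > i and i in sets[j]
            if mutual and len(sets[i] & sets[j]) >= min_shared:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy input: items 0-2 are neighbors of each other, as are items 3-5.
toy_neighbors = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
print(jarvis_patrick(toy_neighbors, min_shared=1))
```

Raising `min_shared` makes the criterion stricter and yields smaller, tighter clusters. The neighbor lists are the expensive part of the computation, so the search step dominates the running time on large libraries.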

Topic

Small molecules; Structure analysis

Detail

  • Operation: Structural similarity search

  • Software interface: Command-line user interface, Library

  • Language: R

  • License: Artistic License 2.0

  • Cost: Free

  • Version name: 1.42.0

  • Credit: The National Science Foundation.

  • Input: Small molecule structure [Textual format] [Tertiary structure format], Compound name [Textual format] [Tertiary structure format], Compound identifier [Textual format] [Tertiary structure format]

  • Output: Small molecule structure [SMILES] [Tertiary structure format] [SQLite format] [Textual format], Database search results [SMILES] [Tertiary structure format] [SQLite format] [Textual format]

  • Contact: Thomas Girke thomas.girke@ucr.edu

  • Collection: -

  • Maturity: Stable

Publications

  • Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing.
  • Cao Y, et al. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing. Bioinformatics. 2010; 26:953-9. doi: 10.1093/bioinformatics/btq067
  • https://doi.org/10.1093/bioinformatics/btq067
  • PMID: 20179075
  • PMC: PMC2844998
