eiR

eiR is an R package for accelerated similarity searching and clustering of chemical compounds by structural similarity. It addresses the speed and scalability limits of existing algorithms: by using embedding and indexing (EI) techniques, it can handle databases containing millions of compounds.

The eiR framework consists of two core components: EI-Search and EI-Clustering. EI-Search is designed for fast similarity searching. It embeds compounds in a high-dimensional Euclidean space and uses locality-sensitive hashing (LSH) for efficient nearest-neighbor searches. This makes searches 40 to 200 times faster than sequential search methods while maintaining high recall rates.
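The LSH step described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the eiR implementation or its API. It assumes compounds have already been embedded as numeric vectors, and it uses one common LSH family for Euclidean distance (random Gaussian projections with bucketing). The class name `EuclideanLSH` and all parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy LSH index for Euclidean space: h(v) = floor((a.v + b) / w).

    Illustrative only; names and defaults are assumptions, not eiR's API.
    """

    def __init__(self, dim, num_tables=8, hashes_per_table=4,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        self.points = {}   # label -> embedded vector
        self.tables = []   # one (hash functions, buckets) pair per table
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, vec):
        # Concatenate several bucketed projections into one bucket key.
        return tuple(
            math.floor(
                (sum(a_i * x_i for a_i, x_i in zip(a, vec)) + b)
                / self.bucket_width
            )
            for a, b in funcs
        )

    def add(self, label, vec):
        self.points[label] = vec
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, vec)].append(label)

    def query(self, vec, k=3):
        # Gather candidates that share a bucket with the query in any table,
        # then rank only those candidates by exact Euclidean distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, vec), []))
        ranked = sorted(
            candidates, key=lambda label: math.dist(vec, self.points[label])
        )
        return ranked[:k]
```

The speed-up comes from computing exact distances only for the bucket candidates rather than for every compound in the database. The trade-off is that a true neighbor can be missed if it never shares a bucket with the query, which is why recall is reported alongside speed.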

EI-Clustering combines EI-Search with the Jarvis-Patrick clustering method to cluster very large compound libraries efficiently. This approach reduces the time required to cluster large datasets from several months to a few days without sacrificing accuracy.
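As a rough illustration of the clustering step, the sketch below implements the standard Jarvis-Patrick rule in Python: two items are joined if each appears in the other's nearest-neighbor list and the two lists share at least a minimum number of members. It is not the eiR code. The function name `jarvis_patrick` and the `min_shared` parameter are hypothetical, and the neighbor lists are taken as given input, so they could come from an approximate search such as the one EI-Search provides.

```python
def jarvis_patrick(neighbors, min_shared):
    """Cluster items from precomputed nearest-neighbor lists.

    neighbors: dict mapping each label to its nearest-neighbor labels
               (excluding the item itself).
    min_shared: minimum number of common neighbors required to join two items.
    Returns a list of clusters, each a sorted list of labels.
    """
    parent = {label: label for label in neighbors}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbor_sets = {label: set(nbrs) for label, nbrs in neighbors.items()}
    for a, nbrs_a in neighbor_sets.items():
        for b in nbrs_a:
            nbrs_b = neighbor_sets.get(b)
            if nbrs_b is None or a not in nbrs_b:
                continue  # require a and b to list each other
            if len(nbrs_a & nbrs_b) >= min_shared:
                parent[find(a)] = find(b)

    clusters = {}
    for label in neighbors:
        clusters.setdefault(find(label), set()).add(label)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

Because the rule needs only each item's neighbor list, the expensive part of clustering a large library is the neighbor search itself, which is where a fast approximate search reduces the overall running time.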

Topic

Small molecules; Structure analysis

Detail

Operation: Structural similarity search
Software interface: Command-line user interface, Library
Language: R
License: Artistic License 2.0
Cost: Free
Version name: 1.42.0
Credit: The National Science Foundation.
Input: Small molecule structure [Textual format] [Tertiary structure format], Compound name [Textual format] [Tertiary structure format], Compound identifier [Textual format] [Tertiary structure format]
Output: Small molecule structure [SMILES] [Tertiary structure format] [SQLite format] [Textual format], Database search results [SMILES] [Tertiary structure format] [SQLite format] [Textual format]
Contact: Thomas Girke thomas.girke@ucr.edu
Collection: -
Maturity: Stable

Publications

Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing.
Cao Y, et al. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing. Bioinformatics. 2010; 26:953-9. doi: 10.1093/bioinformatics/btq067
https://doi.org/10.1093/bioinformatics/btq067
PMID: 20179075
PMC: PMC2844998

Download and documentation

Source: https://bioconductor.org/packages/release/bioc/src/contrib/eiR_1.42.0.tar.gz
Documentation: https://bioconductor.org/packages/release/bioc/manuals/eiR/man/eiR.pdf
Home page: http://bioconductor.org/packages/release/bioc/html/eiR.html
Links: https://git.bioconductor.org/packages/eiR
