RanDepict

RanDepict generates diverse chemical structure depictions to create varied datasets for training deep learning-based optical chemical structure recognition (OCSR) systems.


Key Features:

  • Multi-library rendering: Uses depiction functionalities from RDKit, CDK, Indigo, and PIKAChU to produce chemical structure images.
  • Diverse depiction parameters: Controls parameters such as bond length, line thickness, and label font style to vary depiction appearance.
  • Image augmentation capabilities: Adds features such as curved arrows, chemical labels around structures, and geometric distortions to increase image variability.
  • Depiction feature fingerprints: Encodes the depiction and augmentation settings of each image as a binary vector.
  • MaxMin algorithm for diversity: Applies the MaxMin algorithm to select diverse samples from valid depiction and augmentation options.
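
As a rough illustration of the fingerprint idea in the list above, the following sketch one-hot encodes a set of depiction choices into a binary vector. The option grid and the names OPTIONS, encode, and random_choice are hypothetical and chosen for illustration only; they are not RanDepict's actual feature set or API.

```python
import random

# Hypothetical option grid (assumption): names and values are illustrative
# and do not reflect the parameters RanDepict really exposes.
OPTIONS = {
    "toolkit": ["rdkit", "cdk", "indigo", "pikachu"],
    "bond_width": ["thin", "medium", "thick"],
    "font": ["serif", "sans", "mono"],
    "curved_arrows": [False, True],
}


def encode(choice):
    """One-hot encode one selected value per option into a flat bit list."""
    bits = []
    for name, values in OPTIONS.items():
        bits.extend(1 if choice[name] == value else 0 for value in values)
    return bits


def random_choice(rng):
    """Draw one value per option using the given random.Random instance."""
    return {name: rng.choice(values) for name, values in OPTIONS.items()}


rng = random.Random(0)          # seeded so repeated runs give the same draw
choice = random_choice(rng)
fingerprint = encode(choice)    # 12 bits, exactly one set bit per option
```

Each option contributes one block of bits, so two depictions that share a setting share the corresponding set bit, which is what makes such vectors comparable with a set-based similarity measure.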

Scientific Applications:

  • OCSR dataset generation: Produces diverse and augmented chemical structure images for training optical chemical structure recognition systems.
  • Model robustness and benchmarking: Supplies varied depiction styles and augmentations for testing and benchmarking deep learning-based chemical structure recognition models.

Methodology:

Calls the depiction functions of RDKit, CDK, Indigo, and PIKAChU, varies their depiction parameters, and applies image augmentations. Represents each image as a binary depiction feature fingerprint and uses the MaxMin algorithm to select diverse samples.
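
The selection step can be sketched as a greedy MaxMin pick over binary fingerprints. This is a minimal, generic version that assumes Tanimoto distance as the dissimilarity measure and an arbitrary starting item; the function names tanimoto_distance and maxmin_pick are hypothetical, and RanDepict's own implementation may differ.

```python
def tanimoto_distance(a, b):
    """1 - |a AND b| / |a OR b| for two equal-length bit lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedily pick indices so each new pick is farthest from those picked.

    Assumes a non-empty list; `first` is the index of the seed item.
    """
    picked = [first]
    # min_dist[i]: distance from candidate i to its nearest picked item
    min_dist = [tanimoto_distance(fp, fingerprints[first]) for fp in fingerprints]
    target = min(n_pick, len(fingerprints))
    while len(picked) < target:
        remaining = [i for i in range(len(fingerprints)) if i not in picked]
        # MaxMin step: take the candidate whose nearest picked item is farthest
        nxt = max(remaining, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


fps = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
selection = maxmin_pick(fps, 3)
```

Keeping a running minimum distance per candidate avoids recomputing all pairwise distances at every step, so picking k items from n costs roughly n times k distance evaluations.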

Details

License: MIT
Cost: Free of charge
Tool Type: library
Operating Systems: Mac, Linux, Windows
Programming Languages: Python
Added: 9/3/2022
Last Updated: 11/24/2024

Publications

Brinkhaus HO, Rajan K, Zielesny A, Steinbeck C. RanDepict: Random chemical structure depiction generator. Journal of Cheminformatics. 2022;14(1). doi:10.1186/s13321-022-00609-4. PMID:35668480. PMCID:PMC9169273.

Funding: ChemBioSys (CRC1127)