Orchid

Orchid is a Python framework for managing, annotating, and applying machine learning to whole-genome tumor mutation datasets, with the goal of identifying and classifying tissue-specific somatic mutations.


Key Features:

  • Data Management: Manages large whole-genome tumor sequence datasets and their biological annotations, with MySQL and MemSQL supported as storage backends.
  • Annotation Capabilities: Integrates diverse biological datasets to produce an annotated tumor mutation database that enhances interpretability of cancer mutations.
  • Machine Learning Integration: Incorporates machine learning analytics, including a random forest classifier that distinguishes tissue origins across 12 tumor types using 339 features.
  • Parallel Workflow Execution: Executes parallel workflows using Groovy 2.4.5 to enable concurrent task processing.
  • In-Memory Database Storage: Utilizes in-memory database storage to accelerate data retrieval and manipulation for large mutation collections.
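
As a rough illustration of the storage and annotation features above, the sketch below loads a few mutations and annotation regions into an in-memory SQL database and joins them by genomic position. It uses Python's built-in sqlite3 module as a stand-in for MySQL or MemSQL. The table names, columns, tumor-type labels, gene labels, and coordinates are all hypothetical and do not reflect Orchid's actual schema.

```python
import sqlite3


def annotate_mutations():
    """Join toy mutations to toy annotation regions (all names hypothetical)."""
    # In-memory SQLite database, used here only as a stand-in for MySQL/MemSQL.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mutation (
            id INTEGER PRIMARY KEY, tumor_type TEXT,
            chrom TEXT, pos INTEGER, ref TEXT, alt TEXT);
        CREATE TABLE region (
            chrom TEXT, start_pos INTEGER, end_pos INTEGER, label TEXT);
    """)
    # Made-up mutations: (id, tumor type, chromosome, position, ref, alt).
    conn.executemany("INSERT INTO mutation VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "type_A", "chr1", 150, "G", "A"),
        (2, "type_B", "chr2", 420, "C", "T"),
        (3, "type_A", "chr3", 900, "T", "C"),
    ])
    # Made-up annotation intervals: (chromosome, start, end, label).
    conn.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", [
        ("chr1", 100, 200, "GENE_X"),
        ("chr2", 400, 500, "GENE_Y"),
    ])
    # LEFT JOIN keeps mutations that fall outside every annotated region.
    rows = conn.execute("""
        SELECT m.id, m.tumor_type, COALESCE(r.label, 'unannotated')
        FROM mutation AS m
        LEFT JOIN region AS r
          ON r.chrom = m.chrom
         AND m.pos BETWEEN r.start_pos AND r.end_pos
        ORDER BY m.id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in annotate_mutations():
        print(row)
```

Each returned row pairs a mutation with the label of the region it overlaps, so the annotated table can later be turned into per-mutation features for modeling.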

Scientific Applications:

  • Cancer genomics research: Enables large-scale analysis of somatic mutations across tumor genomes to support discovery and characterization of cancer-associated variants.
  • Tissue-of-origin classification: Supports classification of tumor tissue origin based on mutational feature patterns across 12 tumor types.
  • Translational studies and precision oncology: Provides mutation annotation and predictive modeling resources that can inform oncological studies and potential personalized-medicine approaches.

Methodology:

Orchid is implemented in Python and uses Groovy 2.4.5 for parallel workflow execution. Data are stored in and retrieved from MySQL and MemSQL (an in-memory database). The machine learning analytics include a random forest classifier that uses 339 features to distinguish tissue origins across 12 tumor types.
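
To make the classification step more concrete, here is a minimal, standard-library-only sketch of a random forest: bootstrap sampling, a random feature subset at each split, and a majority vote over trees. It is not Orchid's implementation. The toy data uses 3 hypothetical tissue labels and 6 synthetic features in place of the 12 tumor types and 339 features described above, and every function name is invented for illustration.

```python
import random
from collections import Counter


def gini_mass(labels):
    """Gini impurity of a label list, multiplied by the list size."""
    n = len(labels)
    return n - sum(c * c for c in Counter(labels).values()) / n


def best_split(rows, labels, features):
    """Return the (feature, threshold) with lowest weighted Gini, or None."""
    best, best_score = None, None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = gini_mass(left) + gini_mass(right)
            if best_score is None or score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; leaves are labels, inner nodes are 4-tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Random feature subset considered at this node.
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo],
                       n_try, rng, depth + 1, max_depth),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi],
                       n_try, rng, depth + 1, max_depth))


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the per-tree predictions."""
    votes = []
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]


def toy_data(n_classes=3, n_features=6, per_class=10):
    """Synthetic, well-separated feature vectors (purely illustrative)."""
    rows, labels = [], []
    for c in range(n_classes):
        for i in range(per_class):
            rows.append([10.0 * c + 0.2 * ((i + f) % 5)
                         for f in range(n_features)])
            labels.append(f"tissue_{c}")
    return rows, labels


rows, labels = toy_data()
forest = fit_forest(rows, labels)
predictions = [predict_forest(forest, [10.0 * c] * 6) for c in range(3)]
```

The sketch only shows the mechanics on trivially separable data; real mutation features would be noisier, and a tested library implementation would normally be preferable to hand-written trees.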

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Python
Added: 6/23/2018
Last Updated: 11/25/2024

Publications

Cario CL, Witte JS. Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations. Bioinformatics. 2018;34(6):936-942. doi:10.1093/bioinformatics/btx709. PMID:29106441. PMCID:PMC5860353.

Funding: National Institutes of Health, grants CA088164 and CA201358
