Orchid
Orchid is a Python framework for managing, annotating, and applying machine learning to whole-genome tumor mutation datasets, with the goal of identifying and classifying tissue-specific somatic mutations.
Key Features:
- Data Management: Manages large whole-genome tumor sequence datasets and biological annotations, with MySQL and MemSQL supported as storage back ends.
- Annotation Capabilities: Integrates diverse biological datasets to produce an annotated tumor mutation database that enhances interpretability of cancer mutations.
- Machine Learning Integration: Incorporates machine learning analytics, including a random forest classifier that distinguishes tissue origins across 12 tumor types using 339 features.
- Parallel Workflow Execution: Runs workflows in parallel using Groovy 2.4.5, so that independent tasks are processed concurrently.
- In-Memory Database Storage: Uses in-memory database storage (MemSQL) to speed up retrieval and manipulation of large mutation collections.
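The annotation and in-memory storage features above can be illustrated with a minimal sketch. It is not Orchid's actual schema or API: the table names, column names, and rows are hypothetical, and Python's built-in `sqlite3` in-memory mode stands in for MySQL/MemSQL so that the example runs without a database server.

```python
import sqlite3

# Stand-in for the in-memory store; Orchid itself targets MySQL/MemSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table of mutations, one table of per-locus annotations.
conn.executescript("""
CREATE TABLE mutation (
    mutation_id INTEGER PRIMARY KEY,
    chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, tumor_type TEXT
);
CREATE TABLE annotation (
    chrom TEXT, pos INTEGER, feature TEXT, value REAL
);
CREATE INDEX idx_annotation_locus ON annotation (chrom, pos);
""")

# Made-up toy rows, for illustration only.
mutations = [
    ("chr1", 1000, "A", "T", "breast"),
    ("chr2", 2000, "G", "C", "liver"),
    ("chr3", 3000, "C", "T", "breast"),
]
annotations = [
    ("chr1", 1000, "conservation", 0.9),
    ("chr1", 1000, "gc_content", 0.4),
    ("chr2", 2000, "conservation", 0.2),
]
conn.executemany(
    "INSERT INTO mutation (chrom, pos, ref, alt, tumor_type) VALUES (?, ?, ?, ?, ?)",
    mutations,
)
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", annotations)

# Annotate each mutation by joining on genomic position.
# LEFT JOIN keeps mutations that have no annotation (their feature columns are NULL).
rows = conn.execute("""
    SELECT m.chrom, m.pos, m.tumor_type, a.feature, a.value
    FROM mutation AS m
    LEFT JOIN annotation AS a
      ON a.chrom = m.chrom AND a.pos = m.pos
    ORDER BY m.mutation_id, a.feature
""").fetchall()

for row in rows:
    print(row)
```

Keeping the tables in memory avoids disk reads during the join, which is the general motivation for an in-memory store when the mutation collection is large.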
Scientific Applications:
- Cancer genomics research: Enables large-scale analysis of somatic mutations across tumor genomes to support discovery and characterization of cancer-associated variants.
- Tissue-of-origin classification: Supports classification of tumor tissue origin based on mutational feature patterns across 12 tumor types.
- Translational studies and precision oncology: Provides mutation annotation and predictive modeling resources that can inform oncological studies and potential personalized-medicine approaches.
Methodology:
Orchid is implemented in Python. It uses Groovy 2.4.5 for parallel workflow execution and MySQL and MemSQL (an in-memory database) for storage and retrieval. Its machine learning analytics include a random forest classifier that uses 339 features to distinguish tissue origins across 12 tumor types.
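The classification step can be sketched as follows. This is an illustration under stated assumptions, not Orchid's own code: it assumes scikit-learn is available, the data are synthetic (generated with `make_classification`), and the hyperparameters are arbitrary. Only the 339-feature and 12-class dimensions come from the description above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

N_FEATURES = 339     # feature count stated in the description
N_TUMOR_TYPES = 12   # tumor-type count stated in the description

# Synthetic stand-in for an annotated mutation matrix (rows: mutations,
# columns: features). Real input would come from the annotated database.
X, y = make_classification(
    n_samples=2400,
    n_features=N_FEATURES,
    n_informative=40,       # assumption: only some features carry signal
    n_redundant=0,
    n_classes=N_TUMOR_TYPES,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out a stratified test set so every tumor type appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Arbitrary hyperparameters, chosen only to keep the sketch fast.
clf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
clf.fit(X_train, y_train)

# Mean accuracy on held-out mutations, plus the ten most important features.
accuracy = clf.score(X_test, y_test)
top_features = np.argsort(clf.feature_importances_)[::-1][:10]

print(f"held-out accuracy: {accuracy:.3f}")
print("top feature indices:", top_features.tolist())
```

Feature importances from the fitted forest give a rough ranking of which annotations contribute most to separating tumor types, which is one reason a random forest suits a wide feature set like this.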
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Python
- Added: 6/23/2018
- Last Updated: 11/25/2024
Publications
Cario CL, Witte JS. Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations. Bioinformatics. 2017;34(6):936-942. doi:10.1093/bioinformatics/btx709. PMID:29106441. PMCID:PMC5860353.