Snakemake
Snakemake is a workflow engine for reproducible, scalable bioinformatics analyses. Workflows are written in a Python-based domain-specific language; Snakemake resolves the file dependencies between steps and executes the resulting tasks on anything from a single workstation to a compute cluster.
Key Features:
- Python-based DSL: Provides a domain-specific language that adheres to Python syntax for defining workflow rules and file relationships.
- Scalability: Executes the same workflow on single-core workstations up to large compute clusters without modifying the workflow definition.
- Automatic named wildcards: Infers the values of multiple named wildcards in input and output filenames, so that file names can be specified dynamically rather than listed one by one.
- Dependency management: Tracks the data dependencies between input and output files and uses them to determine execution order and support reproducibility.
- Execution environment integration: Integrates workflow execution with available computational resources to run tasks efficiently and reliably.
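The named-wildcard feature listed above can be illustrated in plain Python. This is a minimal sketch of the idea, not Snakemake's own code: the helper names `pattern_to_regex`, `infer_wildcards`, and `fill`, and the example paths, are hypothetical. It assumes each wildcard matches one or more arbitrary characters and that a wildcard name used twice must take the same value both times.

```python
import re

def pattern_to_regex(pattern):
    """Compile 'mapped/{sample}.bam' into a regex with one named group per wildcard."""
    parts, seen, last = [], set(), 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        name = m.group(1)
        parts.append(re.escape(pattern[last:m.start()]))  # literal text between wildcards
        if name in seen:
            parts.append(f"(?P={name})")       # repeated name: must match the same value
        else:
            parts.append(f"(?P<{name}>.+)")    # first use: capture one or more characters
            seen.add(name)
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))

def infer_wildcards(pattern, filename):
    """Return the wildcard values that make `pattern` equal `filename`, or None."""
    m = pattern_to_regex(pattern).fullmatch(filename)
    return m.groupdict() if m else None

def fill(pattern, wildcards):
    """Substitute inferred wildcard values into another filename pattern."""
    return pattern.format(**wildcards)

# A requested output file determines the wildcard values ...
values = infer_wildcards("mapped/{genome}/{sample}.bam", "mapped/hg38/A.bam")
# ... which in turn determine the concrete input file.
source = fill("data/{sample}.fastq", values)
```

Here `values` holds the inferred genome and sample names, and `source` is the input path derived from them. A filename that does not fit the pattern yields `None`, which is how a sketch like this would decide that a rule cannot produce a given file.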
Scientific Applications:
- Complex bioinformatics analyses: Automates multi-step computational analyses in bioinformatics by encoding file relationships and processing steps.
- Small-scale to high-throughput analyses: Supports analyses ranging from single-core exploratory runs to high-throughput production runs on compute clusters.
- Reproducible data processing: Enables reproducible execution and management of complex data dependencies through explicit workflow definitions and dynamic file naming.
Methodology:
Workflows are defined in a Python-based domain-specific language. Snakemake automatically infers multiple named wildcards in input and output filenames, and the same workflow can be executed on anything from a single-core machine to a large compute cluster.
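How file dependencies translate into an execution order, and why the same workflow can use one core or many, can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is an illustration under stated assumptions, not Snakemake's scheduler: the `RULES` table, the file names, and the functions `build_graph` and `plan_batches` are hypothetical, and wildcards are left out so that each output maps directly to its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each output file is mapped to the input files it needs.
RULES = {
    "calls/all.vcf": ["mapped/A.bam", "mapped/B.bam"],
    "mapped/A.bam": ["data/A.fastq"],
    "mapped/B.bam": ["data/B.fastq"],
}

def build_graph(target, rules):
    """Walk backwards from the target, collecting only the files it depends on."""
    graph, stack = {}, [target]
    while stack:
        path = stack.pop()
        if path in graph:
            continue
        deps = rules.get(path, [])  # a file with no rule is treated as existing source data
        graph[path] = set(deps)
        stack.extend(deps)
    return graph

def plan_batches(target, rules):
    """Return batches of outputs to build; outputs in one batch do not depend on each other."""
    sorter = TopologicalSorter(build_graph(target, rules))
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()
        jobs = sorted(path for path in ready if path in rules)  # skip plain source files
        if jobs:
            batches.append(jobs)
        sorter.done(*ready)
    return batches

plan = plan_batches("calls/all.vcf", RULES)
```

In this sketch the two alignment outputs end up in the same batch, so they could run one after another on a single core or side by side on several, while the final call step waits for both. The workflow description itself does not change between the two cases.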
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Python
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Köster J, Rahmann S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics. 2012;28(19):2520-2522. doi:10.1093/bioinformatics/bts480. PMID:22908215.