Snakemake

Snakemake is a workflow engine for reproducible, scalable bioinformatics analyses. Workflows are written in a Python-based domain-specific language; Snakemake resolves the file dependencies between steps and executes the resulting tasks on anything from a single workstation to a compute cluster.


Key Features:

  • Python-based DSL: Provides a domain-specific language in Python style for defining workflow rules and the file relationships between them.
  • Scalability: Executes the same workflow on single-core workstations up to large compute clusters without modifying the workflow definition.
  • Automatic named wildcards: Infers the values of multiple named wildcards (variables) in input and output filenames, so file names can be specified dynamically.
  • Dependency management: Tracks data dependencies between input and output files to determine execution order, which supports reproducible runs.
  • Execution environment integration: Dispatches tasks to the available computational resources, from local cores to cluster nodes.
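
The named-wildcard feature listed above can be illustrated with a small Python sketch. This is not Snakemake's implementation; it only shows the idea of matching a filename pattern such as `mapped/{sample}.{ext}` against a concrete file and reusing the inferred values to build an input filename. The function names `wildcard_regex`, `infer_wildcards`, and `fill_pattern` and the example paths are hypothetical, and the sketch assumes each wildcard appears once per pattern and may match any non-empty string.

```python
import re

def wildcard_regex(pattern):
    """Compile a pattern like 'mapped/{sample}.{ext}' into a regex with named groups."""
    # re.split with a capturing group alternates literal text and wildcard names:
    # 'mapped/{sample}.{ext}' -> ['mapped/', 'sample', '.', 'ext', '']
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(piece) if i % 2 == 0 else f"(?P<{piece}>.+)"
        for i, piece in enumerate(pieces)
    )
    return re.compile(regex)

def infer_wildcards(pattern, filename):
    """Return the wildcard values that make `pattern` equal `filename`, or None."""
    match = wildcard_regex(pattern).fullmatch(filename)
    return match.groupdict() if match else None

def fill_pattern(pattern, wildcards):
    """Substitute wildcard values into another pattern (extra values are ignored)."""
    return pattern.format(**wildcards)

# Hypothetical example: infer wildcards from a requested output file,
# then derive the input file that a rule would need.
wildcards = infer_wildcards("mapped/{sample}.{ext}", "mapped/A.bam")
print(wildcards)                                      # {'sample': 'A', 'ext': 'bam'}
print(fill_pattern("raw/{sample}.fastq", wildcards))  # raw/A.fastq
```

Because the values are inferred from the requested output file, a single pattern covers every sample without listing the files explicitly.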

Scientific Applications:

  • Complex bioinformatics analyses: Automates multi-step computational analyses in bioinformatics by encoding file relationships and processing steps.
  • Small-scale to high-throughput analyses: Supports analyses ranging from single-core exploratory runs to high-throughput production runs on compute clusters.
  • Reproducible data processing: Enables reproducible execution and management of complex data dependencies through explicit workflow definitions and dynamic file naming.

Methodology:

Workflows are defined as rules in a Python-based domain-specific language. Snakemake automatically infers the values of multiple named wildcards in input and output filenames, and the same workflow can run on anything from a single-core machine to a large compute cluster.
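
How file dependencies can determine execution order is sketched below in plain Python. This is a simplified illustration under stated assumptions, not Snakemake's scheduler: the rule table `RULES`, the rule names, the file paths, and the helpers `match` and `plan` are all hypothetical, and the sketch ignores file timestamps, ambiguity between rules, and parallel execution. Starting from a requested target file, it finds the rule whose output pattern matches, fills the inferred wildcards into that rule's inputs, and recurses until it reaches files that already exist.

```python
import re

# Hypothetical rules: name -> (output pattern, list of input patterns)
RULES = {
    "map_reads": ("mapped/{sample}.bam", ["raw/{sample}.fastq"]),
    "sort_bam": ("sorted/{sample}.bam", ["mapped/{sample}.bam"]),
    "call_variants": ("calls/{sample}.vcf", ["sorted/{sample}.bam"]),
}

def match(pattern, filename):
    """Return wildcard values if `filename` fits `pattern`, else None."""
    pieces = re.split(r"\{(\w+)\}", pattern)  # literal, name, literal, ...
    regex = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+)"
        for i, p in enumerate(pieces)
    )
    m = re.fullmatch(regex, filename)
    return m.groupdict() if m else None

def plan(target, existing, jobs=None):
    """List (rule name, output file) jobs in an order that satisfies dependencies."""
    if jobs is None:
        jobs = []
    if target in existing:
        return jobs  # file is already present, nothing to run
    for name, (out_pattern, in_patterns) in RULES.items():
        wildcards = match(out_pattern, target)
        if wildcards is None:
            continue
        # Schedule everything this job depends on before the job itself.
        for in_pattern in in_patterns:
            plan(in_pattern.format(**wildcards), existing, jobs)
        if (name, target) not in jobs:
            jobs.append((name, target))
        return jobs
    raise FileNotFoundError(f"no rule produces {target}")

for job in plan("calls/A.vcf", existing={"raw/A.fastq"}):
    print(job)
```

With only `raw/A.fastq` present, the plan runs `map_reads`, then `sort_bam`, then `call_variants`; if `mapped/A.bam` already existed, the first step would be skipped.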

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Python
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Köster J, Rahmann S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics. 2012;28(19):2520-2522. doi:10.1093/bioinformatics/bts480. PMID:22908215.
