PipeMEM

PipeMEM is a software framework that uses Apache Spark to run BWA-MEM, a popular single-node genome alignment tool, across multiple nodes, accelerating DNA sequence alignment. It addresses the need for a scalable way to handle exponentially growing genome data while maintaining high speed and accuracy.

Key features and results of PipeMEM:

1. Uses Spark's pipe operation to run BWA-MEM with lower overhead than existing Spark-based solutions.

2. Employs a pipeline structure and in-memory computation to further accelerate the alignment process.

3. Experiments on paired-end alignment tasks demonstrated that PipeMEM has low overhead in a multi-node environment.

4. On average, PipeMEM performed 2.27 times faster than BWASpark (an alignment tool in the Genome Analysis Toolkit) and 2.33 times faster than SparkBWA.
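
The pipe-based approach in feature 1 can be illustrated with a minimal PySpark sketch. This is not PipeMEM's actual code: the function names, the one-read-pair-per-line input layout (eight FASTQ lines joined by tabs), and the paths are assumptions made for illustration. It only shows the general idea of streaming each Spark partition through an external `bwa mem` process via `RDD.pipe()`.

```python
import shlex


def build_pipe_command(reference, threads=4):
    """Build the command string handed to RDD.pipe() (hypothetical helper).

    Assumption: every input line holds one interleaved read pair whose
    eight FASTQ lines were joined with tabs in a preprocessing step.
    """
    # Restore the FASTQ line structure, then align interleaved pairs (-p)
    # read from standard input ("-").
    inner = "tr '\\t' '\\n' | bwa mem -p -t {} {} -".format(
        int(threads), shlex.quote(reference))
    # The command is wrapped in "bash -c" so the shell pipeline still works
    # if the command is launched directly rather than through a shell.
    return "bash -c " + shlex.quote(inner)


def align(sc, reads_path, reference, out_path, partitions=8):
    """Pipe each partition of reads through BWA-MEM (illustrative only).

    `sc` is an existing SparkContext; the BWA index for `reference` is
    assumed to be present at the same path on every worker node.
    """
    reads = sc.textFile(reads_path, minPartitions=partitions)
    sam = reads.pipe(build_pipe_command(reference))
    # Each partition runs its own bwa process and emits its own SAM header,
    # so header lines (starting with "@") are dropped here.
    sam.filter(lambda line: not line.startswith("@")).saveAsTextFile(out_path)
```

Because the external aligner runs unmodified, a design like this avoids wrapping BWA-MEM through a native-code bridge; the costs that remain are data preprocessing and process start-up per partition.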

Topic

Sequencing; Workflows; DNA

Detail

  • Operation: Editing; Sorting; Genome alignment

  • Software interface: Command-line user interface

  • Language: Java, Python

  • License: Not stated

  • Cost: Free of charge

  • Version name: -

  • Credit: Guangdong Natural Science Foundation, National Natural Science Foundation of China.

  • Input: -

  • Output: -

  • Contact: Shoubin Dong sbdong@scut.edu.cn

  • Collection: -

  • Maturity: -

Publications

  • PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead.
  • Zhang L, et al. PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead. Genes. 2019; 10(11):886. doi: 10.3390/genes10110886
  • https://doi.org/10.3390/GENES10110886
  • PMID: 31689965
  • PMC: PMC6896194
