PipeMEM
PipeMEM is a software framework that uses Apache Spark to run BWA-MEM, a popular single-node genome alignment tool, in a multi-node environment, accelerating DNA sequence alignment. The main challenge it addresses is the need for a scalable way to handle exponentially growing genome data while maintaining high speed and accuracy.
Key features and results of PipeMEM:
1. Utilizes the pipe operation in Spark to optimize BWA-MEM with lower overhead compared to existing Spark-based solutions.
2. Employs a pipeline structure and in-memory computation to accelerate the alignment process further.
3. Experiments on paired-end alignment tasks demonstrated that PipeMEM has low overhead in a multi-node environment.
4. On average, PipeMEM performed 2.27 times faster than BWASpark (an alignment tool in the Genome Analysis Toolkit) and 2.33 times faster than SparkBWA.
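The pipe operation mentioned in feature 1 (Spark's `RDD.pipe`) streams the records of each data partition to an external command through its standard input and collects the command's standard output as the result. The following is a minimal sketch of that mechanism in plain Python, without Spark or BWA-MEM. The function name `pipe_partition`, the sample reads, and the stand-in command (a small Python one-liner that upper-cases its input) are assumptions for illustration only; they are not taken from PipeMEM's source code.

```python
import subprocess
import sys


def pipe_partition(records, command):
    """Send one partition's records (one per line) to an external command
    via stdin and return the lines the command writes to stdout."""
    payload = "".join(record + "\n" for record in records)
    result = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise if the external command exits with an error
    )
    return result.stdout.splitlines()


# Hypothetical stand-in for an aligner: it only upper-cases whatever it reads.
# In a real deployment the command would be the aligner reading from stdin.
STAND_IN_ALIGNER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Two made-up partitions of reads; Spark would distribute these across nodes.
partitions = [["acgt", "ttga"], ["ggcc"]]

# Each partition is piped through its own process, independently of the others.
aligned = [pipe_partition(part, STAND_IN_ALIGNER) for part in partitions]
print(aligned)
```

Because the external tool is used unmodified and only sees a stream of text, this style of integration needs no changes to the tool's own code, which is one plausible reason a pipe-based design can keep overhead low.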
Topic
Sequencing;Workflows;DNA
Detail
Operation: Editing;Sorting;Genome alignment
Software interface: Command-line user interface
Language: Java,Python
License: Not stated
Cost: Free of charge
Version name: -
Credit: Guangdong Natural Science Foundation, National Natural Science Foundation of China.
Input: -
Output: -
Contact: Shoubin Dong sbdong@scut.edu.cn
Collection: -
Maturity: -
Publications
- PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead.
- Zhang L, et al. PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead. Genes. 2019; 10(11):886. doi: 10.3390/genes10110886
- https://doi.org/10.3390/GENES10110886
- PMID: 31689965
- PMC: PMC6896194
Download and documentation
Documentation: https://github.com/SCUT-CCNL/PipeMEM/blob/master/README.md
Home page: https://github.com/SCUT-CCNL/PipeMEM