BiSpark

BiSpark accelerates the alignment of bisulfite-treated sequencing reads for genome-wide DNA methylation analysis by running on the Apache Spark distributed framework. Sodium bisulfite converts unmethylated cytosines to uracil (read as thymine after PCR amplification) while leaving methylated cytosines unchanged, so reads must be aligned in a way that tolerates these conversions.
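To make the chemistry concrete, the following Python sketch (not part of BiSpark) simulates what bisulfite treatment followed by PCR does to a forward-strand read: every cytosine that is not methylated ends up as thymine. The function name `simulate_bisulfite` and the representation of methylated sites as a set of 0-based positions are illustrative assumptions.

```python
def simulate_bisulfite(sequence: str, methylated_positions: set) -> str:
    """Hypothetical helper: return the read expected after bisulfite treatment and PCR.

    Unmethylated C -> T (via uracil); a C whose 0-based index is listed in
    `methylated_positions` is protected and stays C. Only the forward strand is modeled.
    """
    converted = []
    for i, base in enumerate(sequence.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated cytosine is read as thymine
        else:
            converted.append(base)  # methylated C and all other bases are unchanged
    return "".join(converted)


# Only the cytosine at index 1 (a CpG site) is methylated.
print(simulate_bisulfite("ACGTCC", {1}))  # ACGTTT
```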


Key Features:

  • Bisulfite-aware alignment: Aligns bisulfite-treated reads while accounting for bisulfite-induced C→U conversions, which appear as C→T differences in the sequenced reads.
  • Apache Spark-based parallelism: Runs on Apache Spark, a memory-optimized distributed data processing system, to maximize parallel efficiency across multiple computing nodes.
  • Dynamic load redistribution: Dynamically redistributes imbalanced data loads across the distributed environment to reduce delays and improve throughput on large datasets.
  • Scalability and speed: Achieves higher alignment speed and better scalability on large datasets than other bisulfite sequencing aligners while producing consistent and accurate mapping results.
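Bisulfite-aware aligners commonly use a "three-letter" strategy: both the read and the reference are converted C→T in silico, so a T in the read can match either a C or a T in the reference, and the converted sequences are then mapped with an ordinary aligner. The Python sketch below illustrates that general technique with a naive forward-strand scan; it is not BiSpark's code, the function names are hypothetical, and a real aligner would use an indexed aligner such as Bowtie2 plus a G→A conversion for the opposite strand.

```python
def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def three_letter_hits(read: str, reference: str, max_mismatches: int = 0):
    """Hypothetical naive three-letter matcher (forward strand only).

    Returns (start, mismatches) for every reference offset where the C->T-converted
    read matches the C->T-converted reference with at most `max_mismatches` mismatches.
    """
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    hits = []
    for start in range(len(conv_ref) - len(conv_read) + 1):
        window = conv_ref[start:start + len(conv_read)]
        mismatches = sum(1 for a, b in zip(conv_read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "GGACGTCCAT"
# Bisulfite read: the C at reference position 3 was methylated; those at 6 and 7 were not.
print(three_letter_hits("ACGTTT", reference))  # [(2, 0)]
```

Compared directly against the unconverted reference segment `ACGTCC`, the same read would show two mismatches, which a standard aligner would penalize even though they reflect methylation state rather than sequencing errors.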

Scientific Applications:

  • Genome-wide DNA methylation mapping: Enables high-resolution DNA methylome analysis from bisulfite sequencing data.
  • Large-scale bisulfite sequencing studies: Supports population-scale or high-throughput projects that require distributed alignment of bisulfite-treated reads.
  • Epigenetics research: Facilitates detection and interpretation of cytosine methylation patterns from sequencing of sodium bisulfite-treated DNA.
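After alignment, the methylation level of a reference cytosine is commonly estimated as the fraction of covering reads that still show C, i.e. level = C / (C + T). The Python sketch below computes that ratio for forward-strand alignments supplied as hypothetical (0-based start, original read) pairs. It ignores strand, sequence context (CpG, CHG, CHH), and base quality, and it is an illustration of the calculation, not BiSpark's output format.

```python
from collections import defaultdict


def methylation_levels(reference: str, alignments):
    """Hypothetical helper: per-position methylation level C / (C + T).

    `alignments` is an iterable of (start, read) pairs, where `read` is the original
    (unconverted) bisulfite read aligned to the forward strand at 0-based `start`.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated (C), unmethylated (T)]
    for start, read in alignments:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            # Only reference cytosines are informative about methylation.
            if pos >= len(reference) or reference[pos].upper() != "C":
                continue
            if base == "C":
                counts[pos][0] += 1
            elif base == "T":
                counts[pos][1] += 1
    return {pos: m / (m + u) for pos, (m, u) in sorted(counts.items()) if m + u > 0}


levels = methylation_levels(
    "GGACGTCCAT", [(2, "ACGTTT"), (2, "ATGTTT"), (3, "CGTCC")]
)
print(levels)  # position 3: 2/3 methylated; positions 6 and 7: 1/3 methylated
```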

Methodology:

BiSpark uses the memory-optimized Apache Spark distributed data processing framework to perform bisulfite-aware alignment and to dynamically redistribute imbalanced data loads across computing nodes.
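The idea of redistributing imbalanced loads can be illustrated with a generic greedy scheme: sort work units by size and repeatedly give the next-largest one to the currently least-loaded worker (the longest-processing-time-first, or LPT, heuristic). This plain-Python sketch is not BiSpark's mechanism, which operates inside Spark; the `balance_partitions` name and the example partition sizes are assumptions for illustration.

```python
import heapq


def balance_partitions(partition_sizes, num_workers):
    """Hypothetical LPT balancer: map worker id -> list of partition indices.

    Partitions are assigned largest first, each to the worker whose current
    total load is smallest (ties broken by the lower worker id).
    """
    heap = [(0, worker) for worker in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(num_workers)}
    order = sorted(range(len(partition_sizes)), key=lambda i: partition_sizes[i], reverse=True)
    for idx in order:
        load, worker = heapq.heappop(heap)  # least-loaded worker
        assignment[worker].append(idx)
        heapq.heappush(heap, (load + partition_sizes[idx], worker))
    return assignment


print(balance_partitions([8, 7, 6, 5, 4], 2))  # {0: [0, 3, 4], 1: [1, 2]}
```

The two workers end up with loads of 17 and 13. LPT is a fast heuristic rather than an optimal partitioner: a perfect 15/15 split ({8, 7} versus {6, 5, 4}) exists for this input.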

Details

License: Unlicense
Maturity: Mature
Cost: Free of charge
Tool Type: Command-line tool
Operating Systems: Linux, Mac
Programming Languages: Python
Added: 8/11/2019
Last Updated: 6/16/2020

Publications

Soe S, Park Y, Chae H. BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data. BMC Bioinformatics. 2018;19(1). doi:10.1186/s12859-018-2498-2. PMID:30526492. PMCID:PMC6288881.

Funding:
  • Ministry of Science, ICT and Future Planning: 2017R1C1B5018165
  • National Research Foundation of Korea: NRF-2016R1D1A1A02937186
  • Sookmyung Women's University: 1-1703-2032
  • Korea Health Industry Development Institute: HI15C3224