BiSpark

BiSpark is an aligner for bisulfite-treated sequencing reads that runs on the Apache Spark distributed framework to speed up genome-wide DNA methylation analysis. Sodium bisulfite treatment converts unmethylated cytosines to uracil and leaves methylated cytosines unchanged, so the methylation state of each cytosine can be recovered from the aligned reads.

Key Features:

Bisulfite-aware alignment: Aligns bisulfite-treated sequencing reads while accounting for sodium bisulfite-induced C→U conversions.
Apache Spark-based parallelism: Runs on Apache Spark, a memory-optimized distributed data processing system, to maximize parallel efficiency across multiple computing nodes.
Dynamic load redistribution: Redistributes imbalanced data loads dynamically within the distributed environment to minimize delays and improve throughput on large datasets.
Scalability and speed: Achieves increased alignment speed and scalability on large datasets compared with other bisulfite sequencing aligners while maintaining consistent and accurate mapping results.
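
The bisulfite-aware matching described above can be illustrated with a short sketch. This is not BiSpark's code: the function names are hypothetical, and the example assumes the common practice of treating a read T opposite a reference C as a permitted conversion (uracil is read as thymine after amplification). It only shows why an ordinary exact-match comparison is not enough for bisulfite-treated reads.

```python
def bisulfite_convert(seq: str) -> str:
    """In silico conversion: replace every C with T (forward-strand view)."""
    return seq.upper().replace("C", "T")


def matches_bisulfite_aware(read: str, ref: str) -> bool:
    """Return True if the read equals the reference, allowing only
    reference-C / read-T differences (unmethylated cytosines)."""
    if len(read) != len(ref):
        return False
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base:
            continue
        if ref_base == "C" and read_base == "T":
            continue  # permitted bisulfite conversion
        return False  # any other difference is a true mismatch
    return True


reference = "ACGTC"
fully_converted_read = bisulfite_convert(reference)
print(fully_converted_read)
print(matches_bisulfite_aware(fully_converted_read, reference))
```

The check is deliberately asymmetric: a read C opposite a reference T is still a mismatch, because bisulfite treatment never turns thymine into cytosine.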

Scientific Applications:

Genome-wide DNA methylation mapping: Enables high-resolution DNA methylome analysis from bisulfite sequencing data.
Large-scale bisulfite sequencing studies: Supports population-scale or high-throughput projects that require distributed alignment of bisulfite-treated reads.
Epigenetics research: Facilitates detection and interpretation of cytosine methylation patterns derived from sodium bisulfite-treated sequencing.
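
As a rough illustration of how methylation patterns are read from aligned bisulfite reads, the sketch below counts C and T calls at each reference cytosine and reports the methylated fraction. It is an assumption-laden toy, not part of BiSpark: the function names are hypothetical, reads are given as (start, sequence) pairs on the forward strand, and gaps, base qualities and the reverse strand are ignored.

```python
from collections import defaultdict


def count_cytosine_calls(ref, aligned_reads):
    """For each reference C, count reads showing C (methylated)
    and T (unmethylated). Returns {position: (methylated, unmethylated)}."""
    counts = defaultdict(lambda: [0, 0])
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if pos >= len(ref) or ref[pos].upper() != "C":
                continue  # only reference cytosines are informative
            if base == "C":
                counts[pos][0] += 1
            elif base == "T":
                counts[pos][1] += 1
    return {pos: tuple(c) for pos, c in counts.items()}


def methylation_level(methylated, unmethylated):
    """Fraction of informative reads that support methylation."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no informative reads at this position")
    return methylated / total


ref = "ACGTCA"
reads = [(0, "ACGTTA"), (0, "ATGTCA"), (1, "CGTCA")]
calls = count_cytosine_calls(ref, reads)
for pos, (meth, unmeth) in sorted(calls.items()):
    print(pos, meth, unmeth, round(methylation_level(meth, unmeth), 3))
```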

Methodology:

Leverages the Apache Spark memory-optimized distributed data processing framework to perform bisulfite-aware alignment and dynamically redistribute imbalanced data loads across computing nodes.
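
The idea of redistributing imbalanced data loads can be sketched in plain Python. The listing does not describe BiSpark's actual redistribution strategy, so the function below is a hypothetical stand-in: it pools the records from unevenly filled partitions and deals them back out so that partition sizes differ by at most one. In Spark itself a comparable effect is usually obtained by repartitioning an RDD; whether BiSpark does exactly this is not stated here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rebalance(partitions: Sequence[Sequence[T]]) -> List[List[T]]:
    """Redistribute records so that partition sizes differ by at most one,
    keeping the number of partitions and the record order unchanged."""
    n = len(partitions)
    if n == 0:
        return []
    records = [rec for part in partitions for rec in part]
    base, extra = divmod(len(records), n)
    balanced = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # first `extra` partitions get one more
        balanced.append(records[start:start + size])
        start += size
    return balanced


skewed = [[1, 2, 3, 4, 5, 6, 7], [8], []]
print([len(p) for p in skewed])
print([len(p) for p in rebalance(skewed)])
```

With balanced partitions no single worker holds most of the reads, which is the source of the delays the feature description refers to.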

Topics

Methylated DNA immunoprecipitation, Epigenetics, DNA

Details

License: Unlicense
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: Python
Added: 8/11/2019
Last Updated: 6/16/2020

Publications

Soe S, Park Y, Chae H. BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data. BMC Bioinformatics. 2018;19(1). doi:10.1186/s12859-018-2498-2. PMID:30526492. PMCID:PMC6288881.

Funding:
- Ministry of Science ICT and Future Planning: 2017R1C1B5018165
- National Research Foundation of Korea: NRF-2016R1D1A1A02937186
- Sookmyung Women's University: 1-1703-2032
- Korea Health Industry Development Institute: HI15C3224

Documentation

General

https://bhi-kimlab.github.io/BiSpark/

Links

Issue tracker

https://github.com/bhi-kimlab/BiSpark/issues
