BiSpark
BiSpark is a Spark-based aligner that accelerates the alignment of bisulfite sequencing reads for genome-wide DNA methylation analysis. Sodium bisulfite converts unmethylated cytosines to uracil (read as thymine after PCR amplification) while leaving methylated cytosines unchanged, so the methylation state of each cytosine can be inferred from the aligned reads.
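To make the conversion concrete, the minimal sketch below simulates what bisulfite treatment followed by PCR does to a single DNA strand. It is an illustration only, not part of BiSpark. The function name `bisulfite_convert` and the representation of methylation as a set of 0-based positions are hypothetical choices made for this example.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on one DNA strand (hypothetical helper).

    Unmethylated C -> U (read as T after PCR); methylated C is protected and stays C.
    `methylated` holds the 0-based positions of methylated cytosines.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")  # C -> U by bisulfite, U -> T after PCR amplification
        else:
            out.append(base)  # methylated C and non-C bases are unchanged
    return "".join(out)


# Position 1 is a methylated C in a CpG context; position 4 is an unmethylated C.
print(bisulfite_convert("ACGTCA", methylated={1}))  # → ACGTTA
```

The resulting read no longer matches the reference exactly ("ACGTTA" vs. "ACGTCA"), which is why a standard aligner cannot be applied to bisulfite reads unchanged.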
Key Features:
- Bisulfite-aware alignment: Aligns bisulfite-treated reads while accounting for bisulfite-induced C→U conversions, which appear as C→T changes in the sequenced reads.
- Apache Spark-based parallelism: Runs on Apache Spark, a memory-optimized distributed data processing framework, to parallelize alignment efficiently across multiple computing nodes.
- Dynamic load redistribution: Dynamically rebalances uneven data loads across the cluster to reduce delays and improve throughput on large datasets.
- Scalability and speed: Reported to align large datasets faster, and to scale better, than other bisulfite sequencing aligners while producing consistent and accurate mapping results.
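One widely used way to make alignment bisulfite-aware is "three-letter" in silico conversion (the approach popularized by Bismark): convert every C to T in both the read and the reference, then search for matches, so that bisulfite-induced mismatches disappear. Whether BiSpark follows exactly this scheme is an assumption here. The sketch below is a naive forward-strand illustration; `c_to_t` and `bs_align_forward` are hypothetical helpers, and a real aligner would use an indexed search and also handle the reverse strand with G→A conversion.

```python
def c_to_t(seq: str) -> str:
    """Three-letter conversion: collapse C into T so bisulfite mismatches vanish."""
    return seq.upper().replace("C", "T")


def bs_align_forward(read: str, reference: str) -> list[int]:
    """Return 0-based reference offsets where the read matches after C->T conversion.

    Hypothetical helper for illustration: a naive scan over the converted
    reference, forward strand only.
    """
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    k = len(conv_read)
    return [i for i in range(len(conv_ref) - k + 1) if conv_ref[i:i + k] == conv_read]


reference = "GATCGATTCGCA"
# Bisulfite read of reference[3:9] = "CGATTC": the C at 3 is methylated, the C at 8 is not.
read = "CGATTT"

print(reference.find(read))               # → -1 (plain exact search fails)
print(bs_align_forward(read, reference))  # → [3]
```

The trade-off of this design is reduced sequence complexity: with only three letters left, more reads map ambiguously, which is one reason bisulfite alignment is expensive enough to benefit from distributed execution.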
Scientific Applications:
- Genome-wide DNA methylation mapping: Enables high-resolution DNA methylome analysis from bisulfite sequencing data.
- Large-scale bisulfite sequencing studies: Supports population-scale or high-throughput projects that require distributed alignment of bisulfite-treated reads.
- Epigenetics research: Facilitates detection and interpretation of cytosine methylation patterns in sodium bisulfite sequencing data.
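Once reads are aligned, methylation can be read off position by position: a read C opposite a reference C indicates a protected (methylated) cytosine, and a read T indicates a converted (unmethylated) one. The minimal sketch below assumes forward-strand reads with known alignment offsets and ignores sequence context (CpG/CHG/CHH) and strand; `call_methylation` and `methylation_levels` are hypothetical names, not functions provided by BiSpark.

```python
def call_methylation(read: str, reference: str, offset: int) -> dict[int, bool]:
    """Call methylation at reference cytosines covered by one forward-strand read.

    Returns {reference_position: is_methylated}. Read bases other than C or T
    opposite a reference C are treated as mismatches and skipped.
    """
    calls = {}
    for j, read_base in enumerate(read.upper()):
        pos = offset + j
        if pos >= len(reference):
            break  # read runs past the end of the reference
        if reference[pos].upper() != "C":
            continue  # only reference cytosines carry methylation information here
        if read_base == "C":
            calls[pos] = True   # protected from conversion: methylated
        elif read_base == "T":
            calls[pos] = False  # converted: unmethylated
    return calls


def methylation_levels(reads, reference):
    """Per-position methylation level = methylated calls / all calls at that position."""
    counts = {}  # pos -> (methylated, total)
    for read, offset in reads:
        for pos, meth in call_methylation(read, reference, offset).items():
            m, t = counts.get(pos, (0, 0))
            counts[pos] = (m + int(meth), t + 1)
    return {pos: m / t for pos, (m, t) in sorted(counts.items())}


reference = "GATCGATTCGCA"
aligned = [("CGATTT", 3), ("CGATTC", 3), ("TGATTT", 3)]  # (read, 0-based offset)
levels = methylation_levels(aligned, reference)
print({pos: round(v, 2) for pos, v in levels.items()})  # → {3: 0.67, 8: 0.33}
```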
Methodology:
Uses the memory-optimized Apache Spark distributed data processing framework to perform bisulfite-aware alignment and to dynamically redistribute imbalanced data loads across computing nodes.
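The specific redistribution strategy BiSpark uses is described in its publication and is not reproduced here. As a generic stand-in, the sketch below shows one classic way to even out imbalanced loads: greedy longest-processing-time (LPT) scheduling, which places the largest partitions first, each on the currently least-loaded worker. The function `rebalance` and the partition sizes are hypothetical. Within Spark itself, rebalancing is typically expressed through repartitioning (for example `RDD.repartition(n)`, which performs a full shuffle).

```python
import heapq


def rebalance(partition_sizes: dict[str, int], n_workers: int) -> list[list[str]]:
    """Greedy LPT assignment of partitions to workers (hypothetical helper).

    Partitions are placed largest first, each on the currently least-loaded
    worker, so no single worker ends up holding most of the reads.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment: list[list[str]] = [[] for _ in range(n_workers)]
    # Sort by descending size; ties are broken by name for a deterministic result.
    for name, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)  # least-loaded worker
        assignment[w].append(name)
        heapq.heappush(heap, (load + size, w))
    return assignment


# Hypothetical read counts per partition, e.g. after grouping reads by key.
sizes = {"p0": 900, "p1": 100, "p2": 400, "p3": 500, "p4": 100}
plan = rebalance(sizes, 2)
print(plan)                                                 # → [['p0', 'p1'], ['p3', 'p2', 'p4']]
print([sum(sizes[p] for p in worker) for worker in plan])   # → [1000, 1000]
```

LPT is a heuristic rather than an optimal partitioner, but it is simple, fast, and usually keeps the slowest worker close to the average load, which is what limits end-to-end time in a distributed job.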
Details
- License: Unlicense
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: Python
- Added: 8/11/2019
- Last Updated: 6/16/2020
Publications
Soe S, Park Y, Chae H. BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data. BMC Bioinformatics. 2018;19(1). doi:10.1186/s12859-018-2498-2. PMID:30526492. PMCID:PMC6288881.
Funding:
- Ministry of Science, ICT and Future Planning: 2017R1C1B5018165
- National Research Foundation of Korea: NRF-2016R1D1A1A02937186
- Sookmyung Women's University: 1-1703-2032
- Korea Health Industry Development Institute: HI15C3224
Links
Issue tracker
https://github.com/bhi-kimlab/BiSpark/issues