DECA
DECA is a scalable, distributed re-implementation of XHMM that detects copy-number variants (CNVs) from whole-exome sequencing data. It uses ADAM and Apache Spark to accelerate and scale CNV discovery.
Key Features:
- Scalability: Operates across multi-core shared-memory systems and large distributed clusters using Apache Spark to handle varying dataset sizes.
- Performance optimization: Algorithmic optimizations eliminate unnecessary computation. In the reported measurements, CNV discovery from a read-depth matrix took 9.3 minutes on a 16-core workstation (35.3× speedup over XHMM) and 12.7 minutes using 10 executor cores on a Spark cluster (18.8× speedup over XHMM).
- Large-scale data handling: Performs CNV discovery directly from the original BAM files at cluster scale; the reported run took 292 minutes using 640 executor cores on a Spark cluster.
- Algorithmic enhancements: Reduces computational resource use to support genome-wide analyses and high-throughput CNV discovery.
- Research-focused configuration exploration: Enables exploration of a broader configuration parameter space to fine-tune CNV detection across large cohorts.
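The speedup figures quoted above can be read as ratios of XHMM runtime to DECA runtime, so each one implies an XHMM baseline for the same task. The following is a minimal sketch of that arithmetic using only the numbers in this entry. The function name is illustrative, and the baselines it prints are derived estimates, not runtimes stated by the source.

```python
def implied_baseline_minutes(deca_minutes: float, speedup: float) -> float:
    """Baseline runtime implied by a measured runtime and its reported speedup.

    Assumes speedup is defined as baseline_time / deca_time.
    """
    return deca_minutes * speedup


# Figures quoted in this entry (read-depth matrix input).
workstation = implied_baseline_minutes(9.3, 35.3)  # 16-core workstation
cluster = implied_baseline_minutes(12.7, 18.8)     # 10 executor cores on Spark

print(f"implied XHMM baseline, workstation comparison: {workstation:.1f} min")
print(f"implied XHMM baseline, cluster comparison:     {cluster:.1f} min")
```

The two implied baselines differ (roughly 328 and 239 minutes), which suggests the two comparisons were not made against a single XHMM run; the entry itself does not say how each baseline was measured.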
Scientific Applications:
- Whole-exome CNV discovery: Detection of copy-number variants from whole-exome sequencing read-depth matrices and BAM files.
- Large-cohort genomic studies: High-throughput CNV analysis across large sample cohorts enabled by distributed computation.
- Genome-wide analyses and parameter optimization: Genome-wide CNV analyses with the ability to explore configuration parameter space for method tuning.
Methodology:
DECA re-implements the XHMM algorithm on ADAM and Apache Spark. Parallelization and algorithmic optimizations distribute the computation across multi-core workstations and Spark clusters, and the tool accepts either read-depth matrices or the original BAM files as input.
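To make the XHMM-style discovery step more concrete, here is a minimal single-sample sketch of hidden Markov model decoding over normalized read-depth z-scores. It is an illustration of the general technique, not DECA's or XHMM's code. The three states, the emission means of -3, 0 and +3, and the values of `p` and `q` are assumptions chosen for the example, and the transition model ignores the distance between exome targets, which a real caller would account for.

```python
import math

STATES = ("DEL", "DIP", "DUP")  # deletion, diploid, duplication
# Assumed emission means in z-score units, with unit variance.
EMISSION_MEAN = {"DEL": -3.0, "DIP": 0.0, "DUP": 3.0}


def log_normal_pdf(x, mean, sd=1.0):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def safe_log(prob):
    """Log that maps a zero probability to negative infinity."""
    return math.log(prob) if prob > 0.0 else float("-inf")


def build_log_transitions(p, q):
    """Simplified transitions: p = chance of entering a CNV, q = chance of leaving one."""
    probs = {
        "DIP": {"DEL": p, "DIP": 1.0 - 2.0 * p, "DUP": p},
        "DEL": {"DEL": 1.0 - q, "DIP": q, "DUP": 0.0},
        "DUP": {"DEL": 0.0, "DIP": q, "DUP": 1.0 - q},
    }
    return {a: {b: safe_log(v) for b, v in row.items()} for a, row in probs.items()}


def viterbi(z_scores, p=1e-8, q=1.0 / 6.0):
    """Most likely state path for one sample's per-target z-scores."""
    if not z_scores:
        return []
    log_t = build_log_transitions(p, q)
    start = {"DEL": safe_log(p), "DIP": safe_log(1.0 - 2.0 * p), "DUP": safe_log(p)}
    score = {s: start[s] + log_normal_pdf(z_scores[0], EMISSION_MEAN[s]) for s in STATES}
    back = []  # one dict of back-pointers per target after the first
    for z in z_scores[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log_t[r][s])
            pointers[s] = best
            score[s] = prev[best] + log_t[best][s] + log_normal_pdf(z, EMISSION_MEAN[s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical z-scores: four consecutive targets with strongly reduced depth.
example = [0.2, -0.1, -4.1, -3.8, -4.3, -3.9, 0.1, 0.3]
print(viterbi(example))
# -> ['DIP', 'DIP', 'DEL', 'DEL', 'DEL', 'DEL', 'DIP', 'DIP']
```

Each sample is decoded independently of the others, so this step can in principle be run in parallel over samples, which fits the distributed design described above.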
Details
- Tool Type: command-line tool
- Operating Systems: Mac, Linux, Windows
- Programming Languages: Scala, Java
- Added: 12/17/2020
- Last Updated: 12/17/2020
Publications
Linderman MD, Chia D, Wallace F, Nothaft FA. DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark. BMC Bioinformatics. 2019;20(1). doi:10.1186/s12859-019-3108-7. PMID:31604420. PMCID:PMC6787990.
Funding:
- National Science Foundation: CCF-1139158
- Lawrence Berkeley National Laboratory: 7076018
- Defense Advanced Research Projects Agency: FA8750-12-2-0331
- National Human Genome Research Institute: U54HG007990-01
- National Institutes of Health: HHSN261201400006C