DECA

DECA is a scalable, distributed re-implementation of XHMM that detects copy-number variants (CNVs) from whole-exome sequencing data. It uses ADAM and Apache Spark to accelerate and scale CNV discovery.


Key Features:

  • Scalability: Operates across multi-core shared-memory systems and large distributed clusters using Apache Spark to handle varying dataset sizes.
  • Performance optimization: Algorithmic optimizations eliminate unnecessary computations. Measured CNV discovery from a read-depth matrix took 9.3 minutes on a 16-core workstation (35.3× speedup over XHMM) and 12.7 minutes using 10 executor cores on a Spark cluster (18.8× speedup over XHMM).
  • Large-scale data handling: Performs CNV discovery directly from the original BAM files at cluster scale; a demonstrated run took 292 minutes using 640 executor cores on a Spark cluster.
  • Algorithmic enhancements: Integrates improvements to optimize computational resource use for genome-wide analyses and high-throughput CNV discovery.
  • Research-focused configuration exploration: Enables exploration of a broader configuration parameter space to fine-tune CNV detection across large cohorts.
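
The timing figures above can be cross-checked with simple arithmetic. The sketch below assumes that each reported speedup is the ratio of XHMM wall-clock time to DECA wall-clock time on the same hardware, and that all 640 executor cores were occupied for the whole BAM run. Both are assumptions, and the function and variable names are illustrative only.

```python
def implied_baseline_minutes(deca_minutes, speedup):
    """XHMM runtime implied by a DECA runtime and a speedup factor."""
    return deca_minutes * speedup


# Figures quoted in the feature list above.
workstation_xhmm = implied_baseline_minutes(9.3, 35.3)  # 16-core workstation
cluster_xhmm = implied_baseline_minutes(12.7, 18.8)     # 10 executor cores

# Upper bound on compute for the BAM-to-CNV run (assumes full core occupancy).
bam_core_hours = 292 * 640 / 60

print(f"{workstation_xhmm:.1f} min, {cluster_xhmm:.1f} min, "
      f"{bam_core_hours:.0f} core-hours")
```

Under these assumptions the implied XHMM baselines are roughly 328 and 239 minutes, and the BAM run is bounded by about 3,115 core-hours.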

Scientific Applications:

  • Whole-exome CNV discovery: Detection of copy-number variants from whole-exome sequencing read-depth matrices and BAM files.
  • Large-cohort genomic studies: High-throughput CNV analysis across large sample cohorts enabled by distributed computation.
  • Genome-wide analyses and parameter optimization: Genome-wide CNV analyses with the ability to explore configuration parameter space for method tuning.

Methodology:

DECA re-implements the XHMM algorithm on ADAM and Apache Spark, adding parallelization and algorithmic optimizations that distribute computation across multi-core workstations and Spark clusters. It accepts either read-depth matrices or the original BAM files as input.
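
To give a feel for the kind of computation being distributed, here is a minimal sketch of HMM-based CNV calling on per-target read-depth z-scores. It is not DECA's or XHMM's code: the emission means of -3, 0 and +3, the fixed transition probabilities, and all function names are illustrative assumptions, and the normalization that turns a raw read-depth matrix into z-scores is omitted.

```python
import math

STATES = ("DEL", "DIP", "DUP")
# Assumed z-score means per state (illustrative, not DECA's configuration).
MEANS = {"DEL": -3.0, "DIP": 0.0, "DUP": 3.0}


def log_gauss(x, mean, sd=1.0):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_trans(prev, cur, p_start=1e-4, p_stay=0.8):
    """Simplified transitions that ignore the distance between targets."""
    if prev == "DIP":
        return math.log(p_start if cur != "DIP" else 1 - 2 * p_start)
    if cur == prev:          # remain inside the current CNV
        return math.log(p_stay)
    if cur == "DIP":         # CNV ends, back to diploid
        return math.log(1 - p_stay)
    return float("-inf")     # no direct DEL <-> DUP switch in this sketch


def viterbi(z_scores):
    """Most likely state path for one sample's per-target z-scores."""
    # Treat the position before the first target as diploid.
    scores = {s: log_trans("DIP", s) + log_gauss(z_scores[0], MEANS[s])
              for s in STATES}
    backpointers = []
    for z in z_scores[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda prev: scores[prev] + log_trans(prev, cur))
            new_scores[cur] = (scores[best] + log_trans(best, cur)
                               + log_gauss(z, MEANS[cur]))
            pointers[cur] = best
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical cohort: one list of z-scores per sample.
cohort = {
    "sample_a": [0.1, -0.2, -3.5, -4.0, -3.8, 0.0, 0.2],
    "sample_b": [0.3, -0.1, 0.2, 3.6, 3.9, 4.1, -0.4],
}
# Each sample is decoded independently of the others.
calls = {name: viterbi(z) for name, z in cohort.items()}
```

In this sketch every sample is decoded on its own, so the loop over samples is the natural unit of work to spread across cores or cluster executors.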

Details

Tool Type:
command-line tool
Operating Systems:
Mac, Linux, Windows
Programming Languages:
Scala, Java
Added:
12/17/2020
Last Updated:
12/17/2020

Publications

Linderman MD, Chia D, Wallace F, Nothaft FA. DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark. BMC Bioinformatics. 2019;20(1). doi:10.1186/s12859-019-3108-7. PMID:31604420. PMCID:PMC6787990.

Funding:

  • National Science Foundation: CCF-1139158
  • Lawrence Berkeley National Laboratory: 7076018
  • Defense Advanced Research Projects Agency: FA8750-12-2-0331
  • National Human Genome Research Institute: U54HG007990-01
  • National Institutes of Health: HHSN261201400006C