HadoopCNV

HadoopCNV detects copy number variations (CNVs) in whole-genome sequencing (WGS) data by combining allele-specific read depth with allelic frequency. It runs on Hadoop to support scalable, high-throughput CNV discovery.


Key Features:

  • Dynamic Programming Imputation (DPI) algorithm: Uses dynamic programming over allelic frequency and read depth to infer copy number changes.
  • Allele-specific read depth: Utilizes high-resolution, allele-specific read depth from WGS to improve CNV and loss of heterozygosity inference.
  • Scalability (Hadoop): Built on the Hadoop framework, which distributes computation across multiple compute nodes; the reported speedup scales linearly with the number of nodes.
  • Performance benchmarking: Performed similarly to or better than CNVnator and LUMPY on simulated datasets and on NA12878, and achieved high Mendelian precision in a 10-member pedigree analysis.
  • Loss of Heterozygosity (LOH) detection: Capable of inferring LOH in addition to copy number changes.
  • Efficiency: Reported runtime of approximately 1.6 hours to analyze a human genome at 30X coverage on a 32-node cluster.
  • Integration with other callers: Can be combined with callers such as LUMPY to improve SV/CNV calling performance.
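To make "dynamic programming over read depth and allelic frequency" concrete, the following Java sketch segments a series of genomic bins into copy-number states with a generic Viterbi-style recursion. It is not HadoopCNV's implementation: the four states, the Gaussian emission scores, the noise levels (`DEPTH_SD`, `BAF_SD`), the constant `SWITCH_PENALTY`, and all class and method names are hypothetical choices for illustration. It does show why allele frequency matters: a copy-neutral LOH region has normal depth, so a depth-only caller would miss it, but its lack of heterozygosity separates it from normal sequence.

```java
import java.util.Arrays;

/**
 * Hypothetical Viterbi-style DP sketch for labeling bins with copy-number states.
 * Illustration only; not HadoopCNV's actual model or code.
 */
public class CnvViterbiSketch {

    // Hypothetical states: expected depth ratio (CN / 2) and expected mirrored BAF.
    enum State {
        DELETION(0.5, 0.0),           // CN = 1: a single allele remains, so no heterozygosity
        NORMAL(1.0, 0.5),             // CN = 2, heterozygous
        CN_LOH(1.0, 0.0),             // CN = 2, copy-neutral loss of heterozygosity
        DUPLICATION(1.5, 1.0 / 3.0);  // CN = 3, assuming a 2:1 allele configuration

        final double depthRatio;
        final double mirroredBaf;

        State(double depthRatio, double mirroredBaf) {
            this.depthRatio = depthRatio;
            this.mirroredBaf = mirroredBaf;
        }
    }

    static final double DEPTH_SD = 0.15;      // assumed noise in the depth ratio
    static final double BAF_SD = 0.08;        // assumed noise in the mirrored BAF
    static final double SWITCH_PENALTY = 5.0; // assumed log-score cost of changing state

    // Log Gaussian density up to an additive constant; the constant is shared by all
    // states (same SD), so dropping it does not change which state scores best.
    static double logGauss(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return -0.5 * z * z;
    }

    // Emission score, treating depth and BAF as independent evidence.
    static double emission(State s, double depthRatio, double baf) {
        double mirrored = Math.min(baf, 1.0 - baf); // fold BAF into [0, 0.5]
        return logGauss(depthRatio, s.depthRatio, DEPTH_SD)
                + logGauss(mirrored, s.mirroredBaf, BAF_SD);
    }

    /** Returns the highest-scoring state path for the per-bin observations. */
    static State[] segment(double[] depthRatio, double[] baf) {
        if (depthRatio.length != baf.length) {
            throw new IllegalArgumentException("depthRatio and baf must have the same length");
        }
        int n = depthRatio.length;
        if (n == 0) return new State[0];
        State[] states = State.values();
        int k = states.length;
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];

        for (int s = 0; s < k; s++) {
            score[0][s] = emission(states[s], depthRatio[0], baf[0]);
        }
        // DP recursion: best score ending in state s at bin i.
        for (int i = 1; i < n; i++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double cand = score[i - 1][p] - (p == s ? 0.0 : SWITCH_PENALTY);
                    if (cand > best) { best = cand; arg = p; }
                }
                score[i][s] = best + emission(states[s], depthRatio[i], baf[i]);
                back[i][s] = arg;
            }
        }
        // Trace back from the best-scoring final state.
        int cur = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][cur]) cur = s;
        }
        State[] path = new State[n];
        for (int i = n - 1; i >= 0; i--) {
            path[i] = states[cur];
            cur = back[i][cur];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy data: normal, a deletion-like dip, copy-neutral LOH, then normal again.
        double[] depth = {1.0, 0.98, 1.02, 0.52, 0.49, 0.51, 1.01, 0.99, 1.0, 1.0, 1.02, 0.97};
        double[] baf   = {0.5, 0.48, 0.52, 0.02, 0.97, 0.01, 0.03, 0.98, 0.02, 0.49, 0.51, 0.5};
        System.out.println(Arrays.toString(segment(depth, baf)));
    }
}
```

With the toy input in `main`, the sketch labels the first three bins NORMAL, the next three DELETION, the next three CN_LOH, and the last three NORMAL. The constant switch penalty stands in for HMM transition probabilities; raising it produces fewer, longer segments.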

Scientific Applications:

  • Population genomics: Enables large-scale CNV discovery from WGS data for population-level variant analyses.
  • Pedigree analysis: Supports Mendelian precision assessment and CNV detection in family studies, demonstrated in a 10-member pedigree.
  • Clinical diagnostics: Applicable to clinical WGS workflows for detecting CNVs relevant to genetic diagnosis at cohort scale.
  • Cancer genomics: Facilitates detection of CNVs and LOH to characterize genetic heterogeneity in tumor genomes.

Methodology:

Applies a Dynamic Programming Imputation algorithm to allelic frequency and allele-specific read depth from WGS data, distributes the computation with the Hadoop framework, and can integrate its outputs with those of LUMPY.
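The Hadoop side of the method can be pictured as a map, shuffle, and reduce over per-position allele counts. The following Java sketch (Java 16+) imitates that pattern in memory with streams instead of the Hadoop API. The `Site` record layout, the toy 100 bp `BIN_SIZE`, the rule that a site counts as heterozygous when both alleles are observed, and all names are assumptions for illustration, not HadoopCNV's actual input format or job structure.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BinningSketch {

    // Hypothetical per-position record: allele-specific read counts at one site.
    record Site(String chrom, long pos, int refReads, int altReads) {
        int depth() { return refReads + altReads; }
    }

    // Hypothetical per-bin summary that a segmentation step could consume.
    record BinSummary(double meanDepth, double meanBaf, int hetSites) {}

    static final long BIN_SIZE = 100; // assumed bin width in bp (toy value)

    // "Map" step: the key a Hadoop mapper might emit (chromosome plus bin index).
    static String binKey(Site s) {
        return s.chrom() + ":" + (s.pos() / BIN_SIZE);
    }

    // "Reduce" step: summarize all sites that share one bin key.
    static BinSummary summarize(List<Site> sites) {
        double meanDepth = sites.stream().mapToInt(Site::depth).average().orElse(0.0);
        // Simplistic assumption: a site is heterozygous if both alleles are observed.
        List<Site> het = sites.stream()
                .filter(s -> s.refReads() > 0 && s.altReads() > 0)
                .toList();
        // Alt-allele fraction used as a stand-in for B-allele frequency; NaN if no het sites.
        double meanBaf = het.stream()
                .mapToDouble(s -> (double) s.altReads() / s.depth())
                .average().orElse(Double.NaN);
        return new BinSummary(meanDepth, meanBaf, het.size());
    }

    static Map<String, BinSummary> binAll(List<Site> sites) {
        // groupingBy plays the role of the shuffle: all sites with one key reach one reducer.
        return sites.stream().collect(Collectors.groupingBy(
                BinningSketch::binKey, TreeMap::new,
                Collectors.collectingAndThen(Collectors.toList(), BinningSketch::summarize)));
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
                new Site("chr1", 10, 15, 14),
                new Site("chr1", 50, 30, 0),
                new Site("chr1", 120, 8, 0),
                new Site("chr1", 180, 0, 7));
        binAll(sites).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Each bin is summarized independently, which is what makes this kind of work easy to spread across nodes. In the toy input, the second bin has lower depth and no heterozygous sites (`meanBaf` is `NaN`), the kind of signal a segmentation step would weigh as a possible deletion.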

Details

License: MIT
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Java
Added: 8/20/2017
Last Updated: 9/4/2019

Publications

Yang H, Chen G, Lima L, Fang H, Jimenez L, Li M, Lyon GJ, He M, Wang K. HadoopCNV: A dynamic programming imputation algorithm to detect copy number variants from sequencing data. bioRxiv (preprint). 2017. doi:10.1101/124339.

Documentation