HadoopCNV
HadoopCNV identifies copy number variations (CNVs) from whole-genome sequencing (WGS) data using allele-specific read depth and allelic frequency to support scalable, high-throughput CNV discovery.
Key Features:
- Dynamic Programming Imputation (DPI) algorithm: Infers copy number changes from allelic frequency and read depth.
- Allele-specific read depth: Utilizes high-resolution, allele-specific read depth from WGS to improve CNV and loss of heterozygosity inference.
- Scalability (Hadoop): Built on the Hadoop framework to distribute computation across compute nodes; the authors report that speedup grows linearly with the number of nodes.
- Performance benchmarking: Reported to perform similarly to or better than CNVnator and LUMPY on simulated datasets and NA12878, with high Mendelian precision in a 10-member pedigree analysis.
- Loss of Heterozygosity (LOH) detection: Capable of inferring LOH in addition to copy number changes.
- Efficiency: Reported runtime of approximately 1.6 hours to analyze a human genome at 30X coverage on a 32-node cluster.
- Integration with other callers: Can integrate with tools such as LUMPY to enhance SV/CNV calling performance.
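The two signals named in the features above, read depth and allelic frequency, can be illustrated with a minimal sketch. This is not HadoopCNV's code: the function name `bin_signals`, the `(position, ref_count, alt_count)` input layout, and the bin size are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def bin_signals(sites, bin_size=100):
    """Summarize per-site allele counts into per-bin depth and B-allele frequency.

    `sites` is an iterable of (position, ref_count, alt_count) tuples for
    heterozygous SNP sites (a hypothetical input layout, not HadoopCNV's format).
    Returns {bin_start: (mean_depth, mean_baf)}.
    """
    depth_sum = defaultdict(int)
    baf_sum = defaultdict(float)
    n_sites = defaultdict(int)
    for pos, ref, alt in sites:
        total = ref + alt
        if total == 0:
            continue  # skip sites with no coverage
        start = (pos // bin_size) * bin_size
        depth_sum[start] += total
        baf_sum[start] += alt / total  # fraction of reads carrying the alternate allele
        n_sites[start] += 1
    return {
        s: (depth_sum[s] / n_sites[s], baf_sum[s] / n_sites[s])
        for s in sorted(n_sites)
    }
```

In a diploid region the allelic frequency at heterozygous sites centers near 0.5. A one-copy deletion roughly halves the depth and pushes the allelic frequency toward 0 or 1, which is why the two signals are informative together.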
Scientific Applications:
- Population genomics: Enables large-scale CNV discovery from WGS data for population-level variant analyses.
- Pedigree analysis: Supports Mendelian precision assessment and CNV detection in family studies, demonstrated in a 10-member pedigree.
- Clinical diagnostics: Applicable to clinical WGS workflows for detecting CNVs relevant to genetic diagnosis at cohort scale.
- Cancer genomics: Facilitates detection of CNVs and LOH to characterize genetic heterogeneity in tumor genomes.
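The Mendelian precision mentioned for pedigree analysis can be sketched as follows. The definition used here is an assumption (the fraction of a child's CNV calls that match a same-type call in at least one parent by reciprocal overlap), and may differ from the one used in the publication; the function names, the tuple layout, and the 50% overlap threshold are hypothetical.

```python
def overlaps(a, b, min_reciprocal=0.5):
    """Return True if two calls share chromosome, type, and enough reciprocal overlap.

    Each call is a (chrom, start, end, cnv_type) tuple (hypothetical layout).
    """
    if a[0] != b[0] or a[3] != b[3]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return (shared / (a[2] - a[1]) >= min_reciprocal
            and shared / (b[2] - b[1]) >= min_reciprocal)


def mendelian_precision(child, father, mother, min_reciprocal=0.5):
    """Fraction of child calls also present in at least one parent.

    Returns None when the child has no calls, since the ratio is undefined.
    """
    if not child:
        return None
    parents = list(father) + list(mother)
    inherited = sum(
        1 for c in child
        if any(overlaps(c, p, min_reciprocal) for p in parents)
    )
    return inherited / len(child)
```

Under this definition, a child call with no matching parental call is treated as a likely false positive, since true de novo CNVs are rare relative to inherited ones.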
Methodology:
HadoopCNV implements a Dynamic Programming Imputation algorithm that takes allelic frequency and allele-specific read depth from WGS as input. Computation is distributed across nodes via the Hadoop framework, and the resulting calls can be combined with LUMPY output.
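The general idea of assigning copy number states by dynamic programming can be sketched with a generic Viterbi-style segmentation. This is a stand-in technique for illustration, not the published DPI algorithm: it uses depth only, and the function name `segment_copy_number`, the squared-error cost, and the `switch_penalty` parameter are all assumptions.

```python
def segment_copy_number(depth_ratios, states=(0, 1, 2, 3, 4), switch_penalty=1.0):
    """Assign a copy number to each bin by minimizing a total cost.

    depth_ratios: per-bin depth divided by the expected diploid depth
                  (1.0 corresponds to two copies).
    Cost = squared deviation of each ratio from its state's expected ratio
           (state / 2) + switch_penalty for each change of state between
           adjacent bins.
    """
    if not depth_ratios:
        return []

    def emit(ratio, state):
        return (ratio - state / 2.0) ** 2

    n = len(states)
    cost = [emit(depth_ratios[0], s) for s in states]
    back = []  # back[t][j] = best previous state index for state j at bin t + 1
    for ratio in depth_ratios[1:]:
        new_cost, pointers = [], []
        for j, s in enumerate(states):
            # Cheapest way to arrive in state j from any previous state i.
            step = [cost[i] + (0.0 if i == j else switch_penalty) for i in range(n)]
            best_i = min(range(n), key=step.__getitem__)
            new_cost.append(step[best_i] + emit(ratio, s))
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final state.
    j = min(range(n), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [states[i] for i in path]
```

The switch penalty controls how much evidence is needed before a new segment is opened: a larger value suppresses single-bin outliers, while a smaller value allows shorter events to be called.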
Details
- License: MIT
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Java
- Added: 8/20/2017
- Last Updated: 9/4/2019
Publications
Yang H, Chen G, Lima L, Fang H, Jimenez L, Li M, Lyon GJ, He M, Wang K. HadoopCNV: A dynamic programming imputation algorithm to detect copy number variants from sequencing data. bioRxiv (preprint). 2017. doi:10.1101/124339.
DOI: 10.1101/124339