CNVkit

CNVkit infers and visualizes copy number variants (CNVs) and somatic copy number alterations (SCNAs) from targeted DNA sequencing data to provide genome-wide and exon-level copy number profiles for genomic analysis.


Key Features:

  • Data sources: Uses both on-target reads and nonspecifically captured off-target reads from targeted, massively parallel DNA sequencing.
  • Resolution within targets: Provides exon-level copy number resolution inside targeted regions.
  • Genome-wide resolution: Achieves approximately 100-kilobase resolution genome-wide from platforms targeting as few as 293 genes, with signal in intronic and intergenic areas via off-target reads.
  • Normalization: Normalizes read counts against a pooled reference to reduce sample-to-sample variability.
  • Bias correction: Corrects read-depth biases associated with GC content, target footprint size and spacing, and repetitive sequences.
  • Bias sources addressed: Accounts for read-depth variability introduced by target capture efficiency and library preparation.
  • Read-depth analysis: Operates on sequencing read counts/read depth as the primary signal for copy number inference.
  • Visualization and reporting: Produces visualizations and reports of identified copy number changes and significant features.
  • Benchmarking: Has been evaluated against array comparative genomic hybridization (aCGH) for performance assessment.
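
To illustrate how off-target reads can yield a roughly 100-kilobase genome-wide signal, the sketch below tiles the gaps between targeted intervals into bins of at most a chosen size. It is an illustration only and not CNVkit's own code or API: the function name `offtarget_bins`, its parameters, and the 100,000 bp default are assumptions made for this example.

```python
def offtarget_bins(chrom_length, targets, bin_size=100_000):
    """Tile the gaps between targets into bins of at most bin_size bp.

    Hypothetical helper for illustration (not part of CNVkit).
    targets: sorted, non-overlapping (start, end) half-open intervals
    on a single chromosome of length chrom_length.
    """
    bins = []
    cursor = 0
    # A zero-length sentinel at the chromosome end closes the last gap.
    for start, end in list(targets) + [(chrom_length, chrom_length)]:
        gap = start - cursor
        if gap > 0:
            n = -(-gap // bin_size)  # ceiling division: bins needed for this gap
            for i in range(n):
                # Split the gap into n near-equal bins.
                b_start = cursor + i * gap // n
                b_end = cursor + (i + 1) * gap // n
                bins.append((b_start, b_end))
        cursor = max(cursor, end)
    return bins


if __name__ == "__main__":
    # Two small targets on a 1 Mb toy chromosome.
    for b in offtarget_bins(1_000_000, [(250_000, 251_000), (600_000, 602_000)]):
        print(b)
```

Read depth counted in such bins would then supply the intronic and intergenic signal, while the targets themselves keep exon-level resolution.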

Scientific Applications:

  • Germline CNV detection: Identification of germline copy number variants relevant to syndromic conditions and genetic studies.
  • Somatic CNV/SCNA analysis in cancer: Detection and characterization of somatic copy number alterations in cancer genomes from targeted sequencing data.
  • Targeted sequencing studies: Generation of high-resolution copy number data from targeted re-sequencing efforts to support genomic research into genetic contributions to disease.
  • Cross-platform comparison: Use in studies comparing sequencing-based CNV calls to array-based methods such as aCGH.

Methodology:

CNVkit uses both on-target and off-target reads. It normalizes read counts against a pooled reference and corrects for biases from GC content, target footprint size and spacing, and repetitive sequences. It then infers copy number at exon level within targets and at ~100-kilobase resolution genome-wide.
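
The normalization idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not CNVkit's implementation or API: the function names are hypothetical, depths are assumed to be positive per-bin values, the pooled reference is taken as a per-bin median across normal samples, and only a crude GC correction (subtracting the median of each GC stratum) is shown. Corrections for target footprint and repetitive sequence are omitted.

```python
import math
from statistics import median


def pooled_reference(normal_depths):
    """Per-bin median across normal samples, each scaled to its own median depth."""
    scaled = []
    for depths in normal_depths:
        m = median(depths)
        scaled.append([d / m for d in depths])
    return [median(col) for col in zip(*scaled)]


def log2_ratios(sample_depths, reference):
    """Log2 ratio of median-scaled sample depth to the pooled reference."""
    m = median(sample_depths)
    return [math.log2((d / m) / r) for d, r in zip(sample_depths, reference)]


def gc_correct(ratios, gc_fractions, n_strata=4):
    """Subtract each GC stratum's median log2 ratio (a crude stand-in for
    a fitted GC trend; assumption for this sketch)."""
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = {}
    for s, r in zip(strata, ratios):
        by_stratum.setdefault(s, []).append(r)
    centers = {s: median(v) for s, v in by_stratum.items()}
    return [r - centers[s] for s, r in zip(strata, ratios)]


if __name__ == "__main__":
    normals = [[100, 200, 100, 50, 100], [100, 200, 100, 50, 100]]
    sample = [200, 400, 400, 100, 200]  # twice the depth overall, third bin gained
    ref = pooled_reference(normals)
    print(log2_ratios(sample, ref))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

Scaling every sample to its own median removes differences in overall sequencing depth, so the remaining log2 ratio reflects per-bin deviation from the reference rather than library size.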

Details

License: BSD-3-Clause
Maturity: Mature
Tool Type: library
Operating Systems: Mac
Programming Languages: Python
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLOS Computational Biology. 2016;12(4):e1004873. doi:10.1371/journal.pcbi.1004873. PMID:27100738. PMCID:PMC4839673.

Funding: National Institutes of Health (P01 CA025874, R01 CA131524)