SNPTools

SNPTools is an integrative pipeline for single nucleotide polymorphism (SNP) discovery, genotyping, phasing, and imputation from next-generation sequencing (NGS) data. It targets accurate variant calling and haplotype inference in population-scale and low-coverage sequencing studies.


Key Features:

  • High-Quality SNP Discovery and Genotyping: Discovers and genotypes SNPs under low-coverage conditions (approximately 5×) with high sensitivity and specificity.
  • Effective Base Depth (EBD): Implements the nonparametric effective base depth statistic to enhance the accuracy of statistical modeling for sequencing data.
  • Variance Ratio Scoring: Employs a variance-based statistic to identify polymorphic loci with high sensitivity and specificity.
  • BAM-Specific Binomial Mixture Modeling (BBMM): Uses a BBMM clustering algorithm to generate robust genotype likelihoods from heterogeneous sequencing data.
  • Advanced Imputation Engine: Refines raw genotype likelihoods into high-quality phased genotypes and haplotypes via imputation.
  • Efficient Data Handling: Uses an I/O- and storage-aware design to improve performance on large, population-scale sequencing datasets.

Scientific Applications:

  • Population-scale variant discovery: Enables SNP discovery and genotyping in large cohorts such as the International 1000 Genomes Project.
  • Population genetics studies: Supports analyses of allele frequency, genetic variation, and haplotype structure in populations.
  • Disease association studies: Provides genotypes and phased haplotypes for use in association mapping and related analyses.
  • Evolutionary biology research: Facilitates investigation of evolutionary patterns and polymorphism across populations.
  • Personalized medicine initiatives: Supplies genotype and haplotype data useful for precision medicine and genetic risk assessment.
  • Microarray-comparable genotyping: Achieves genotyping accuracy comparable to SNP microarrays from NGS data.

Methodology:

The pipeline combines four components: the effective base depth (EBD) statistic, variance ratio scoring to identify polymorphic loci, BAM-specific binomial mixture modeling (BBMM) to generate genotype likelihoods, and an imputation engine that produces phased genotypes and haplotypes. Processing is I/O- and storage-aware to accommodate heterogeneous NGS data.
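
To make the term "genotype likelihood" concrete, the following is a minimal sketch of a plain binomial genotype-likelihood calculation at a single biallelic site. It is not the BBMM algorithm and does not use any SNPTools code or interface: the function names, the fixed `error_rate` of 0.01, and the example read counts are all assumptions chosen for illustration. BBMM fits its model per BAM file, which this sketch does not attempt.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Return P(reads | genotype) for genotypes RR, RA, and AA.

    Illustrative binomial model only (hypothetical helper, not SNPTools code).
    error_rate is an assumed per-base error probability.
    """
    depth = ref_count + alt_count
    # Expected fraction of alternate-allele reads under each genotype.
    alt_fraction = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    return {
        genotype: comb(depth, alt_count) * p**alt_count * (1.0 - p)**ref_count
        for genotype, p in alt_fraction.items()
    }


def normalize(likelihoods):
    """Scale likelihoods so they sum to 1 (a flat prior over genotypes)."""
    total = sum(likelihoods.values())
    return {genotype: value / total for genotype, value in likelihoods.items()}


if __name__ == "__main__":
    # Example: a low-coverage site with 3 reference reads and 2 alternate reads.
    for genotype, prob in normalize(genotype_likelihoods(3, 2)).items():
        print(genotype, f"{prob:.4g}")
```

At low depth, likelihoods like these often leave the genotype uncertain at individual sites, which is why a pipeline of this kind passes them to an imputation step rather than calling genotypes directly.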

Details

Tool Type: workflow
Operating Systems: Linux
Programming Languages: C++
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Wang Y, Lu J, Yu J, Gibbs RA, Yu F. An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data. Genome Research. 2013;23(5):833-842. doi:10.1101/gr.146084.112. PMID:23296920. PMCID:PMC3638139.
