SNPTools
SNPTools performs integrative single nucleotide polymorphism (SNP) discovery, genotyping, phasing, and imputation from next-generation sequencing (NGS) data to enable accurate variant calling and haplotype inference in population-scale and low-coverage sequencing studies.
Key Features:
- High-Quality SNP Discovery and Genotyping: Discovers and genotypes SNPs under low-coverage conditions (approximately 5×) with high sensitivity and specificity.
- Effective Base Depth (EBD): Implements the nonparametric effective base depth statistic to enhance the accuracy of statistical modeling for sequencing data.
- Variance Ratio Scoring: Employs a variance-based statistic to identify polymorphic loci with high sensitivity and specificity.
- BAM-Specific Binomial Mixture Modeling (BBMM): Uses a BBMM clustering algorithm to generate robust genotype likelihoods from heterogeneous sequencing data.
- Advanced Imputation Engine: Refines raw genotype likelihoods into high-quality phased genotypes and haplotypes via imputation.
- Efficient Data Handling: Optimized for large-scale population studies with I/O- and storage-aware design to improve computing performance on extensive sequencing datasets.
Scientific Applications:
- Population-scale variant discovery: Enables SNP discovery and genotyping in large cohorts such as the International 1000 Genomes Project.
- Population genetics studies: Supports analyses of allele frequency, genetic variation, and haplotype structure in populations.
- Disease association studies: Provides genotypes and phased haplotypes for use in association mapping and related analyses.
- Evolutionary biology research: Facilitates investigation of evolutionary patterns and polymorphism across populations.
- Personalized medicine initiatives: Supplies genotype and haplotype data useful for precision medicine and genetic risk assessment.
- Microarray-comparable genotyping: Achieves genotyping accuracy from NGS data that is comparable to that of SNP microarrays.
Methodology:
SNPTools combines four computational components: the effective base depth (EBD) statistic, variance ratio scoring to identify polymorphic loci, BAM-specific binomial mixture modeling (BBMM) to generate genotype likelihoods, and an imputation engine that refines those likelihoods into phased genotypes and haplotypes. Processing is I/O- and storage-aware and designed for heterogeneous NGS data.
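To make the idea of a genotype likelihood concrete, the following is a minimal Python sketch of a plain per-site binomial model. It is not the BBMM algorithm and does not reproduce SNPTools code. The function names, the 1% error rate, the expected alternate-allele fractions, and the uniform prior used for normalization are all assumptions chosen for illustration.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial likelihood of the observed allele counts under each diploid genotype.

    Illustrative only: a simple fixed-parameter binomial model, not BBMM.
    """
    n = ref_count + alt_count
    # Assumed expected alternate-allele fraction for each genotype.
    alt_fraction = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        gt: comb(n, alt_count) * p ** alt_count * (1.0 - p) ** ref_count
        for gt, p in alt_fraction.items()
    }


def normalize(likelihoods):
    """Rescale likelihoods to sum to 1 (equivalent to assuming a uniform prior)."""
    total = sum(likelihoods.values())
    return {gt: value / total for gt, value in likelihoods.items()}


if __name__ == "__main__":
    # Hypothetical pileup at roughly 5x coverage: 3 reference reads, 2 alternate reads.
    for gt, p in normalize(genotype_likelihoods(3, 2)).items():
        print(gt, round(p, 4))
```

At low coverage, a handful of reads often leaves more than one genotype plausible, which is why a pipeline of this kind passes likelihoods rather than hard calls to the imputation step.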
Details
- Tool Type: workflow
- Operating Systems: Linux
- Programming Languages: C++
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Wang Y, Lu J, Yu J, Gibbs RA, Yu F. An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data. Genome Research. 2013;23(5):833-842. doi:10.1101/gr.146084.112. PMID:23296920. PMCID:PMC3638139.