Atlas-SNP2
Atlas-SNP2 detects single nucleotide polymorphisms (SNPs) in next-generation sequencing (NGS) data by distinguishing true variants from sequencing errors.
Key Features:
- Error Probability Integration: Uses a logistic regression model, trained on sequencing data, to identify systematic sequencing errors influenced by context-dependent variables and to estimate the error probability of each base call.
- Bayesian Estimation: Applies a Bayesian formula that combines priors on the overall sequencing error rate and the estimated SNP frequency with the logistic regression output to estimate the posterior error probability of each substitution.
- Posterior SNP Probability Calculation: Calculates a posterior SNP probability for each substitution to distinguish true SNPs from sequencing errors.
- Validation Metrics: In the published validation, Atlas-SNP2 achieved a false-positive rate below 10% and a false-negative rate of approximately 5% or lower.
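The Bayesian step can be written generically as below. This is a reconstruction under stated assumptions, not the formula from the publication: x holds the context variables of one substitution, β are the regression coefficients, ρ is the assumed fraction of errors in the training data, ε is the prior sequencing error rate and θ the prior SNP frequency. The logistic output is converted to a likelihood ratio Λ by dividing out the training class balance, then re-weighted by the prior odds ε/θ.

```latex
p(\mathbf{x}) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_j \beta_j x_j\right)\right)}
\qquad
\Lambda(\mathbf{x}) = \frac{p(\mathbf{x}) / \bigl(1 - p(\mathbf{x})\bigr)}{\rho / (1 - \rho)}

P(\text{error} \mid \mathbf{x}) = \frac{\Lambda(\mathbf{x})\,\varepsilon}{\Lambda(\mathbf{x})\,\varepsilon + \theta}
\qquad
P(\text{SNP} \mid \mathbf{x}) = 1 - P(\text{error} \mid \mathbf{x})
```

Under this reading, a substitution whose context looks error-prone (large Λ) still needs a low error prior relative to the SNP prior before it is accepted as a variant.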
Scientific Applications:
- Population-scale variant discovery: Supports accurate SNP detection in large-scale genomic projects such as the 1000 Genomes Project using NGS data.
- Improved variant calling in NGS studies: Reduces false-positive and false-negative SNP calls in genetic analyses based on next-generation sequencing.
Methodology:
Atlas-SNP2 trains a logistic regression model to estimate, from context-dependent variables, the probability that a base call is a sequencing error. A Bayesian formula then combines this estimate with priors on the sequencing error rate and the SNP frequency to give a posterior error probability for each substitution. Finally, posterior SNP probabilities are computed and used to classify each substitution as a true variant or a sequencing error.
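The three methodology steps can be strung together as in the minimal Python sketch below. It is not Atlas-SNP2's implementation (which is written in Ruby and C++): the two features, the coefficients, the training class balance, the priors and the calling threshold are all hypothetical placeholders, and the prior correction is standard Bayes' rule re-weighting of a logistic model's odds rather than the published formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Substitution:
    """A candidate substitution seen in one read (hypothetical feature set)."""
    base_quality: float       # Phred-scaled quality of the substituted base
    relative_position: float  # position of the base in the read, scaled to [0, 1]


def logistic_error_probability(sub, intercept, weights):
    """P(error | features) from a logistic model; weights are placeholders, not fitted values."""
    z = intercept + weights[0] * sub.base_quality + weights[1] * sub.relative_position
    return 1.0 / (1.0 + math.exp(-z))


def posterior_error_probability(p_lr, train_error_fraction, error_rate, snp_rate):
    """Re-weight the logistic output with deployment priors via Bayes' rule."""
    # Likelihood ratio P(x | error) / P(x | true variant): divide the model's
    # odds by the class-balance odds of the (assumed) training set.
    train_odds = train_error_fraction / (1.0 - train_error_fraction)
    likelihood_ratio = (p_lr / (1.0 - p_lr)) / train_odds
    # Posterior odds of error = likelihood ratio * prior odds (error_rate / snp_rate).
    numerator = likelihood_ratio * error_rate
    return numerator / (numerator + snp_rate)


def call_snps(subs, intercept, weights, train_error_fraction,
              error_rate, snp_rate, threshold):
    """Return (posterior SNP probability, is_snp) for each substitution."""
    results = []
    for sub in subs:
        p_lr = logistic_error_probability(sub, intercept, weights)
        p_err = posterior_error_probability(p_lr, train_error_fraction,
                                            error_rate, snp_rate)
        p_snp = 1.0 - p_err
        results.append((p_snp, p_snp >= threshold))
    return results


if __name__ == "__main__":
    # Illustrative numbers only: higher base quality lowers the error logit,
    # bases near the read end raise it.
    candidates = [Substitution(35, 0.2), Substitution(8, 0.95)]
    calls = call_snps(candidates, intercept=4.0, weights=(-0.3, 2.0),
                      train_error_fraction=0.5, error_rate=0.01,
                      snp_rate=0.001, threshold=0.9)
    for sub, (p_snp, is_snp) in zip(candidates, calls):
        print(sub, round(p_snp, 4), is_snp)
```

Dividing out the training class balance matters because a logistic model trained on a set enriched for errors would otherwise carry that enrichment into every posterior.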
Details
- Maturity: Mature
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Ruby, C++
- Added: 1/13/2017
- Last Updated: 11/25/2024
Publications
Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA, Liu Y, Weinstock GM, Wheeler DA, Gibbs RA, Yu F. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Research. 2010;20(2):273-280. doi:10.1101/gr.096388.109. PMID:20019143. PMCID:PMC2813483.