ShoRAH
ShoRAH reconstructs haplotypes and detects low-frequency variants from next-generation sequencing data to characterize genetic diversity in mixed samples while accounting for sequencing errors.
Key Features:
- Error Correction: Clusters reads within small windows to correct sequencing errors, reducing base substitution error rates in HIV-1 pyrosequencing data (e.g., from 0.25% to 0.05% and from 0.05% to 0.03%).
- Probabilistic Bayesian Approach: Applies a probabilistic Bayesian framework to distinguish true genetic variants from sequencing errors, improving precision and recall relative to counting-based methods.
- Haplotype Inference: Reconstructs global haplotypes. Local haplotypes are first estimated by assigning observed reads to unobserved haplotypes with a generative probabilistic model that uses a Dirichlet process mixture prior and is sampled with a Gibbs sampler.
- Strand Bias Testing: Applies a beta-binomial strand bias test, which checks how the reads supporting a variant are distributed between the forward and reverse strands, to reduce false-positive single nucleotide variant (SNV) calls.
- Application to Viral Populations: Targets genetically diverse viral populations such as HIV and hepatitis C virus to identify low-frequency variants and investigate phenotypic drug resistance and epidemiological links.
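The strand bias filter listed above can be illustrated with a short sketch. ShoRAH's exact null model is not given here, so the code assumes one plausible form: the number of variant-supporting reads on the forward strand follows a beta-binomial distribution centred on the forward-strand fraction of all reads at that position, with a hypothetical concentration parameter setting the overdispersion. The function names, the add-one smoothing and the default values are illustrative assumptions, not ShoRAH's API.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma, to avoid overflow for large counts."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, n, a, b):
    """P(K = k) for K ~ BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def strand_bias_pvalue(var_fwd, var_rev, total_fwd, total_rev, concentration=100.0):
    """Two-sided p-value that variant-supporting reads are split between strands
    like all reads at the position (hypothetical null model, see lead-in)."""
    n = var_fwd + var_rev
    if n == 0:
        return 1.0
    # Expected forward-strand fraction from all reads, with add-one smoothing.
    p_fwd = (total_fwd + 1) / (total_fwd + total_rev + 2)
    a = concentration * p_fwd
    b = concentration * (1.0 - p_fwd)
    probs = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
    observed = probs[var_fwd]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p = sum(pr for pr in probs if pr <= observed * (1.0 + 1e-9))
    return min(1.0, p)
```

A variant whose supporting reads are split evenly between strands gets a p-value near 1, whereas one seen only on the forward strand gets a very small p-value; a caller would then drop SNVs below a chosen significance threshold. Using a beta-binomial rather than a plain binomial null widens the tails, so moderate strand imbalance that reflects coverage noise is not rejected too eagerly.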
Scientific Applications:
- Viral population analysis: Inference of haplotypes and variant spectra from deep sequencing of HIV and hepatitis C virus samples.
- Drug resistance characterization: Detection of minority variants relevant for phenotypic drug resistance analysis.
- Epidemiological linkage: Reconstruction of haplotypes to support analysis of epidemiological links between samples.
- Population genetics and evolution: Study of viral evolution and population genetic structure in heterogeneous sequencing data.
Methodology:
The methods comprise read clustering in small windows for error correction; Bayesian statistical inference based on a generative probabilistic model with a Dirichlet process mixture prior; Gibbs sampling for local haplotype estimation; and a beta-binomial strand bias test for SNV filtering.
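The local haplotype step can be sketched as a Gibbs sampler for a Dirichlet process mixture over the reads of one window. This is a simplified illustration under stated assumptions, not ShoRAH's implementation: reads are taken to be gap-free strings of equal length L over A, C, G and T; each base is copied correctly with a fixed probability theta and otherwise replaced uniformly by one of the other three bases; and the base measure is uniform over haplotypes, which gives a read in a new cluster the likelihood 4^-L. The values of alpha, theta and n_iter, and all function names, are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def base_logliks(theta):
    """Log-probabilities of a read base matching / mismatching the haplotype base."""
    return math.log(theta), math.log((1.0 - theta) / 3.0)


def read_loglik(read, hap, theta):
    """log p(read | haplotype) under the assumed per-base error model."""
    log_match, log_mismatch = base_logliks(theta)
    return sum(log_match if r == h else log_mismatch for r, h in zip(read, hap))


def sample_haplotype(reads, theta, rng):
    """Draw a haplotype column by column from its posterior given the reads
    assigned to it, with a uniform prior over A/C/G/T at each position."""
    log_match, log_mismatch = base_logliks(theta)
    hap = []
    for j in range(len(reads[0])):
        column = [r[j] for r in reads]
        logw = [sum(log_match if c == b else log_mismatch for c in column) for b in ALPHABET]
        top = max(logw)
        weights = [math.exp(x - top) for x in logw]
        hap.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(hap)


def gibbs_dpm(reads, alpha=0.1, theta=0.99, n_iter=50, seed=0):
    """Gibbs sampler for a Dirichlet process mixture of haplotypes in one window."""
    rng = random.Random(seed)
    log_new = len(reads[0]) * math.log(0.25)  # p(read | new cluster) = 4^-L
    z = [0] * len(reads)                       # start with all reads in cluster 0
    haps = {0: sample_haplotype(reads, theta, rng)}
    next_id = 1
    for _ in range(n_iter):
        # 1) Reassign each read given all other assignments (Chinese restaurant process).
        for i, read in enumerate(reads):
            old = z[i]
            z[i] = None
            counts = Counter(c for c in z if c is not None)
            if old not in counts:
                del haps[old]  # the read was alone, so its cluster disappears
            labels = list(counts)
            logw = [math.log(counts[k]) + read_loglik(read, haps[k], theta) for k in labels]
            logw.append(math.log(alpha) + log_new)
            top = max(logw)
            weights = [math.exp(x - top) for x in logw]
            pick = rng.choices(range(len(weights)), weights=weights)[0]
            if pick == len(labels):
                # Open a new cluster whose haplotype is drawn given this read alone.
                haps[next_id] = sample_haplotype([read], theta, rng)
                z[i] = next_id
                next_id += 1
            else:
                z[i] = labels[pick]
        # 2) Resample every occupied cluster's haplotype from its member reads.
        for k in list(haps):
            members = [r for r, c in zip(reads, z) if c == k]
            haps[k] = sample_haplotype(members, theta, rng)
    return z, haps
```

Calling gibbs_dpm(reads) returns one cluster label per read and one haplotype per occupied cluster; reads that share a label are treated as copies of the same local haplotype, so their disagreements with it are candidate sequencing errors to correct. A fuller sampler would also infer theta and alpha rather than fixing them.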
Details
- License: GPL-3.0
- Maturity: Mature
- Cost: Free of charge
- Tool Type: Command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: C++, Python
- Added: 1/13/2017
- Last Updated: 11/25/2024
Operations
Haplotype mapping
Publications
Zagordi O, Bhattacharya A, Eriksson N, Beerenwinkel N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics. 2011;12(1):119. doi:10.1186/1471-2105-12-119. PMID:21521499. PMCID:PMC3113935.
Zagordi O, Klein R, Däumer M, Beerenwinkel N. Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Research. 2010;38(21):7400-7409. doi:10.1093/nar/gkq655. PMID:20671025. PMCID:PMC2995073.
Zagordi O, Geyrhofer L, Roth V, Beerenwinkel N. Deep Sequencing of a Genetically Heterogeneous Sample: Local Haplotype Reconstruction and Read Error Correction. Journal of Computational Biology. 2010;17(3):417-428. doi:10.1089/cmb.2009.0164. PMID:20377454.
McElroy K, Zagordi O, Bull R, Luciani F, Beerenwinkel N. Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias. BMC Genomics. 2013;14(1):501. doi:10.1186/1471-2164-14-501. PMID:23879730. PMCID:PMC3848937.
Downloads
- Software package: https://bioconda.github.io/recipes/shorah/README.html (install: conda install shorah)
- Source code, version 1.1.3: https://github.com/cbg-ethz/shorah/releases/download/v1.1.3/shorah-1.1.3.tar.bz2
- Source code, version 1.99.0: https://github.com/cbg-ethz/shorah/releases/download/v1.99.0/shorah-1.99.0.tar.bz2