ShoRAH

ShoRAH reconstructs haplotypes and detects low-frequency variants from next-generation sequencing data to characterize genetic diversity in mixed samples while accounting for sequencing errors.


Key Features:

  • Error Correction: Clusters reads in small windows to correct sequencing errors and lower base substitution rates; reported reductions in pyrosequencing HIV-1 data include 0.25% to 0.05% and 0.05% to 0.03%.
  • Probabilistic Bayesian Approach: Models reads probabilistically to distinguish true genetic variants from sequencing errors, improving precision and recall relative to counting-based variant calling.
  • Haplotype Inference: Assigns observed reads to unobserved haplotypes under a generative probabilistic model with a Dirichlet process mixture prior; a Gibbs sampler estimates local haplotypes, which form the basis for global haplotype reconstruction.
  • Strand Bias Testing: Integrates a strand bias test, based on a beta-binomial model of the forward-read distribution, to reduce false-positive single nucleotide variants (SNVs).
  • Application to Viral Populations: Targets genetically diverse viral populations such as HIV and hepatitis C virus to identify low-frequency variants and investigate phenotypic drug resistance and epidemiological links.
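
The clustering idea behind the error correction and local haplotype inference features can be illustrated with a toy Gibbs sampler. The sketch below is not ShoRAH's implementation: the function names, the per-base accuracy `theta`, the concentration `alpha`, and the uniform prior over haplotype bases are all assumptions chosen for illustration. It clusters equal-length reads from a single window under a Chinese restaurant process prior (one standard representation of a Dirichlet process mixture) and replaces each read with the haplotype of its cluster.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"

def log_lik(read, hap, theta):
    """Log-probability of a read given a haplotype; each base matches with prob. theta."""
    matches = sum(r == h for r, h in zip(read, hap))
    mismatches = len(read) - matches
    return matches * math.log(theta) + mismatches * math.log((1 - theta) / 3)

def sample_index(log_weights, rng):
    """Draw an index with probability proportional to exp(log_weights)."""
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    return rng.choices(range(len(weights)), weights=weights)[0]

def sample_haplotype(members, theta, rng):
    """Sample each haplotype base from its posterior given the member reads."""
    hap = []
    for column in zip(*members):
        logw = [
            sum(math.log(theta if base == c else (1 - theta) / 3) for c in column)
            for base in ALPHABET
        ]
        hap.append(ALPHABET[sample_index(logw, rng)])
    return "".join(hap)

def gibbs_cluster(reads, alpha=1.0, theta=0.98, sweeps=50, seed=0):
    """Toy Gibbs sampler; alpha, theta, and sweeps are illustrative assumptions."""
    rng = random.Random(seed)
    assign = [0] * len(reads)        # start with every read in one cluster
    haps = {0: reads[0]}             # cluster id -> current haplotype
    next_id = 1
    for _ in range(sweeps):
        for i, read in enumerate(reads):
            assign[i] = None         # take read i out of its cluster
            counts = Counter(a for a in assign if a is not None)
            haps = {k: h for k, h in haps.items() if counts[k] > 0}
            labels = list(haps)
            logw = [math.log(counts[k]) + log_lik(read, haps[k], theta) for k in labels]
            # New cluster: weight alpha times the read's marginal likelihood,
            # which is (1/4)^L under a uniform prior on haplotype bases.
            logw.append(math.log(alpha) + len(read) * math.log(0.25))
            choice = sample_index(logw, rng)
            if choice == len(labels):
                haps[next_id] = read
                assign[i] = next_id
                next_id += 1
            else:
                assign[i] = labels[choice]
        for k in haps:               # resample each cluster's haplotype
            members = [r for r, a in zip(reads, assign) if a == k]
            haps[k] = sample_haplotype(members, theta, rng)
    corrected = [haps[a] for a in assign]
    return corrected, Counter(corrected)
```

Calling `gibbs_cluster(reads)` returns the corrected reads and a count of reads per inferred haplotype, from which local haplotype frequencies follow. A read that differs from a well-supported haplotype at a single position is very likely to be absorbed into that haplotype's cluster, which is how clustering doubles as error correction in this sketch.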

Scientific Applications:

  • Viral population analysis: Inference of haplotypes and variant spectra from deep sequencing of HIV and hepatitis C virus samples.
  • Drug resistance characterization: Detection of minority variants relevant for phenotypic drug resistance analysis.
  • Epidemiological linkage: Reconstruction of haplotypes to support analysis of epidemiological links between samples.
  • Population genetics and evolution: Study of viral evolution and population genetic structure in heterogeneous sequencing data.

Methodology:

ShoRAH combines read clustering in small windows for error correction, Bayesian statistical inference under a generative probabilistic model with a Dirichlet process mixture prior, Gibbs sampling for local haplotype estimation, and a beta-binomial strand bias test for SNV filtering.
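
The strand bias filter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ShoRAH's code: the function names, the dispersion value `sigma`, and the parametrization (beta-binomial mean set to the overall forward-read fraction at the position, shape parameters `mean / sigma` and `(1 - mean) / sigma`) are hypothetical choices. The test asks whether the number of variant-supporting reads on the forward strand is plausible given the strand balance of all reads.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """P(K = k) for a beta-binomial with n trials and shape parameters a, b."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def strand_bias_pvalue(var_fwd, var_rev, all_fwd, all_rev, sigma=0.01):
    """Two-sided p-value that variant reads follow the overall strand balance.

    sigma is an assumed dispersion value used only for illustration.
    """
    if all_fwd <= 0 or all_rev <= 0:
        raise ValueError("both strands must be covered at this position")
    n = var_fwd + var_rev
    mean = all_fwd / (all_fwd + all_rev)
    a, b = mean / sigma, (1 - mean) / sigma
    observed = betabinom_pmf(var_fwd, n, a, b)
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(
        p
        for p in (betabinom_pmf(k, n, a, b) for k in range(n + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

A variant supported almost exclusively by reads from one strand, at a position where overall coverage is balanced, gets a very small p-value and would be a candidate for filtering; the significance threshold is left to the caller.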

Details

License: GPL-3.0
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: C++, Python
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Zagordi O, Bhattacharya A, Eriksson N, Beerenwinkel N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics. 2011;12(1):119. doi:10.1186/1471-2105-12-119. PMID:21521499. PMCID:PMC3113935.

Zagordi O, Klein R, Däumer M, Beerenwinkel N. Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Research. 2010;38(21):7400-7409. doi:10.1093/nar/gkq655. PMID:20671025. PMCID:PMC2995073.

Zagordi O, Geyrhofer L, Roth V, Beerenwinkel N. Deep Sequencing of a Genetically Heterogeneous Sample: Local Haplotype Reconstruction and Read Error Correction. Journal of Computational Biology. 2010;17(3):417-428. doi:10.1089/cmb.2009.0164. PMID:20377454.

McElroy K, Zagordi O, Bull R, Luciani F, Beerenwinkel N. Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias. BMC Genomics. 2013;14(1):501. doi:10.1186/1471-2164-14-501. PMID:23879730. PMCID:PMC3848937.

Links

Repository: https://github.com/cbg-ethz/shorah (GitHub repository)
Issue tracker: https://github.com/cbg-ethz/shorah/issues (GitHub issue tracker)
Software catalogue: https://www.expasy.org/resources/search/querytext:shorah (ExPASy - SIB Bioinformatics Resources Portal)

Related Tools

v-pipe (Relation: usedBy)
biopython (Relation: uses)
samtools (Relation: uses)