GISMO

GISMO performs protein multiple sequence alignment using Bayesian Markov chain Monte Carlo sampling to produce statistically justified alignments and infer position-specific gap penalties for large, diverse protein sequence sets.


Key Features:

  • Top-Down Strategy: First identifies regions shared by all input sequences, then realigns closely related subgroups; the approach has favorable asymptotic time complexity.
  • Position-Specific Gap Penalties: Infers a gap penalty for each position, favoring insertions or deletions where indels occur in other sequences, so that gaps fall between conserved blocks and protein structural cores are preserved.
  • Bayesian Statistical Measure of Alignment Quality: Assesses alignment quality with a Bayesian measure based on the minimum description length principle and Dirichlet mixture priors rather than sum-of-pairs scoring, and aligns regions only when statistically justified.
  • Exploration of Alignment Space: Defines a system for exploring alignment space, which supports the development of new sampling strategies for escaping suboptimal alignment traps.
  • Bayesian MCMC Sampler: Implements Markov chain Monte Carlo sampling for posterior exploration of alignment hypotheses.
  • Benchmarking: Demonstrated improved accuracy on large, diverse protein sets relative to MUSCLE, MAFFT, Clustal-Ω, and Kalign.
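
To illustrate the idea behind scoring columns with a Dirichlet prior instead of sum-of-pairs, the following minimal sketch compares the marginal probability of an alignment column under a Dirichlet prior with its probability under a background model. It is not GISMO's implementation: it assumes a single symmetric Dirichlet component rather than a mixture, a uniform background, and hypothetical function names. Under the minimum description length view, a higher log probability corresponds to a shorter encoding of the column.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_dirichlet_marginal(counts, alpha):
    """Log probability of an ordered column of residues, integrating
    residue frequencies over a Dirichlet prior with parameters `alpha`."""
    n = sum(counts)
    a = sum(alpha)
    result = math.lgamma(a) - math.lgamma(a + n)
    for c, al in zip(counts, alpha):
        result += math.lgamma(al + c) - math.lgamma(al)
    return result


def log_background(counts, freqs):
    """Log probability of the same column under fixed background frequencies."""
    return sum(c * math.log(f) for c, f in zip(counts, freqs) if c)


def column_log_odds(column, alpha, freqs):
    """Log-odds (in nats) that the column is better explained by the
    Dirichlet model than by the background model."""
    tally = Counter(column)
    counts = [tally.get(aa, 0) for aa in AMINO_ACIDS]
    return log_dirichlet_marginal(counts, alpha) - log_background(counts, freqs)


if __name__ == "__main__":
    # Illustrative (assumed) parameters: symmetric prior, uniform background.
    alpha = [0.1] * len(AMINO_ACIDS)
    uniform = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    for column in ("LLLLLLLL", "ACDEFGHI"):
        print(column, column_log_odds(column, alpha, uniform))
```

With these assumed parameters, a fully conserved column scores above zero and a column of eight different residues scores below zero, which mirrors the principle of aligning a region only when the data justify it.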

Scientific Applications:

  • Conserved Domain Alignment: Identifying and aligning conserved domains within large, diverse sets of full-length protein sequences.
  • Validation Dataset: Validated using 408 protein sets from the NCBI Conserved Domain Database that were manually curated based on available crystal structures.
  • Structural and Functional Interpretation: Producing alignments suitable for interpreting protein structure and function.

Methodology:

GISMO combines Bayesian inference via Markov chain Monte Carlo sampling with the minimum description length principle, Dirichlet mixture priors, a top-down alignment strategy, inferred position-specific gap penalties, and an explicit system for exploring alignment space.
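
As a rough illustration of Markov chain Monte Carlo sampling over alignment hypotheses, the sketch below runs a Metropolis sampler over the ungapped offset of a short sequence along a longer one. This is a toy model, not GISMO's sampler: the scoring scheme, the function names, and the single-offset state space are all assumptions made for the example.

```python
import math
import random


def match_score(seq_a, seq_b, offset, match=1.0, mismatch=-0.5):
    """Log-scale score for placing seq_b at `offset` along seq_a (no gaps).
    The match and mismatch values are illustrative, not fitted."""
    score = 0.0
    for j, res_b in enumerate(seq_b):
        score += match if seq_a[offset + j] == res_b else mismatch
    return score


def sample_offsets(seq_a, seq_b, steps=5000, seed=0):
    """Metropolis sampling of offsets with probability proportional to
    exp(match_score). Returns how often each offset was visited."""
    rng = random.Random(seed)
    max_offset = len(seq_a) - len(seq_b)
    offset = rng.randint(0, max_offset)
    current = match_score(seq_a, seq_b, offset)
    visits = [0] * (max_offset + 1)
    for _ in range(steps):
        # Symmetric proposal: shift by one position left or right.
        proposal = offset + rng.choice((-1, 1))
        # Out-of-range proposals are rejected, which keeps the proposal symmetric.
        if 0 <= proposal <= max_offset:
            proposed = match_score(seq_a, seq_b, proposal)
            accept_prob = math.exp(min(0.0, proposed - current))
            if rng.random() < accept_prob:
                offset, current = proposal, proposed
        visits[offset] += 1
    return visits


if __name__ == "__main__":
    visits = sample_offsets("WWWWWACDEWWWWW", "ACDE")
    best = visits.index(max(visits))
    print("most visited offset:", best)
```

Because worse-scoring moves are accepted with nonzero probability, the chain can leave a locally favorable placement, which is the basic mechanism a sampler relies on to escape suboptimal alignments.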

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C++
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Neuwald AF, Altschul SF. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties. PLOS Computational Biology. 2016;12(5):e1004936. doi:10.1371/journal.pcbi.1004936. PMID:27192614. PMCID:PMC4871425.
