GISMO
GISMO performs protein multiple sequence alignment using Bayesian Markov chain Monte Carlo sampling to produce statistically justified alignments and infer position-specific gap penalties for large, diverse protein sequence sets.
Key Features:
- Top-Down Strategy: First identifies regions shared by all input sequences, then realigns closely related subgroups; the approach has favorable asymptotic time complexity.
- Position-Specific Gap Penalties: Infers position-specific gap penalties that favor insertions or deletions at positions where indels occur in other sequences, placing gaps between conserved blocks to preserve protein structural cores.
- Bayesian Statistical Measure of Alignment Quality: Assesses alignment quality with a measure based on the minimum description length principle and Dirichlet mixture priors rather than sum-of-pairs scoring, so regions are aligned only when statistically justified.
- Exploration of Alignment Space: Defines a system for exploring alignment space and enables development of new sampling strategies to escape suboptimal alignment traps.
- Bayesian MCMC Sampler: Implements Markov chain Monte Carlo sampling for posterior exploration of alignment hypotheses.
- Benchmarking: Reported in the publication listed below to be more accurate than MUSCLE, MAFFT, Clustal-Ω, and Kalign on large, diverse protein sets.
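The idea behind position-specific gap penalties in the list above can be illustrated with a small sketch. It is not GISMO's actual model: the function name `gap_open_penalties`, the Beta prior, and its parameter values are assumptions chosen for illustration. The sketch only shows how a penalty can shrink at positions where many sequences already carry an indel.

```python
import math

def gap_open_penalties(indel_counts, n_seqs, prior_open=1.0, prior_no_open=20.0):
    """Return one gap-opening penalty (in nats) per alignment position.

    indel_counts[i] is the number of the n_seqs aligned sequences that
    start an insertion or deletion at position i. The Beta(prior_open,
    prior_no_open) prior is a hypothetical choice; it keeps the estimate
    finite when no indel has been observed at a position.
    """
    penalties = []
    for k in indel_counts:
        # Posterior mean probability of opening a gap at this position.
        p_open = (k + prior_open) / (n_seqs + prior_open + prior_no_open)
        # A higher probability translates into a lower penalty.
        penalties.append(-math.log(p_open))
    return penalties

# Toy data: 100 sequences, indels concentrated at position 3,
# as might happen in a loop between two conserved blocks.
counts = [0, 0, 1, 40, 2, 0]
penalties = gap_open_penalties(counts, n_seqs=100)
for pos, (k, pen) in enumerate(zip(counts, penalties)):
    print(f"position {pos}: indels={k:3d} penalty={pen:.2f}")
```

In this toy data the penalty is lowest at position 3, so an aligner using these penalties would prefer to place further gaps there rather than inside the flanking conserved positions.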
Scientific Applications:
- Conserved Domain Alignment: Identifying and aligning conserved domains within large, diverse sets of full-length protein sequences.
- Validation Dataset: Validated using 408 protein sets from the NCBI Conserved Domain Database that were manually curated based on available crystal structures.
- Structural and Functional Interpretation: Producing alignments suitable for interpreting protein structure and function.
Methodology:
GISMO combines Bayesian inference via Markov chain Monte Carlo sampling with the minimum description length principle, Dirichlet mixture priors, a top-down alignment strategy, inferred position-specific gap penalties, and an explicit system for exploring alignment space.
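As a rough illustration of Markov chain Monte Carlo sampling over alignment hypotheses, the sketch below runs a Metropolis sampler over the possible start positions of an ungapped three-column profile within one sequence. This is a deliberately simplified stand-in and not GISMO's sampler, which works on full multiple alignments. The profile values, the probability floor `FLOOR`, the toy sequence, and all function names are assumptions made for the example.

```python
import math
import random

FLOOR = 0.01  # hypothetical probability for residues a column does not list

def log_score(seq, profile, start):
    """Log-probability of the window seq[start:start+len(profile)] under the profile."""
    return sum(math.log(col.get(seq[start + i], FLOOR))
               for i, col in enumerate(profile))

def metropolis_start_positions(seq, profile, n_steps=2000, seed=0):
    """Sample start positions with probability proportional to exp(log_score).

    Returns how many steps the chain spent at each candidate start.
    """
    rng = random.Random(seed)
    n_starts = len(seq) - len(profile) + 1
    current = rng.randrange(n_starts)
    current_lp = log_score(seq, profile, current)
    visits = [0] * n_starts
    for _ in range(n_steps):
        # Uniform proposal over all starts, which is symmetric.
        proposal = rng.randrange(n_starts)
        proposal_lp = log_score(seq, profile, proposal)
        # Metropolis rule: accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        visits[current] += 1
    return visits

# Toy profile for the motif "ACD" and a sequence containing it at index 3.
profile = [{"A": 0.9}, {"C": 0.9}, {"D": 0.9}]
seq = "GGGACDGG"
visits = metropolis_start_positions(seq, profile)
print(visits)
```

The visit counts approximate the posterior over start positions; here the chain spends nearly all of its time at index 3, where the window matches the profile. Because the sampler can also accept lower-scoring proposals with some probability, this family of methods can move away from suboptimal placements instead of being locked into them.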
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: C++
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Neuwald AF, Altschul SF. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties. PLOS Computational Biology. 2016;12(5):e1004936. doi:10.1371/journal.pcbi.1004936. PMID:27192614. PMCID:PMC4871425.