SIBIS
SIBIS identifies inconsistencies and errors in protein sequences by analyzing multiple sequence alignments to improve the accuracy of protein sequence databases.
Key Features:
- Multiple sequence alignment analysis: Leverages evolutionary information in multiple sequence alignments to assess residue consistency across homologs.
- Bayesian Dirichlet mixture modeling: Combines a Bayesian framework with Dirichlet mixture models to capture amino-acid substitution patterns.
- Amino-acid probability estimation: Estimates the probability of observing specific amino acids at each alignment position.
- Inconsistency detection: Detects inconsistent or erroneous segments that may result from natural genetic variants or errors introduced during sequence prediction.
- Performance validation: On a reference set of protein sequences with experimentally validated errors, showed higher sensitivity than previous methods with only a small loss of specificity.
- Large-scale UniProt analysis: Applied to human sequences from UniProt, identifying evidence of inconsistency in 48% of previously uncharacterized sequences.
- Implementation: Implemented in C for Linux computational environments.
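The probability-estimation step listed above can be illustrated with the standard Dirichlet mixture posterior mean for a single alignment column. This is a minimal sketch in Python rather than the SIBIS source code: the four-letter alphabet, the two mixture components, and the function names `log_beta` and `mixture_posterior_probs` are all hypothetical and chosen only to keep the example short.

```python
from math import exp, lgamma, log

def log_beta(v):
    """Log of the multivariate Beta function B(v)."""
    return sum(lgamma(x) for x in v) - lgamma(sum(v))

def mixture_posterior_probs(counts, components):
    """Posterior mean residue probabilities for one alignment column.

    counts: observed count of each residue in the column.
    components: list of (weight, alpha) pairs, one per Dirichlet component.
    """
    # Log responsibility of each component: log(q_j * B(n + alpha_j) / B(alpha_j)).
    log_w = []
    for weight, alpha in components:
        posterior = [n + a for n, a in zip(counts, alpha)]
        log_w.append(log(weight) + log_beta(posterior) - log_beta(alpha))

    # Normalize in a numerically stable way.
    peak = max(log_w)
    w = [exp(x - peak) for x in log_w]
    total = sum(w)
    responsibilities = [x / total for x in w]

    # Mix the per-component posterior means (n_i + alpha_i) / (|n| + |alpha|).
    n_total = sum(counts)
    probs = [0.0] * len(counts)
    for r, (_, alpha) in zip(responsibilities, components):
        denom = n_total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            probs[i] += r * (n + a) / denom
    return probs

# Toy setup (assumed values, not SIBIS parameters).
ALPHABET = "AVLG"
COMPONENTS = [
    (0.5, [1.0, 1.0, 1.0, 1.0]),   # uninformative component
    (0.5, [0.1, 5.0, 5.0, 0.1]),   # component favoring V and L
]
column_counts = [0, 7, 2, 0]        # counts of A, V, L, G in one column
probs = mixture_posterior_probs(column_counts, COMPONENTS)
```

Because the pseudocounts are positive, residues never observed in the column still receive a non-zero probability, which is what lets a low but non-zero estimate be interpreted as "unlikely" rather than "impossible".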
Scientific Applications:
- Protein database quality control: Identifies and flags inconsistent sequence segments to improve the accuracy of protein sequence databases.
- Gene prediction validation: Supports evaluation of predicted protein-coding genes and exon structures by detecting prediction-induced sequence errors.
- Structural, functional, and phylogenetic inference: Improves downstream inference of protein structure, function, and phylogeny by filtering erroneous sequences.
- Benchmarking and validation: Can be used to benchmark error-detection methods against reference sets of experimentally validated sequence errors.
- Large-scale screening of UniProt: Enables large-scale screening of UniProt human sequences to characterize the prevalence of inconsistencies.
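Benchmarking against a reference set comes down to comparing flagged sequences with the validated labels. The sketch below shows how sensitivity and specificity are conventionally computed. The function name `sensitivity_specificity` and the toy labels are assumptions for illustration and are not taken from the SIBIS evaluation.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for two equal-length boolean lists.

    predicted[i] is True when a sequence was flagged as inconsistent;
    truth[i] is True when the sequence has a validated error.
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Toy labels (hypothetical): six sequences, three with validated errors.
predicted = [True, False, False, False, True, True]
truth = [True, True, False, False, False, True]
sens, spec = sensitivity_specificity(predicted, truth)
```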
Methodology:
SIBIS takes a multiple sequence alignment as input and draws on the evolutionary information it contains. A Bayesian framework combined with Dirichlet mixture models estimates the probability of observing each amino acid at each alignment position, and these estimates are then used to detect inconsistent sequence segments. The tool is implemented in C and runs on Linux.
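One simple way to turn per-position probabilities into segments is to merge runs of consecutive low-probability residues. The following Python sketch illustrates that idea only. The source does not state the decision rule SIBIS applies, so the function name `flag_segments`, the threshold of 0.05, and the minimum run length of 3 are all assumed values.

```python
def flag_segments(residue_probs, threshold=0.05, min_len=3):
    """Return half-open (start, end) index ranges of low-probability runs.

    residue_probs: estimated probability of the observed residue at each
    aligned position of one sequence.
    threshold, min_len: illustrative cutoffs, not SIBIS defaults.
    """
    segments = []
    start = None
    for i, p in enumerate(residue_probs):
        if p < threshold:
            if start is None:
                start = i              # a new low-probability run begins
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None               # the run (if any) has ended
    # Close a run that extends to the end of the sequence.
    if start is not None and len(residue_probs) - start >= min_len:
        segments.append((start, len(residue_probs)))
    return segments

# Toy per-position probabilities (hypothetical values).
example_probs = [0.4, 0.3, 0.01, 0.02, 0.01, 0.5, 0.01, 0.6, 0.02, 0.01, 0.03]
segments = flag_segments(example_probs)
```

Requiring a minimum run length keeps isolated unlikely residues, which may be genuine variants, from being reported as erroneous segments.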
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: C
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Khenoussi W, Vanhoutrève R, Poch O, Thompson JD. SIBIS: a Bayesian model for inconsistent protein sequence estimation. Bioinformatics. 2014;30(17):2432-2439. doi:10.1093/bioinformatics/btu329. PMID:24825613.