SIBIS

SIBIS identifies inconsistencies and errors in protein sequences by analyzing multiple sequence alignments to improve the accuracy of protein sequence databases.


Key Features:

  • Multiple sequence alignment analysis: Leverages evolutionary information in multiple sequence alignments to assess residue consistency across homologs.
  • Bayesian Dirichlet mixture modeling: Applies a Bayesian framework integrated with Dirichlet mixture models to model amino-acid substitution patterns.
  • Amino-acid probability estimation: Estimates the probability of observing specific amino acids at each alignment position.
  • Inconsistency detection: Detects inconsistent or erroneous segments that may result from natural genetic variants or errors introduced during sequence prediction.
  • Performance validation: Evaluated on a reference set of protein sequences with experimentally validated errors, showing higher sensitivity than previous methods with only a small loss of specificity.
  • Large-scale UniProt analysis: Applied to human sequences from UniProt, identifying evidence of inconsistency in 48% of previously uncharacterized sequences.
  • Implementation: Implemented in C for Linux computational environments.
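
The amino-acid probability estimation described above can be illustrated with the standard posterior-mean estimator for a Dirichlet mixture. The sketch below is not taken from the SIBIS source code (which is written in C). The two-component mixture, its weights, and its parameters are made-up toy values, and the names `posterior_probs`, `TOY_WEIGHTS`, and `TOY_ALPHAS` are hypothetical. It only shows how observed residue counts in one alignment column might be combined with a prior to give a probability for each amino acid.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy mixture (NOT the parameters used by SIBIS):
# component 0 is a flat prior, component 1 favours hydrophobic residues.
TOY_WEIGHTS = [0.5, 0.5]
TOY_ALPHAS = [
    [1.0] * 20,
    [5.0 if aa in "AILMFVW" else 0.2 for aa in AMINO_ACIDS],
]


def log_marginal(counts, alpha):
    """Log-likelihood of the counts under one Dirichlet component.

    The multinomial coefficient is omitted: it is the same for every
    component and cancels when the component posteriors are normalized.
    """
    n, a = sum(counts), sum(alpha)
    total = math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alpha):
        total += math.lgamma(c + al) - math.lgamma(al)
    return total


def posterior_probs(column, weights=TOY_WEIGHTS, alphas=TOY_ALPHAS):
    """Posterior-mean amino-acid probabilities for one alignment column.

    `column` is a string of residues; gaps and unknown symbols are ignored.
    """
    counts = [column.count(aa) for aa in AMINO_ACIDS]
    n = sum(counts)

    # Posterior probability of each mixture component given the counts.
    logs = [math.log(w) + log_marginal(counts, a)
            for w, a in zip(weights, alphas)]
    top = max(logs)
    resp = [math.exp(v - top) for v in logs]
    norm = sum(resp)
    resp = [r / norm for r in resp]

    # Mix the per-component posterior means.
    probs = [0.0] * len(AMINO_ACIDS)
    for r, alpha in zip(resp, alphas):
        denom = n + sum(alpha)
        for i, (c, al) in enumerate(zip(counts, alpha)):
            probs[i] += r * (c + al) / denom
    return dict(zip(AMINO_ACIDS, probs))


if __name__ == "__main__":
    column = "LLLLIVLL-L"  # one alignment column, one character per homolog
    for aa, p in sorted(posterior_probs(column).items(), key=lambda kv: -kv[1])[:3]:
        print(aa, round(p, 3))
```

In this toy setting, a residue that is rare in the column but chemically similar to the observed residues still receives a non-negligible probability, which is the usual motivation for using a mixture prior rather than raw frequencies.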

Scientific Applications:

  • Protein database quality control: Identifies and flags inconsistent sequence segments to improve the accuracy of protein sequence databases.
  • Gene prediction validation: Supports evaluation of predicted protein-coding genes and exon structures by detecting prediction-induced sequence errors.
  • Structural, functional, and phylogenetic inference: Improves downstream inference of protein structure, function, and phylogeny by filtering erroneous sequences.
  • Benchmarking and validation: Can be used to benchmark error-detection methods against reference sets of experimentally validated sequence errors.
  • Large-scale screening of UniProt: Enables large-scale screening of UniProt human sequences to characterize the prevalence of inconsistencies.

Methodology:

Uses the evolutionary information in multiple sequence alignments within a Bayesian framework combined with Dirichlet mixture models. The model estimates amino-acid probabilities at each alignment position and uses them to detect inconsistent or erroneous sequence segments. Implemented in C for Linux.
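
As a rough illustration of the segment-detection step, the sketch below flags stretches of a sequence whose residues have low estimated probability. It is an assumption-laden toy, not the SIBIS algorithm: the sliding-window scoring, the `window` and `threshold` parameters, and the function name `flag_segments` are all hypothetical. The input is simply a list holding, for each position, the estimated probability of the residue actually observed in the sequence under examination.

```python
import math


def flag_segments(residue_probs, window=5, threshold=-3.0):
    """Return merged (start, end) index pairs, 0-based and inclusive.

    A window is flagged when the mean log-probability of its residues
    falls below `threshold`; overlapping or adjacent flagged windows are
    merged into one segment. Both parameters are illustrative values.
    """
    # Floor the probabilities so that log(0) cannot occur.
    logs = [math.log(max(p, 1e-12)) for p in residue_probs]
    segments = []
    for start in range(len(logs) - window + 1):
        mean_log = sum(logs[start:start + window]) / window
        if mean_log >= threshold:
            continue
        end = start + window - 1
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend previous segment
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Ten well-supported residues, six poorly supported ones, ten more good ones.
    probs = [0.5] * 10 + [0.01] * 6 + [0.5] * 10
    print(flag_segments(probs, window=5, threshold=-2.6))
```

Averaging over a window rather than thresholding single positions is one simple way to favour contiguous erroneous segments over isolated substitutions; other scoring schemes are equally possible.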

Details

Tool Type:
command-line tool
Operating Systems:
Linux
Programming Languages:
C
Added:
8/3/2017
Last Updated:
11/25/2024

Publications

Khenoussi W, Vanhoutrève R, Poch O, Thompson JD. SIBIS: a Bayesian model for inconsistent protein sequence estimation. Bioinformatics. 2014;30(17):2432-2439. doi:10.1093/bioinformatics/btu329. PMID:24825613.
