FAMSA

FAMSA aligns large sets of protein sequences using a progressive algorithm with affine gap costs to produce accurate multiple sequence alignments for evolutionary and comparative analyses.


Key Features:

  • Progressive Alignment Algorithm: Employs a progressive alignment strategy tailored for large-scale datasets to balance speed and precision.
  • Longest Common Subsequence (LCS) Similarity: Scores each pair of sequences by the length of their longest common subsequence, which serves as the pairwise similarity measure.
  • Affine Gap-Cost Evaluation: Implements a novel method of evaluating affine gap costs, in which opening a gap and extending it are penalized separately, to improve the biological relevance of gaps.
  • Iterative Refinement Scheme: Revisits the initial progressive alignment over repeated passes, adjusting residue and gap placement to improve alignment quality.
  • Optimization and Parallelization: Uses algorithmic optimizations and parallel processing to reduce runtime and memory usage on modern hardware.
  • Scalability and Memory Efficiency: Scales to hundreds of thousands of sequences with a low memory footprint. The original publication reports aligning 415,519 sequences in under two hours using no more than 8 GB of RAM.
  • Alignment Quality Metrics: Achieves sum-of-pairs and total-column scores on large datasets that are competitive with other aligners.
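
The LCS similarity listed above can be illustrated with a minimal sketch. This is a plain dynamic-programming version written for readability, not FAMSA's optimized implementation. The function names and the normalization in `lcs_similarity` (LCS length divided by the shorter sequence length) are assumptions made for this example and are not taken from the tool.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    # Only the previous DP row is needed to compute the current one.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching residues extend the best subsequence of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one residue from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: LCS length over the shorter sequence length."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0
```

Computing `lcs_similarity` for every pair of input sequences yields the kind of similarity matrix that a progressive aligner can use to decide which sequences to merge first.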

Scientific Applications:

  • Evolutionary Analysis: Supports phylogenetic and evolutionary studies of protein families via large-scale multiple sequence alignments.
  • Functional Annotation: Aids functional annotation by enabling identification of conserved residues and domains across protein sequences.
  • Comparative Genomics: Facilitates comparative genomics analyses by aligning proteins across species and large datasets.
  • Protein Family Analysis: Enables analysis and characterization of large protein families and database-scale sequence collections.

Methodology:

FAMSA first computes pairwise similarities between sequences using the LCS measure and uses them to drive a progressive alignment. Gaps are scored with a novel affine gap-cost evaluation, and the resulting alignment is then improved by iterative refinement. Algorithmic optimizations and parallelization keep runtime and memory usage low.
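
To make the affine gap model concrete, the sketch below computes the total penalty for the gaps in one aligned row. It shows the generic textbook affine scheme, in which a run of k gap characters costs one opening penalty plus k - 1 extension penalties. It does not reproduce FAMSA's own gap-cost evaluation, and the function name and default penalty values are illustrative assumptions.

```python
import re


def affine_gap_penalty(aligned_row: str,
                       gap_open: float = 10.0,
                       gap_extend: float = 1.0) -> float:
    """Total affine penalty for all gap runs ('-') in one aligned row.

    Each run of k gap characters costs gap_open + (k - 1) * gap_extend.
    The default penalties are illustrative, not FAMSA's parameters.
    """
    total = 0.0
    for run in re.finditer(r"-+", aligned_row):
        k = len(run.group())
        total += gap_open + (k - 1) * gap_extend
    return total
```

Under this model, one long gap is cheaper than several short gaps of the same total length, which favors alignments where insertions and deletions appear as contiguous events.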

Details

License: GPL-3.0
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: C++
Added: 1/19/2021
Last Updated: 11/24/2024

Publications

Deorowicz S, Debudaj-Grabysz A, Gudyś A. FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Scientific Reports. 2016;6(1). doi:10.1038/srep33964. PMID:27670777. PMCID:PMC5037421.
