FAMSA
FAMSA aligns large sets of protein sequences using a progressive algorithm with affine gap costs to produce accurate multiple sequence alignments for evolutionary and comparative analyses.
Key Features:
- Progressive Alignment Algorithm: Employs a progressive alignment strategy tailored for large-scale datasets to balance speed and precision.
- Longest Common Subsequence (LCS) Similarity: Computes pairwise similarities from the length of the longest common subsequence shared by each pair of sequences.
- Affine Gap-Cost Evaluation: Implements a novel affine gap-cost evaluation method to handle variable sequence lengths and improve biological relevance of gaps.
- Iterative Refinement Scheme: Applies iterative refinement to progressively improve alignment quality by adjusting sequence positions based on similarity and gap information.
- Optimization and Parallelization: Uses algorithmic optimizations and parallel processing to reduce runtime and memory usage on modern hardware.
- Scalability and Memory Efficiency: Scales to hundreds of thousands of sequences with a low memory footprint; the authors report aligning a family of 415,519 sequences in under two hours using no more than 8 GB of RAM.
- Alignment Quality Metrics: Delivers competitive sum-of-pairs and total-column scores for large datasets.
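The two quality metrics named in the last bullet are usually computed against a trusted reference alignment. The following is a minimal Python sketch of simplified sum-of-pairs (SP) and total-column (TC) scores. It is not FAMSA's code or any benchmark's official scorer. The function names are hypothetical, and it assumes both alignments contain the same sequences in the same order, with `-` as the gap character.

```python
from itertools import combinations


def column_residues(alignment):
    """For each column, record the residue index of every sequence (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for i, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(alignment):
    """Set of residue pairs ((seq, residue), (seq, residue)) placed in the same column."""
    pairs = set()
    for col in column_residues(alignment):
        present = [(i, r) for i, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of residue pairs aligned in the reference that the test also aligns."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0


def tc_score(test, reference):
    """Fraction of reference columns (with at least two residues) reproduced exactly."""
    ref_cols = [c for c in column_residues(reference)
                if sum(r is not None for r in c) >= 2]
    test_cols = set(column_residues(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols) if ref_cols else 0.0


if __name__ == "__main__":
    reference = ["AC-G", "ACTG"]
    test = ["ACG-", "ACTG"]
    print(sp_score(test, reference), tc_score(test, reference))
```

In the toy example, the test alignment misplaces the final `G` of the first sequence, so one of the three reference pairs and one of the three reference columns is lost.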
Scientific Applications:
- Evolutionary Analysis: Supports phylogenetic and evolutionary studies of protein families via large-scale multiple sequence alignments.
- Functional Annotation: Aids functional annotation by enabling identification of conserved residues and domains across protein sequences.
- Comparative Genomics: Facilitates comparative genomics analyses by aligning proteins across species and large datasets.
- Protein Family Analysis: Enables analysis and characterization of large protein families and database-scale sequence collections.
Methodology:
FAMSA computes pairwise similarities using LCS, constructs a progressive alignment, applies a novel affine gap-cost evaluation and iterative refinement, and incorporates optimizations and parallelization for speed and memory efficiency.
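Two of the building blocks named above, LCS-based similarity and affine gap costs, can be illustrated with a short Python sketch. This is a plain dynamic-programming version for clarity, not FAMSA's optimized implementation. The function names, the gap-cost convention (the opening cost covers the first gap position), and the parameter values are assumptions made for this example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using O(min(len)) memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def affine_gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """Cost of one gap run: an opening cost plus an extension cost per extra position."""
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
    # Hypothetical penalties, chosen only to show that one long gap
    # is cheaper than several short gaps of the same total length.
    print(affine_gap_cost(4, gap_open=10, gap_extend=1))
    print(4 * affine_gap_cost(1, gap_open=10, gap_extend=1))
```

Under an affine model, a single gap of four positions costs less than four separate one-position gaps, which favors alignments with fewer, longer gaps.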
Details
- License: GPL-3.0
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: C++
- Added: 1/19/2021
- Last Updated: 11/24/2024
Publications
Deorowicz S, Debudaj-Grabysz A, Gudyś A. FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Scientific Reports. 2016;6(1). doi:10.1038/srep33964. PMID:27670777. PMCID:PMC5037421.
Links
Issue tracker
https://github.com/refresh-bio/FAMSA/issues