MUSCLE

MUSCLE produces multiple sequence alignments of protein sequences for comparative sequence analysis, evolutionary studies, and protein structure prediction.


Key Features:

  • Fast distance estimation: Utilizes k-mer counting to rapidly estimate distances between sequences.
  • Progressive alignment scoring: Uses a log-expectation profile score for progressive alignment to improve accuracy.
  • Refinement: Incorporates tree-dependent restricted partitioning during refinement stages to improve alignment quality.
  • Benchmark performance: Compared with T-Coffee, MAFFT, CLUSTALW, Progressive POA, and the MAFFT FFTNS1 script; achieves the highest or joint-highest accuracy rank on each of the BAliBASE, SABmark, SMART, and PREFAB benchmarks.
  • Speed: As reported in the 2004 publications, MUSCLE aligns 5,000 sequences of average length 350 in about 7 minutes, and MUSCLE-fast aligns 1,000 sequences of average length 282 in about 21 seconds, both on a desktop computer of the time.
  • Variants: Offers MUSCLE (default) for highest accuracy, MUSCLE-fast for high-throughput use, and MUSCLE-prog as a compromise between speed and accuracy.
  • Objective-function evaluation protocol: Implements a protocol for evaluating objective functions when aligning two profiles.
  • Additional algorithmic techniques: The 2004 BMC Bioinformatics paper describes several previously unpublished techniques aimed at improving biological accuracy and computational efficiency.
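
The k-mer distance estimate in the first bullet can be illustrated with a minimal Python sketch. It assumes a common formulation: similarity is the number of k-mers two sequences share (counted with multiplicity) divided by the number of k-mers in the shorter sequence, and distance is one minus that fraction. The function names, the default k=3, and the use of the full (uncompressed) alphabet are illustrative assumptions, not MUSCLE's actual implementation.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x, y, k=3):
    """Alignment-free distance: 1 - (shared k-mers / k-mers in shorter sequence)."""
    shorter = min(len(x), len(y))
    if shorter < k:
        raise ValueError("both sequences must contain at least k residues")
    counts_x = kmer_counts(x, k)
    counts_y = kmer_counts(y, k)
    # A k-mer present n times in x and m times in y contributes min(n, m).
    shared = sum(min(n, counts_y[kmer]) for kmer, n in counts_x.items())
    return 1.0 - shared / (shorter - k + 1)


if __name__ == "__main__":
    # Hypothetical toy sequences differing at one position.
    print(kmer_distance("ACDEFG", "ACDEYG"))
```

Because no alignment is computed, a distance matrix built this way is cheap enough to serve as the starting point for a guide tree over thousands of sequences.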

Scientific Applications:

  • Multiple sequence alignment tasks: Produces accurate alignments for comparative sequence analysis and other downstream analyses.
  • Genomic studies: Handles large sequence datasets for genomic-scale analyses.
  • Evolutionary biology: Supports evolutionary and phylogenetic analyses through accurate alignments.
  • Protein structure prediction: Provides alignments that aid protein structure prediction and comparative modeling.

Methodology:

MUSCLE estimates pairwise distances rapidly by k-mer counting, builds a progressive alignment scored with a log-expectation profile function, and then refines that alignment by tree-dependent restricted partitioning. It also includes a protocol for evaluating objective functions when aligning two profiles.
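
As a rough illustration of the log-expectation profile score, the sketch below scores one pair of alignment columns using the form given in the 2004 Nucleic Acids Research paper, LE = (1 - fGx)(1 - fGy) log Σ fxi fyj pij / (pi pj), where f are residue frequencies in each column, fG is the gap fraction, pi are background probabilities, and pij are joint probabilities. The two-letter alphabet, the probability values, and the function names are made up for illustration, and normalizing residue frequencies over non-gap symbols is an assumption of this sketch.

```python
import math

# Toy two-letter alphabet with made-up probabilities (illustration only;
# a real scorer would use 20 amino acids and a substitution model).
BACKGROUND = {"A": 0.5, "B": 0.5}
JOINT = {("A", "A"): 0.4, ("A", "B"): 0.1,
         ("B", "A"): 0.1, ("B", "B"): 0.4}


def column_profile(column, gap="-"):
    """Return (residue frequencies over non-gap symbols, gap fraction)."""
    residues = [c for c in column if c != gap]
    gap_fraction = 1.0 - len(residues) / len(column)
    if not residues:
        return {}, gap_fraction
    freqs = {a: residues.count(a) / len(residues) for a in BACKGROUND}
    return freqs, gap_fraction


def log_expectation(col_x, col_y):
    """Log-expectation score for aligning column col_x with column col_y."""
    fx, gap_x = column_profile(col_x)
    fy, gap_y = column_profile(col_y)
    if not fx or not fy:
        return 0.0  # an all-gap column gets zero weight
    expectation = sum(
        fx[i] * fy[j] * JOINT[i, j] / (BACKGROUND[i] * BACKGROUND[j])
        for i in fx for j in fy
    )
    # Gap-rich columns are down-weighted by the occupancy factors.
    return (1.0 - gap_x) * (1.0 - gap_y) * math.log(expectation)


if __name__ == "__main__":
    print(log_expectation("AA", "AA"))  # matching columns score above zero
    print(log_expectation("AA", "BB"))  # mismatching columns score below zero
```

In a progressive aligner, a column-pair score of this kind fills the dynamic-programming matrix when two profiles are aligned at each internal node of the guide tree.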

Details

License: Other
Maturity: Mature
Cost: Free of charge
Tool Type: API, command-line tool
Operating Systems: Linux, Windows, Mac
Added: 1/17/2017
Last Updated: 11/24/2024

Publications

Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004;32(5):1792-1797. doi:10.1093/nar/gkh340. PMID:15034147. PMCID:PMC390337.

Mareuil F, Doppelt-Azeroual O, Ménager H. A public Galaxy platform at Pasteur used as an execution engine for web services. F1000Research. 2017. doi:10.7490/f1000research.1114334.1.

Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5(1):113. doi:10.1186/1471-2105-5-113. PMID:15318951. PMCID:PMC517706.

Edgar RC. High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. bioRxiv (preprint). 2021. doi:10.1101/2021.06.20.449169.
