MAFFT

MAFFT performs multiple sequence alignment (MSA) for DNA and protein sequences, providing scalable and accurate alignment methods for analyses of large sequencing datasets.


Key Features:

  • H-INS-i, F-INS-i, G-INS-i: Iterative refinement methods that incorporate pairwise alignment information into their objective functions.
  • Progressive and iterative modes: Several algorithmic modes are available, covering both progressive alignment and iterative refinement.
  • G-large-INS-1: A scalable variant of G-INS-1 that retains high accuracy while being applicable to datasets containing 50,000 or more sequences.
  • Large dataset handling: Capable of aligning from hundreds to tens of thousands of sequences on typical computational resources.
  • Add unaligned sequences: New, unaligned sequences can be added to an existing alignment.
  • Nucleotide direction adjustment: For DNA input, the direction of each sequence can be adjusted (reverse-complemented where needed) before alignment.
  • Constrained alignments: Support for constrained alignment modes to enforce user-specified relationships.
  • Parallel processing: Options to utilize parallel processing to accelerate alignment computations.
  • Dot plot: Dot plot feature available for DNA alignment visualization and analysis.
  • Benchmark accuracy: Iterative refinement options have demonstrated superior accuracy compared to TCoffee and CLUSTAL W in benchmark tests involving alignments of more than 50 sequences.
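
As a rough illustration of how some of the features above map onto the command line, the following Python sketch builds MAFFT invocations for a few strategies. The flag sets reflect the MAFFT manual as commonly documented (`--auto`, `--retree 2`, `--globalpair --maxiterate 1000`, `--large --globalpair`, `--thread`, `--adjustdirection`); they should be checked against the installed version. The helper names `build_mafft_command` and `run_mafft` are hypothetical and not part of MAFFT.

```python
import shutil
import subprocess

# Flag sets per strategy, taken from the MAFFT manual as an assumption;
# confirm with `mafft --help` for the version installed locally.
MODE_FLAGS = {
    "auto": ["--auto"],
    "fft-ns-2": ["--retree", "2"],
    "g-ins-i": ["--globalpair", "--maxiterate", "1000"],
    "g-large-ins-1": ["--large", "--globalpair"],
}


def build_mafft_command(input_fasta, mode="auto", threads=1, adjust_direction=False):
    """Return the argument list for one MAFFT run (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["mafft", *MODE_FLAGS[mode], "--thread", str(threads)]
    if adjust_direction:
        # Only meaningful for nucleotide input.
        cmd.append("--adjustdirection")
    cmd.append(input_fasta)
    return cmd


def run_mafft(cmd, output_fasta):
    """Run MAFFT and write its standard output (the alignment) to a file."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("mafft executable not found on PATH")
    with open(output_fasta, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, `run_mafft(build_mafft_command("seqs.fasta", mode="g-large-ins-1", threads=8), "aligned.fasta")` would request the scalable variant with eight threads, assuming `seqs.fasta` exists and MAFFT is installed.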

Scientific Applications:

  • Multiple sequence alignment generation: Production of MSAs for DNA and protein sequences across small to very large datasets.
  • Large-scale sequencing analyses: Alignment of extensive sequencing datasets including thousands to 50,000+ sequences for downstream analyses.
  • Incremental alignment workflows: Incorporation of new unaligned sequences into existing alignments for iterative dataset expansion.
  • DNA-specific analyses: Use of dot plots and nucleotide direction adjustment for DNA alignment inspection and correction.
  • Accuracy-sensitive studies: Application of pairwise-informed iterative refinement methods where benchmarked alignment accuracy is critical.
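
The incremental workflow listed above can be sketched as follows. This is a minimal example assuming MAFFT's `--add` option (with the optional `--keeplength` flag, which keeps the existing alignment length unchanged) behaves as described in the MAFFT manual; the function name `build_add_command` and the file names are hypothetical.

```python
import shlex


def build_add_command(existing_alignment, new_sequences, keep_length=False, threads=1):
    """Build a MAFFT command that adds new sequences to an existing alignment.

    Hypothetical helper; the flags are assumptions based on the MAFFT manual.
    """
    cmd = ["mafft", "--thread", str(threads)]
    if keep_length:
        # Keep the column count of the existing alignment unchanged.
        cmd.append("--keeplength")
    cmd += ["--add", new_sequences, existing_alignment]
    return cmd


if __name__ == "__main__":
    cmd = build_add_command("existing.aln.fasta", "new.fasta", keep_length=True)
    # Print a shell-ready command; MAFFT writes the result to standard output,
    # so it would normally be redirected to a file.
    print(shlex.join(cmd) + " > updated.aln.fasta")
```

Building the argument list separately from running it makes it easy to log or review the exact command before committing compute time to a large dataset.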

Methodology:

MAFFT implements progressive alignment and iterative refinement. The iterative refinement methods (H-INS-i, F-INS-i, G-INS-i) incorporate pairwise alignment information into their objective functions, and G-large-INS-1 provides a scalable variant of G-INS-1. MAFFT also supports adding unaligned sequences to an existing alignment, constrained alignments, nucleotide direction adjustment, dot-plot analysis, and parallel processing.
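
Whichever method is used, MAFFT's default output is an aligned FASTA file in which every row should have the same length. The sketch below is a generic sanity check for such output, not part of MAFFT itself; the function names `read_fasta` and `is_rectangular` are hypothetical, and records with duplicate names are assumed not to occur.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (later duplicates overwrite)."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def is_rectangular(records):
    """Return True if all aligned sequences have the same length."""
    return len({len(seq) for seq in records.values()}) <= 1


if __name__ == "__main__":
    example = ">a\nAC-GT\n>b\nACGGT\n"
    records = read_fasta(example)
    print(is_rectangular(records))
```

A check like this can catch truncated or interrupted runs before the alignment is passed to downstream analyses.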

Details

License:
BSD-Source-Code
Maturity:
Mature
Cost:
Free of charge
Tool Type:
command-line tool, web application
Operating Systems:
Linux, Windows, Mac
Programming Languages:
C
Added:
12/6/2017
Last Updated:
11/24/2024

Publications

Katoh K, Standley DM. MAFFT: Iterative Refinement and Additional Methods. Methods in Molecular Biology. 2013. doi:10.1007/978-1-62703-646-7_8. PMID:24170399.

Mareuil F, Doppelt-Azeroual O, Ménager H. A public Galaxy platform at Pasteur used as an execution engine for web services. F1000Research. 2017. doi:10.7490/f1000research.1114334.1.

Nakamura T, Yamada KD, Tomii K, Katoh K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics. 2018;34(14):2490-2492. doi:10.1093/bioinformatics/bty121. PMID:29506019. PMCID:PMC6041967.

Funding: KAKENHI (JP16K07464, JP17J06457); Platform Project for Supporting Drug Discovery and Life Science Research (JP17am0101108, JP17am0101110)

Katoh K, Rozewicki J, Yamada KD. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics. 2017;20(4):1160-1166. doi:10.1093/bib/bbx108. PMID:28968734. PMCID:PMC6781576.

Funding: Japan Society for the Promotion of Science (JP16K07464)

Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution. 2013;30(4):772-780. doi:10.1093/molbev/mst010. PMID:23329690. PMCID:PMC3603318.

Katoh K. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research. 2005;33(2):511-518. doi:10.1093/nar/gki198. PMID:15661851. PMCID:PMC548345.
