MAFFT
MAFFT performs multiple sequence alignment (MSA) for DNA and protein sequences, providing scalable and accurate alignment methods for analyses of large sequencing datasets.
Key Features:
- H-INS-i, F-INS-i, G-INS-i: Iterative refinement methods that incorporate pairwise alignment information into their objective functions.
- Progressive and iterative refinement modes: Offers both fast progressive alignment and slower, more accurate iterative refinement, so speed can be traded against accuracy.
- G-large-INS-1: A scalable variant of G-INS-1 that retains high accuracy while being applicable to datasets containing 50,000 or more sequences.
- Large dataset handling: Capable of aligning from hundreds to tens of thousands of sequences on typical computational resources.
- Add unaligned sequences: Ability to add unaligned sequences into an existing alignment.
- Nucleotide direction adjustment: For DNA alignment, detects sequences in reverse-complement orientation and adjusts their direction to match the rest of the input.
- Constrained alignments: Support for constrained alignment modes to enforce user-specified relationships.
- Parallel processing: Options to utilize parallel processing to accelerate alignment computations.
- Dot plot: Dot plot feature available for DNA alignment visualization and analysis.
- Benchmark accuracy: The H-INS-i, F-INS-i, and G-INS-i iterative refinement options showed higher accuracy than TCoffee and CLUSTAL W in benchmark tests consisting of alignments of more than 50 sequences.
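The alignment modes and the parallel processing option listed above are selected with command-line flags. The following is a minimal sketch, in Python, of how a wrapper might assemble a MAFFT command line. The function name `mafft_command` and the mode labels used as dictionary keys are assumptions made for this illustration. The flags are taken from the MAFFT manual as understood here and should be checked against the installed version, in particular the `--large --globalpair` combination for G-large-INS-1.

```python
import shlex

# Flag sets for a few MAFFT strategies. The keys are labels chosen for this
# sketch; the flags should be verified against the installed MAFFT version.
MODE_FLAGS = {
    "auto": ["--auto"],                                   # let MAFFT pick a strategy
    "fft-ns-2": ["--retree", "2"],                        # fast progressive method
    "g-ins-i": ["--globalpair", "--maxiterate", "1000"],  # iterative refinement
    "g-large-ins-1": ["--large", "--globalpair"],         # scalable variant (assumed flags)
}


def mafft_command(input_fasta, mode="auto", threads=1):
    """Build the argument list for one MAFFT run. Nothing is executed here."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode}")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["mafft", *MODE_FLAGS[mode], "--thread", str(threads), input_fasta]


# Show the command that would be run for a G-INS-i alignment on 4 threads.
cmd = mafft_command("seqs.fasta", mode="g-ins-i", threads=4)
print(shlex.join(cmd))
```

MAFFT writes the alignment to standard output, so a caller would typically pass the list to `subprocess.run` with `stdout` redirected to a file.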
Scientific Applications:
- Multiple sequence alignment generation: Production of MSAs for DNA and protein sequences across small to very large datasets.
- Large-scale sequencing analyses: Alignment of extensive sequencing datasets including thousands to 50,000+ sequences for downstream analyses.
- Incremental alignment workflows: Incorporation of new unaligned sequences into existing alignments for iterative dataset expansion.
- DNA-specific analyses: Use of dot plots and nucleotide direction adjustment for DNA alignment inspection and correction.
- Accuracy-sensitive studies: Application of pairwise-informed iterative refinement methods where benchmarked alignment accuracy is critical.
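An incremental workflow of the kind described above can be sketched as follows. The helper names `add_sequences_command` and `run_to_file` and all file names are hypothetical. The flags `--add`, `--keeplength`, and `--adjustdirection` are MAFFT options as understood here, but whether they can be combined in a given release is an assumption to verify against the MAFFT documentation.

```python
import subprocess


def add_sequences_command(existing_alignment, new_sequences,
                          adjust_direction=False, keep_length=False):
    """Build a MAFFT command that adds unaligned sequences to an alignment."""
    cmd = ["mafft"]
    if adjust_direction:
        # Reverse-complement new DNA sequences where needed (assumed to be
        # compatible with --add in the installed version).
        cmd.append("--adjustdirection")
    if keep_length:
        # Keep the column count of the existing alignment unchanged.
        cmd.append("--keeplength")
    cmd += ["--add", new_sequences, existing_alignment]
    return cmd


def run_to_file(cmd, output_path):
    """Run the command and write its standard output to output_path."""
    with open(output_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Requires MAFFT on PATH and the two (hypothetical) input files.
    command = add_sequences_command("existing.aln.fasta", "new.fasta",
                                    adjust_direction=True)
    run_to_file(command, "updated.aln.fasta")
```

Keeping command construction separate from execution lets the command be inspected or logged before a long run starts.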
Methodology:
MAFFT implements progressive alignment and iterative refinement techniques. The iterative refinement options (H-INS-i, F-INS-i, G-INS-i) incorporate pairwise alignment information into their objective functions, and the scalable G-large-INS-1 variant extends G-INS-1 to very large datasets. MAFFT also supports adding unaligned sequences to an existing alignment, constrained alignments, nucleotide direction adjustment, dot-plot analysis, and parallel processing.
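To make the idea of an alignment objective function concrete, the sketch below computes a plain sum-of-pairs score in Python. This is a textbook illustration, not MAFFT's objective: the iterative refinement options use a weighted score that additionally draws on pairwise alignment information. The function name and the match, mismatch, and gap values are arbitrary choices made for this example.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment by summing column-wise scores over all row pairs."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        for x, y in zip(row_a, row_b):
            if x == "-" and y == "-":
                continue              # gap against gap contributes nothing
            if x == "-" or y == "-":
                total += gap          # residue aligned to a gap
            elif x == y:
                total += match        # identical residues
            else:
                total += mismatch     # different residues
    return total


# Toy three-row alignment, invented for this example.
toy_alignment = ["AC-T", "ACGT", "A-GT"]
print(sum_of_pairs(toy_alignment))
```

An iterative refinement procedure repeatedly modifies the alignment and keeps a change only when a score of this general kind improves.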
Details
- License: BSD-Source-Code
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool, web application
- Operating Systems: Linux, Windows, Mac
- Programming Languages: C
- Added: 12/6/2017
- Last Updated: 11/24/2024
Publications
Katoh K, Standley DM. MAFFT: Iterative Refinement and Additional Methods. Methods in Molecular Biology. 2013. doi:10.1007/978-1-62703-646-7_8. PMID:24170399.
Mareuil F, Doppelt-Azeroual O, Ménager H. A public Galaxy platform at Pasteur used as an execution engine for web services. F1000Research. 2017. doi:10.7490/f1000research.1114334.1.
Nakamura T, Yamada KD, Tomii K, Katoh K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics. 2018;34(14):2490-2492. doi:10.1093/bioinformatics/bty121. PMID:29506019. PMCID:PMC6041967.
Katoh K, Rozewicki J, Yamada KD. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics. 2017;20(4):1160-1166. doi:10.1093/bib/bbx108. PMID:28968734. PMCID:PMC6781576.
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution. 2013;30(4):772-780. doi:10.1093/molbev/mst010. PMID:23329690. PMCID:PMC3603318.
Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research. 2005;33(2):511-518. doi:10.1093/nar/gki198. PMID:15661851. PMCID:PMC548345.
Downloads
- Downloads page: https://mafft.cbrc.jp/alignment/software/