MMseqs2
MMseqs2 performs high-throughput sequence searching and clustering to enable large-scale analysis and annotation of protein and nucleotide sequences, including metagenomic datasets.
Key Features:
- Speed and efficiency: Runs sequence searches up to a reported 10,000 times faster than BLAST; at more than 100 times BLAST's speed, it retains nearly the same sensitivity.
- Profile searches: Supports profile-based searches with sensitivity comparable to PSI-BLAST at a reported speed more than 400 times higher.
- Linclust algorithm: Implements Linclust, whose runtime scales linearly with input size N and is independent of the number of clusters K. This enables clustering of very large datasets (e.g., 1.6 billion metagenomic fragments in ~10 hours on a single server at 50% sequence identity), with reported speedups of more than 1000-fold over previous methods.
- Parallelization and scalability: Leverages parallel processing across multiple cores and servers to scale to massive protein and nucleotide sequence datasets.
- Fast search modes: Provides low-runtime-overhead search modes with sensitivities approaching BLAST for rapid query response.
- Taxonomic annotation (MMseqs2 taxonomy): Extracts protein fragments from metagenomic contigs, retains fragments useful for annotation, assigns taxonomic labels by weighted voting to determine contig identity, and includes modules for creating and manipulating taxonomic reference databases and visualizing assignments, with reported speedups of 2–18× versus state-of-the-art tools.
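The linear-time clustering idea behind Linclust can be illustrated with a toy sketch. This is not the MMseqs2 implementation: the function names, the BLAKE2 hash, the parameters k, m and min_id, and the simple ungapped identity measure are all assumptions chosen for illustration. Each sequence selects a few k-mers with the smallest hash values, sequences sharing a k-mer are grouped, and every group member is compared only to the longest sequence in its group, so the number of comparisons grows with m times N rather than with N squared.

```python
from collections import defaultdict
import hashlib


def kmer_hash(kmer):
    """Deterministic hash of a k-mer (the built-in hash() is salted per process)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def select_kmers(seq, k, m):
    """Return up to m (kmer, first position) pairs with the smallest hash values."""
    first_pos = {}
    for i in range(len(seq) - k + 1):
        first_pos.setdefault(seq[i:i + k], i)
    chosen = sorted(first_pos, key=kmer_hash)[:m]
    return [(kmer, first_pos[kmer]) for kmer in chosen]


def diagonal_identity(center, pos_c, member, pos_m):
    """Ungapped matches on the diagonal where the shared k-mer lines up,
    divided by the length of the member sequence (toy identity measure)."""
    shift = pos_c - pos_m  # center[i + shift] is compared with member[i]
    start = max(0, -shift)
    end = min(len(member), len(center) - shift)
    if end <= start:
        return 0.0
    matches = sum(center[i + shift] == member[i] for i in range(start, end))
    return matches / len(member)


def linclust_sketch(seqs, k=5, m=3, min_id=0.5):
    """seqs: dict of name -> sequence. Returns dict of representative -> sorted members."""
    # 1. Group sequences by the k-mers they selected (at most m entries per sequence).
    groups = defaultdict(list)
    for name, seq in seqs.items():
        for kmer, pos in select_kmers(seq, k, m):
            groups[kmer].append((name, pos))

    # 2. Compare each group member only to the longest sequence in the group.
    links = defaultdict(set)  # center -> members that passed the identity threshold
    for members in groups.values():
        center, cpos = max(members, key=lambda t: (len(seqs[t[0]]), t[0]))
        for name, pos in members:
            if name == center:
                continue
            if diagonal_identity(seqs[center], cpos, seqs[name], pos) >= min_id:
                links[center].add(name)

    # 3. Greedy assignment, longest sequences first.
    clusters, assigned = {}, set()
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if name in assigned:
            continue
        members = {name} | (links.get(name, set()) - assigned)
        assigned |= members
        clusters[name] = sorted(members)
    return clusters


# Hypothetical toy input: "b" is a prefix of "a"; "c" is unrelated.
seqs = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFS",
    "c": "GSSGSSGSSGWWWPLPLPLP",
}
print(linclust_sketch(seqs))  # {'a': ['a', 'b'], 'c': ['c']}
```

In this sketch the dictionary grouping stands in for the sorting step a production tool would use, and the identity check replaces real alignment; only the overall structure (k-mer selection, grouping, center comparison, greedy assignment) is meant to carry over.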
Scientific Applications:
- Functional annotation: Enables large-scale functional annotation and structure-prediction workflows for metagenomic datasets comprising billions of protein sequences.
- Redundancy reduction and clustering: Performs similarity-based clustering to reduce redundancy and produce representative sequence sets for downstream analyses.
- Taxonomic classification: Assigns taxonomic labels to metagenomic contigs via protein-fragment extraction and weighted voting to support taxonomic profiling.
- Database construction and curation: Facilitates creation and manipulation of large reference sequence and taxonomic databases for genomic and metagenomic research.
Methodology:
Uses the Linclust algorithm for linear-time clustering, supports profile searches comparable to PSI-BLAST, extracts protein fragments from contigs and assigns taxonomy via weighted voting, and employs parallel processing and optimized computational strategies for large-scale sequence search and clustering.
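The weighted-voting step for contig taxonomy can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's code: the function name assign_contig, the weighting of each fragment hit by -log(E-value), and the majority threshold of 0.5 are illustrative choices, and the lineages are hypothetical example data. Each fragment hit votes for every taxon on its lineage, and the contig receives the most specific taxon whose clade holds more than the majority share of the total weight.

```python
import math
from collections import defaultdict


def assign_contig(fragment_hits, majority=0.5):
    """fragment_hits: list of (lineage, evalue), where lineage is a
    root-to-leaf tuple of taxon names for one protein fragment.

    Returns the most specific taxon supported by more than `majority`
    of the total vote weight, or None if no taxon qualifies.
    Intended for majority >= 0.5, where the qualifying taxa are nested.
    """
    support = defaultdict(float)
    total = 0.0
    for lineage, evalue in fragment_hits:
        # Assumed weighting: stronger hits (smaller E-values) cast heavier votes.
        weight = -math.log(max(evalue, 1e-300))
        if weight <= 0:
            continue  # hits with E-value >= 1 carry no vote in this sketch
        total += weight
        # A hit supports every ancestor on its lineage, keyed by the full path.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight
    if total == 0:
        return None
    winners = [path for path, w in support.items() if w / total > majority]
    return max(winners, key=len)[-1] if winners else None


# Hypothetical fragment hits for one contig.
hits = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 1e-40),
    (("Bacteria", "Proteobacteria", "Salmonella"), 1e-40),
    (("Bacteria", "Firmicutes", "Bacillus"), 1e-5),
]
print(assign_contig(hits))  # Proteobacteria
```

Neither genus reaches the majority on its own here, so the label falls back to the phylum they share, which is the behavior weighted voting is meant to provide when fragments disagree at fine taxonomic ranks.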
Details
- License: MIT
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Windows, Linux, Mac
- Programming Languages: C++
- Added: 7/3/2019
- Last Updated: 11/10/2025
Operations
- Sequence alignment
- Taxonomic classification
Publications
Steinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nature Communications. 2018;9(1). doi:10.1038/s41467-018-04964-5. PMID:29959318. PMCID:PMC6026198.
Mirdita M, Steinegger M, Söding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics. 2019;35(16):2856-2858. doi:10.1093/bioinformatics/bty1057. PMID:30615063. PMCID:PMC6691333.
Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology. 2017;35(11):1026-1028. doi:10.1038/nbt.3988. PMID:29035372.
Steinegger M, Söding J. MMseqs2: sensitive protein sequence searching for the analysis of massive data sets. bioRxiv (preprint). 2016. doi:10.1101/079681.
Mirdita M, Steinegger M, Breitwieser F, Söding J, Levy Karin E. Fast and sensitive taxonomic assignment to metagenomic contigs. bioRxiv (preprint). 2020. doi:10.1101/2020.11.27.401018.
Mirdita M, Steinegger M, Breitwieser F, Söding J, Levy Karin E. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics. 2021;37(18):3029-3031. doi:10.1093/bioinformatics/btab184. PMID:33734313. PMCID:PMC8479651.
Kallenborn F, Chacon A, Hundt C, Sirelkhatim H, Didi K, Cha S, Dallago C, Mirdita M, Schmidt B, Steinegger M. GPU-accelerated homology search with MMseqs2. Nature Methods. 2025;22(10):2024-2027. doi:10.1038/s41592-025-02819-8. PMID:40968302. PMCID:PMC12510879.