MMseqs
MMseqs performs fast sequence searching, clustering, and homology detection to reduce redundancy and enable large-scale sequence and metagenomic analyses.
Key Features:
- Redundancy reduction: Clusters sequence databases down to a maximum pairwise sequence identity of 50% or lower, reducing their redundancy.
- Speed and sensitivity: In the authors' homology detection benchmarks, it is considerably more sensitive than UBLAST and RAPsearch while running 4-30× faster, although it does not yet reach BLAST sensitivity.
- Comparative performance: Clusters large databases hundreds of times faster than BLASTclust and to much lower sequence identity than CD-HIT and USEARCH.
- Prefiltering module: Identifies similar k-mers between query and target sequences and sums their scores to quickly prefilter candidates.
- Local alignment module: Computes local alignments using SSE2 vector instructions and multi-core parallelization.
- Clustering module: Enables deep clustering of large databases down to approximately 30% sequence identity at speeds reported as hundreds of times faster than BLASTclust.
- Cascaded clustering: Uses a cascaded clustering workflow to reach deep clustering, and can update an existing database clustering in linear rather than quadratic time.
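The prefiltering idea (sum the scores of similar k-mers shared by a query and a target, then keep only high-scoring targets) can be illustrated with a minimal sketch. It assumes a toy substitution score (+2 identical, -1 different) in place of a real matrix such as BLOSUM62, and the names `prefilter`, `kmer_threshold` and `min_score` are hypothetical. It compares k-mers directly by brute force rather than through the index-based lookup a production tool would use.

```python
from collections import defaultdict


def kmer_score(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Toy substitution score (assumption): +match per identical residue, mismatch otherwise."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def prefilter(query: str, targets: dict[str, str], k: int = 3,
              kmer_threshold: int = 3, min_score: int = 10) -> dict[str, int]:
    """Sum the scores of similar k-mer pairs per target; keep targets scoring >= min_score.

    Hypothetical parameters: a k-mer pair counts as "similar" when its score
    reaches kmer_threshold (here: identical, or one mismatch for k = 3).
    """
    totals: dict[str, int] = defaultdict(int)
    for q in kmers(query, k):
        for name, seq in targets.items():
            for t in kmers(seq, k):
                s = kmer_score(q, t)
                if s >= kmer_threshold:
                    totals[name] += s
    return {name: score for name, score in totals.items() if score >= min_score}


if __name__ == "__main__":
    hits = prefilter("MKTAYIAK", {"similar": "MKTAYIAR", "unrelated": "GGGGGGGG"})
    print(hits)  # only "similar" survives the prefilter
```

Only targets that pass such a cheap filter need to go on to the far more expensive local alignment stage, which is where the speed gain comes from.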
Scientific Applications:
- Homology detection: Sensitive detection of homologs in large sequence datasets, with sensitivity exceeding UBLAST and RAPsearch.
- Database clustering: Deep clustering of sequence databases to reduce redundancy and create representative sequence sets down to ~30% identity.
- Metagenomic sequence analysis: Analysis of metagenomic datasets, including 6-frame translated sequencing reads, in which many reads cannot be matched to known sequences by BLAST or HMMER3.
- Large-scale sequence processing: Scalable processing of massive datasets using cascaded clustering and parallelized alignment.
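The cascaded approach to building representative sequence sets can be sketched as repeated clustering in which each round re-clusters only the previous round's representatives at a lower identity threshold. This is a simplified illustration, not the MMseqs clustering module: the ungapped `identity` measure, the longest-first greedy assignment, the thresholds (0.9, 0.7, 0.5) and all function names are assumptions chosen for clarity.

```python
def identity(a: str, b: str) -> float:
    """Crude ungapped identity (assumption): matching positions / longer length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], min_identity: float) -> dict[str, list[str]]:
    """Longest-first greedy clustering: each sequence joins the first representative it matches."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= min_identity:
                clusters[rep].append(name)
                break
        else:  # no representative was similar enough: start a new cluster
            clusters[name] = [name]
    return clusters


def cascaded_cluster(seqs: dict[str, str],
                     thresholds: tuple[float, ...] = (0.9, 0.7, 0.5)) -> dict[str, list[str]]:
    """Re-cluster only the representatives of each round at the next, lower threshold."""
    members = {name: [name] for name in seqs}  # representative -> all original members
    for t in thresholds:
        reps = {name: seqs[name] for name in members}
        merged: dict[str, list[str]] = {}
        for rep, group in greedy_cluster(reps, t).items():
            # carry every original member of each merged representative along
            merged[rep] = [m for g in group for m in members[g]]
        members = merged
    return members


if __name__ == "__main__":
    seqs = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "MKTAYIGGGG", "d": "WWWWWWWWWW"}
    print(cascaded_cluster(seqs))  # -> {'a': ['a', 'b', 'c'], 'd': ['d']}
```

Because later rounds only see representatives, the number of comparisons shrinks at each step, which is what makes deep clustering of large sets tractable.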
Methodology:
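A fast prefilter sums the scores of similar k-mers shared by query and target sequences; surviving candidate pairs are aligned locally with SSE2-vectorized, multi-core-parallelized code; a cascaded clustering workflow provides deep clustering, and existing database clusterings can be updated in linear rather than quadratic time.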
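The local alignment step can be illustrated with a scalar Smith-Waterman score computation. This minimal sketch assumes a toy scoring scheme (+2 match, -1 mismatch, linear gap penalty of -2); it does not reproduce MMseqs' substitution matrix, affine gap costs, SSE2 vectorization or multi-core scheduling, and the function name is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (scalar Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # clamping at 0 is what makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("XXACGTYY", "ZACGTZ"))  # -> 8 (the shared ACGT core)
```

Keeping only two rows makes memory linear in the target length. SIMD implementations gain their speed by scoring many cells of this recurrence per instruction, and independent query-target pairs can be distributed across cores.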
Details
- License: GPL-3.0
- Maturity: Legacy
- Cost: Free of charge
- Tool Type: workflow
- Operating Systems: Linux
- Programming Languages: C++, C
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Hauser M, Steinegger M, Söding J. MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics. 2016;32(9):1323-1330. doi:10.1093/bioinformatics/btw006. PMID:26743509.
Related Tools
MMseqs2 (Relation: hasNewVersion)