MMseqs

MMseqs (Many-against-Many sequence searching) is a software suite for fast searching and clustering of large protein sequence sets. It detects homologs and reduces database redundancy, enabling large-scale sequence and metagenomic analyses.

Key Features:

Redundancy reduction: Clusters large sequence databases to 50% maximum pairwise sequence identity or below.
Speed and sensitivity: More sensitive than UBLAST and RAPsearch and 4-30× faster than those tools, although it does not reach BLAST sensitivity.
Comparative performance: Clusters much faster than BLASTclust and to lower sequence identity than CD-HIT and USEARCH.
Prefiltering module: Identifies similar k-mers between query and target sequences and sums their scores to quickly prefilter candidates.
Local alignment module: Performs local alignments accelerated with SSE2 vector instructions and multi-core parallelization.
Clustering module: Enables deep clustering of large databases down to approximately 30% sequence identity at speeds reported as hundreds of times faster than BLASTclust.
Cascaded clustering: Clusters in successive steps and can update an existing database clustering in linear rather than quadratic time.
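
The prefiltering idea listed above (find similar k-mers between query and target, then sum their scores) can be sketched in a few lines of Python. This is a toy illustration, not the MMseqs implementation: the match/mismatch scoring, the thresholds, and every function name are assumptions made for the example, and the nested loops stand in for the far more optimized lookup a real tool would use.

```python
def residue_score(a, b):
    # Assumed toy scoring (+2 match, -1 mismatch); not a real substitution matrix.
    return 2 if a == b else -1


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_score(x, y):
    """Score two equal-length k-mers position by position."""
    return sum(residue_score(a, b) for a, b in zip(x, y))


def prefilter_score(query, target, k=3, min_kmer_score=3):
    """Sum the scores of all k-mer pairs that count as 'similar'.

    A pair is similar when its score reaches min_kmer_score; with the toy
    scoring and k=3 that means an exact match or a single mismatch.
    """
    target_kmers = kmers(target, k)
    total = 0
    for qk in kmers(query, k):
        for tk in target_kmers:
            s = kmer_score(qk, tk)
            if s >= min_kmer_score:
                total += s
    return total


def prefilter(query, targets, k=3, min_kmer_score=3, min_total=10):
    """Return names of targets whose summed score passes min_total, best first."""
    scored = {
        name: prefilter_score(query, seq, k, min_kmer_score)
        for name, seq in targets.items()
    }
    passing = [name for name, s in scored.items() if s >= min_total]
    return sorted(passing, key=lambda name: -scored[name])


if __name__ == "__main__":
    candidates = {"close": "MKTAYIAR", "far": "GGGGGGGG"}
    print(prefilter("MKTAYIAK", candidates))
```

Only the targets that pass this cheap filter would then be handed to the more expensive local alignment step.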

Scientific Applications:

Homology detection: Sensitive detection of homologs in large sequence datasets, with sensitivity exceeding UBLAST and RAPsearch.
Database clustering: Deep clustering of sequence databases to reduce redundancy and create representative sequence sets down to ~30% identity.
Metagenomic sequence analysis: Analysis of metagenomic datasets, where large fractions of reads often go unmatched because sensitive but slower tools such as BLAST or HMMER3 are too costly to run against comprehensive databases.
Large-scale sequence processing: Scalable processing of massive datasets using cascaded clustering and parallelized alignment.

Methodology:

MMseqs combines three steps: fast prefiltering that identifies similar k-mers and sums their scores, local alignment accelerated with SSE2 instructions and multi-core parallelization, and cascaded clustering, which also supports linear-time database updates.
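
To make the cascaded clustering and update steps more concrete, here is a minimal Python sketch. It is written under stated assumptions and is not the MMseqs algorithm: `identity` uses `difflib.SequenceMatcher` as a stand-in for an alignment-based sequence identity, the greedy assignment rule and the threshold values are illustrative choices, and the input sequences are assumed to be unique.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Stand-in similarity in [0, 1]; NOT an alignment-based sequence identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def assign(clusters, seq, threshold):
    """Put seq into the first cluster whose representative is similar enough,
    otherwise start a new cluster with seq as its representative."""
    for rep in clusters:
        if identity(rep, seq) >= threshold:
            clusters[rep].append(seq)
            return
    clusters[seq] = [seq]


def greedy_cluster(seqs, threshold):
    """Greedy clustering, longest sequences first; returns {representative: members}."""
    clusters = {}
    for s in sorted(seqs, key=len, reverse=True):
        assign(clusters, s, threshold)
    return clusters


def cascaded_cluster(seqs, thresholds=(0.8, 0.6, 0.4)):
    """Cluster at progressively looser thresholds.

    Each step re-clusters only the representatives of the previous step,
    then merges the member lists of representatives that were joined.
    """
    members = {s: [s] for s in seqs}
    for t in thresholds:
        step = greedy_cluster(list(members), t)
        members = {
            rep: [m for r in joined for m in members[r]]
            for rep, joined in step.items()
        }
    return members


def update(clusters, new_seqs, threshold):
    """Add new sequences by comparing them to existing representatives only."""
    for s in new_seqs:
        assign(clusters, s, threshold)
    return clusters


if __name__ == "__main__":
    db = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "CCCCCCCCCG"]
    clusters = cascaded_cluster(db, thresholds=(0.8, 0.5))
    update(clusters, ["GGGGGGGGGG"], threshold=0.5)
    for rep, group in clusters.items():
        print(rep, group)
```

In this sketch the update compares each new sequence against the current representatives only, never against every stored sequence, which is one way to avoid recomputing all pairwise comparisons when a database grows.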

Topics

Proteins; Gene and protein families

Details

License: GPL-3.0
Maturity: Legacy
Cost: Free of charge
Tool Type: workflow
Operating Systems: Linux
Programming Languages: C++, C
Added: 8/3/2017
Last Updated: 11/25/2024

Operations

Sequence clustering

Publications

Hauser M, Steinegger M, Söding J. MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics. 2016;32(9):1323-1330. doi:10.1093/bioinformatics/btw006. PMID:26743509.

Documentation

General

https://github.com/soedinglab/MMseqs/blob/master/user_guide.pdf

Related Tools

MMseqs2

Relation: hasNewVersion
