SEED

SEED clusters next-generation sequencing (NGS) short-read sequences to reduce redundancy and facilitate genome and transcriptome assembly and small RNA cluster discovery.

Key Features:

Efficient Clustering Algorithm: Employs a modified spaced seed method called block spaced seeds to form sequence clusters.
Error and Overhang Tolerance: Forms clusters where sequences can differ by up to three mismatches and three overhanging residues from their virtual center.
Scalability and Speed: Runs in linear time and memory; in the published benchmark (Bao et al., 2011) it clustered 100 million short-read sequences in less than four hours.
Preprocessing for Assembly: When used before Velvet/Oasis assembly, reduces assembler time by 60–85% and memory by 21–41% while producing contigs with N50 values 12–27% larger.
Performance Comparison: Generates clusters closely resembling true clusters and achieves a 2- to 10-fold improvement in time efficiency over other clustering tools.
Versatility: Functions as a standalone method for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
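
The mismatch and overhang tolerance listed above can be illustrated with a small check. This is a minimal sketch, not SEED's implementation: the function name `fits_center`, its parameters, and the reading of "overhanging residues" as read positions that fall outside the center after shifting are all assumptions made for illustration.

```python
def fits_center(read, center, max_mismatches=3, max_overhang=3):
    """Return True if `read` can be placed against `center` with at most
    `max_mismatches` substitutions and `max_overhang` unaligned residues.

    Hypothetical helper: the shift-based model of overhang is an assumption.
    """
    for shift in range(-max_overhang, max_overhang + 1):
        mismatches = 0
        overhang = 0
        for i, base in enumerate(read):
            j = i + shift  # position in the center that read[i] aligns to
            if 0 <= j < len(center):
                if base != center[j]:
                    mismatches += 1
            else:
                overhang += 1  # residue hangs past one end of the center
        if mismatches <= max_mismatches and overhang <= max_overhang:
            return True
    return False


center = "ACGTTGCAAGCTTAGGCATC"
three_mismatches = "CCGTTTCAAGATTAGGCATC"   # differs at positions 0, 5, 10
four_mismatches = "CCGTTTCAAGATTAGTCATC"    # differs at positions 0, 5, 10, 15
shifted = center[2:] + "GG"                 # two residues overhang the center

print(fits_center(three_mismatches, center))
print(fits_center(four_mismatches, center))
print(fits_center(shifted, center))
```

The brute-force scan over shifts is only meant to make the tolerance rule concrete; it says nothing about how SEED reaches linear time.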

Scientific Applications:

Genome assembly preprocessing: Reduces redundancy and computational resources for genome assembly workflows using Velvet and similar assemblers.
Transcriptome assembly preprocessing: Optimizes transcriptome assembly by decreasing time and memory requirements and improving contig length metrics.
Small RNA cluster discovery: Identifies clusters of small RNA sequences in NGS data from unsequenced organisms.
Population and diversity analysis: Facilitates estimation of DNA/RNA molecule population sizes and exploration of genomic diversity.
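
The contig length metric reported for assembly preprocessing is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the contig lengths are invented for illustration and are not results from SEED or Velvet/Oasis.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Invented contig lengths, used only to show the arithmetic.
without_preprocessing = [100, 80, 60, 40, 20]
with_preprocessing = [120, 100, 50, 30]

print(n50(without_preprocessing))
print(n50(with_preprocessing))
```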

Methodology:

Clusters reads with block spaced seeds, a modified spaced seed method. Each read in a cluster may differ from the cluster's virtual center by up to three mismatches and three overhanging residues. The algorithms run in linear time and memory.
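
The general idea behind seed-based filtering can be sketched with the pigeonhole principle: if two equal-length reads differ by at most k mismatches and each is cut into k + 1 blocks, at least one block must match exactly, so an index keyed on blocks finds every candidate pair without all-against-all comparison. The code below is a simplified illustration under that assumption. It is not SEED's block spaced seed scheme, it ignores overhangs, it uses the first unassigned read as the cluster center rather than a virtual center, and all function names are hypothetical.

```python
from collections import defaultdict


def split_blocks(read, n_blocks):
    """Cut a read into n_blocks pieces, returned as (block_index, text) keys."""
    size = len(read) // n_blocks
    blocks = []
    for b in range(n_blocks):
        start = b * size
        end = len(read) if b == n_blocks - 1 else start + size
        blocks.append((b, read[start:end]))
    return blocks


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches=3):
    """Greedy clustering of equal-length reads using exact block matches
    as a candidate filter, then verifying by Hamming distance."""
    n_blocks = max_mismatches + 1
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        for key in split_blocks(read, n_blocks):
            index[key].append(rid)

    assigned = set()
    clusters = []
    for rid, read in enumerate(reads):
        if rid in assigned:
            continue
        assigned.add(rid)
        members = [rid]  # this read acts as the cluster center
        candidates = set()
        for key in split_blocks(read, n_blocks):
            candidates.update(index[key])
        for cid in sorted(candidates):
            if cid not in assigned and hamming(read, reads[cid]) <= max_mismatches:
                assigned.add(cid)
                members.append(cid)
        clusters.append(members)
    return clusters


reads = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGTACGA",  # 1 mismatch from read 0
    "TTTTGGGGCCCCAAAA",
    "TTTTGGGGCCCCAAAT",  # 1 mismatch from read 2
    "ACGAACGAACGAACGA",  # 4 mismatches from read 0
]
print(cluster_reads(reads))
```

In this toy input the last read shares no exact block with the first read, so it is never compared against it and ends up in its own cluster.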

Topics

Metagenomics

Details

Maturity: Mature
Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: C++
Added: 1/13/2017
Last Updated: 11/25/2024

Operations

Sequence clustering

Publications

Bao E, Jiang T, Kaloshian I, Girke T. SEED: efficient clustering of next-generation sequences. Bioinformatics. 2011;27(18):2502-2509. doi:10.1093/bioinformatics/btr447. PMID:21810899. PMCID:PMC3167058.

Documentation

General

https://github.com/baoe/SEED
