SEED

SEED clusters short-read sequences from next-generation sequencing (NGS) to reduce redundancy, which facilitates genome and transcriptome assembly and the discovery of small RNA clusters.


Key Features:

  • Efficient Clustering Algorithm: Employs a modified spaced seed method called block spaced seeds to form sequence clusters.
  • Error and Overhang Tolerance: Forms clusters in which each sequence may differ from the cluster's virtual center by up to three mismatches and up to three overhanging residues.
  • Scalability and Speed: Runs in linear time and memory and can cluster 100 million short-read sequences in under four hours.
  • Preprocessing for Assembly: When used before Velvet/Oasis assembly on the published test datasets, reduces assembler time by 60–85% and memory by 21–41% while producing contigs with 12–27% larger N50 values.
  • Performance Comparison: Compared with other clustering tools, produces clusters that most closely resemble the true clusters and runs 2 to 10 times faster.
  • Versatility: Functions as a standalone method for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
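The mismatch and overhang tolerance described above can be made concrete with a small membership check. This is a minimal sketch, not SEED's C++ implementation: it assumes ungapped alignment, treats a given sequence as the center (SEED's virtual center is a simplification here), and counts overhang as read residues that extend past either end of the center. The function names are hypothetical.

```python
def mismatches_at_shift(center, read, shift):
    """Count mismatches where read[i] is aligned to center[i + shift] (ungapped)."""
    count = 0
    for i, base in enumerate(read):
        j = i + shift
        # Only positions that overlap the center can be mismatches.
        if 0 <= j < len(center) and base != center[j]:
            count += 1
    return count


def overhang_at_shift(center, read, shift):
    """Count residues of the read that fall outside the center at this shift."""
    left = max(0, -shift)
    right = max(0, shift + len(read) - len(center))
    return left + right


def within_tolerance(center, read, max_mismatches=3, max_overhang=3):
    """True if some shift keeps both overhang and mismatches within the limits."""
    for shift in range(-max_overhang, max_overhang + 1):
        if overhang_at_shift(center, read, shift) > max_overhang:
            continue
        if mismatches_at_shift(center, read, shift) <= max_mismatches:
            return True
    return False


print(within_tolerance("ACGTACGTAC", "ACGTTCGTAC"))  # True: one mismatch
print(within_tolerance("ACGTACGTAC", "GTACGTACGG"))  # True: shifted, two overhanging residues
print(within_tolerance("ACGTACGTAC", "TTTTTTTTTT"))  # False: too many mismatches at every shift
```

Trying every shift up to the overhang limit keeps the check cheap because the limit is a small constant.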

Scientific Applications:

  • Genome assembly preprocessing: Reduces read redundancy and the computational resources needed by Velvet and similar genome assemblers.
  • Transcriptome assembly preprocessing: Optimizes transcriptome assembly by decreasing time and memory requirements and improving contig length metrics.
  • Small RNA cluster discovery: Identifies clusters of small RNA sequences in NGS data from unsequenced organisms.
  • Population and diversity analysis: Facilitates estimation of DNA/RNA molecule population sizes and exploration of genomic diversity.

Methodology:

Uses block spaced seeds, a modified spaced seed method, to cluster reads that differ from a virtual center by up to three mismatches and three overhanging residues. The clustering algorithms run in linear time and memory.
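The general idea behind block-based seeds can be illustrated with a simplified pigeonhole scheme. If two equal-length reads differ at no more than k positions and each read is split into B blocks, at least B − k blocks are mismatch-free. Indexing every combination of B − k blocks therefore guarantees that such a pair shares at least one hash key. The sketch below assumes equal-length reads, mismatches only (no overhangs), and illustrative parameters (6 blocks, 3 mismatches). SEED's actual seed design is described in Bao et al. (2011), and all function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_blocks(read, n_blocks):
    """Split a read into n_blocks contiguous blocks of near-equal length."""
    size, extra = divmod(len(read), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(read[start:end])
        start = end
    return blocks


def block_seeds(n_blocks, max_mismatches):
    """Every choice of (n_blocks - max_mismatches) blocks is one seed."""
    weight = n_blocks - max_mismatches
    if weight <= 0:
        raise ValueError("need more blocks than allowed mismatches")
    return list(combinations(range(n_blocks), weight))


def candidate_pairs(reads, n_blocks=6, max_mismatches=3):
    """Hash each read under every seed; reads sharing a key become candidates."""
    seeds = block_seeds(n_blocks, max_mismatches)
    buckets = defaultdict(list)
    for idx, read in enumerate(reads):
        blocks = split_blocks(read, n_blocks)
        for seed_id, chosen in enumerate(seeds):
            key = (seed_id, tuple(blocks[b] for b in chosen))
            buckets[key].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))  # indices are already ascending
    return pairs


def hamming(a, b):
    """Number of differing positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def similar_pairs(reads, n_blocks=6, max_mismatches=3):
    """Verify candidates so only pairs within the mismatch limit remain."""
    return {(a, b) for a, b in candidate_pairs(reads, n_blocks, max_mismatches)
            if hamming(reads[a], reads[b]) <= max_mismatches}


reads = ["ACGTACGTACGTACGTAC", "ACGTACGAACGTACGTAT", "TTTTTTTTTTTTTTTTTT"]
print(similar_pairs(reads))  # {(0, 1)}
```

Because the number of seeds is fixed, hashing is linear in the number of reads. Enumerating pairs inside very large buckets can still grow quadratically, so a production tool has to manage highly repetitive reads differently.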

Details

Maturity: Mature
Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: C++
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Bao E, Jiang T, Kaloshian I, Girke T. SEED: efficient clustering of next-generation sequences. Bioinformatics. 2011;27(18):2502-2509. doi:10.1093/bioinformatics/btr447. PMID:21810899. PMCID:PMC3167058.
