SSAKE

SSAKE assembles short DNA sequence reads from high-throughput sequencing technologies such as Solexa into longer contiguous sequences to enable de novo genome reconstruction and characterization.


Key Features:

  • Aggressive Assembly Strategy: Searches progressively for the longest possible k-mer overlap between reads and extends the assembled sequence with it, maximizing the use of short sequences (e.g., 25-nucleotide Solexa reads).
  • Prefix Tree and Hash Table: Stores sequence data in a hash table and leverages a prefix tree data structure to efficiently manage and search overlaps among reads.
  • High-Throughput Compatibility: Designed to process millions of short reads simultaneously to accommodate high-throughput sequencing datasets.
  • Stringent Assembly for Identical Sequences: Emphasizes stringent assembly of highly identical sequences to mitigate ambiguity introduced by ubiquitous genomic repeats.
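The hash table and prefix tree pairing listed above can be illustrated with a small Python sketch. This is not SSAKE's Perl implementation. The names build_index and reads_starting_with, the PREFIX_LEN value, and the "$" leaf marker are assumptions made for this example only.

```python
from collections import Counter

PREFIX_LEN = 4  # hypothetical index depth, chosen only for this toy example


def build_index(reads):
    """Return (counts, tree): read multiplicities plus a nested-dict prefix tree."""
    counts = Counter(reads)  # hash table: read sequence -> number of occurrences
    tree = {}
    for read in counts:
        node = tree
        for base in read[:PREFIX_LEN]:
            node = node.setdefault(base, {})
        node.setdefault("$", []).append(read)  # "$" marks a bucket of reads
    return counts, tree


def reads_starting_with(tree, kmer):
    """Return, in sorted order, every indexed read that begins with kmer."""
    node = tree
    for base in kmer[:PREFIX_LEN]:
        if base not in node:
            return []
        node = node[base]
    found = []
    stack = [node]
    while stack:  # walk the subtree below the matched prefix
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.extend(r for r in child if r.startswith(kmer))
            else:
                stack.append(child)
    return sorted(found)
```

Counting identical reads in the hash table keeps one entry per distinct sequence, while the tree narrows an overlap query to the reads that share the first few bases of the k-mer being searched.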

Scientific Applications:

  • De novo sequencing projects: Assembles short reads into longer contigs to characterize novel genomic targets.
  • Genome assembly: Facilitates construction of larger contiguous genomic sequences from fragmented short-read data.
  • Variant detection: Improves resolution of genomic regions to aid identification of genetic variants.
  • Structural genomics: Assists analysis of structural variation within genomes by producing longer assembled sequences.

Methodology:

SSAKE stores sequence reads in a hash table and indexes them in a prefix tree so that overlapping reads can be found quickly. The algorithm then extends the assembled sequence progressively, trying the longest k-mer overlaps between sequences first.
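The longest-overlap-first extension can be sketched in Python as follows. This is a simplified illustration, not SSAKE's code: it extends in one direction only, scans a plain set instead of a prefix tree, and stops when an overlap is ambiguous. The function name extend_contig and the min_overlap parameter are assumptions made for this example.

```python
def extend_contig(seed, reads, min_overlap=3):
    """Greedily extend seed to the right using the longest unambiguous overlap."""
    pool = set(reads)
    pool.discard(seed)
    contig = seed
    extended = True
    while extended and pool:
        extended = False
        longest = max(len(r) for r in pool)
        # Try the longest possible overlap first, then progressively shorter ones.
        for k in range(min(longest - 1, len(contig)), min_overlap - 1, -1):
            suffix = contig[-k:]
            hits = [r for r in pool if len(r) > k and r.startswith(suffix)]
            if len(hits) > 1:
                return contig  # assumption: stop rather than guess between reads
            if len(hits) == 1:
                read = hits[0]
                contig += read[k:]  # append only the bases beyond the overlap
                pool.remove(read)
                extended = True
                break
    return contig


reads = ["ACGTAC", "GTACGG", "ACGGTT", "TTTTTT"]
print(extend_contig("ACGTAC", reads))  # prints ACGTACGGTT
```

Stopping at an ambiguous overlap is one way to keep the assembly stringent in repeat-rich regions, at the cost of shorter contigs.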

Details

License: GPL-2.0
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Perl
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Warren RL, Sutton GG, Jones SJM, Holt RA. Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 2007;23(4):500-501. doi:10.1093/bioinformatics/btl629. PMID:17158514. PMCID:PMC7109930.