SSAKE
SSAKE assembles short DNA sequence reads from high-throughput sequencing technologies such as Solexa into longer contiguous sequences to enable de novo genome reconstruction and characterization.
Key Features:
- Aggressive Assembly Strategy: Progressively searches for the longest possible k-mer overlap between sequences and uses it for extension, making maximal use of very short reads (e.g., 25-nucleotide Solexa reads).
- Prefix Tree and Hash Table: Stores sequence data in a hash table and uses a prefix tree to search efficiently for overlaps among reads.
- High-Throughput Compatibility: Designed to process millions of short reads in a single run, as produced by high-throughput sequencing.
- Stringent Assembly for Identical Sequences: Assembles highly identical sequences stringently to reduce the ambiguity introduced by ubiquitous genomic repeats.
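The two data structures named above can be illustrated with a minimal Python sketch. SSAKE itself is written in Perl, and this is not its code. The sketch assumes reads are plain ACGT strings and omits reverse-complement handling. `build_index`, `PrefixTree`, and `reads_with_prefix` are hypothetical names. A `Counter` plays the role of the hash table, mapping each distinct read to its copy number. A character trie returns every read that begins with a given k-mer without scanning the whole read set.

```python
from collections import Counter


class PrefixTree:
    """Character trie over read sequences (hypothetical helper, not SSAKE code)."""

    END = "$"  # key marking that a complete read ends at this node

    def __init__(self):
        self.root = {}

    def insert(self, read):
        node = self.root
        for base in read:
            node = node.setdefault(base, {})
        node[self.END] = read  # store the full read at its terminal node

    def reads_with_prefix(self, prefix):
        """Return every stored read that starts with `prefix`, sorted."""
        node = self.root
        for base in prefix:
            if base not in node:
                return []
            node = node[base]
        found, stack = [], [node]
        while stack:  # depth-first walk of the subtree below the prefix
            current = stack.pop()
            for key, child in current.items():
                if key == self.END:
                    found.append(child)
                else:
                    stack.append(child)
        return sorted(found)


def build_index(reads):
    """Hash table of read copy numbers plus a trie over the distinct reads."""
    counts = Counter(reads)  # read sequence -> number of occurrences
    tree = PrefixTree()
    for read in counts:  # each distinct read is inserted once
        tree.insert(read)
    return counts, tree
```

For example, `build_index(["ACGTAC", "ACGTTT", "ACGTAC", "GGGCCC"])` records two copies of `ACGTAC`. On the resulting tree, `reads_with_prefix("ACGT")` returns both reads that start with `ACGT`.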
Scientific Applications:
- De novo sequencing projects: Assembles short reads into longer contigs to characterize novel genomic targets.
- Genome assembly: Facilitates construction of larger contiguous genomic sequences from fragmented short-read data.
- Variant detection: Improves resolution of genomic regions to aid identification of genetic variants.
- Structural variation analysis: Assists analysis of structural variation within genomes by producing longer assembled sequences.
Methodology:
Sequence reads are stored in a hash table, and a prefix tree is used to index and search them. The algorithm then progressively extends sequences using the longest k-mer overlaps it can find between them.
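The progressive longest-overlap extension can be sketched as a greedy loop. This is an illustrative Python approximation under stated assumptions, not SSAKE's actual procedure. `extend_contig` and its `min_overlap` parameter are hypothetical. Only the 3' end is extended, strands are ignored, and candidate reads are found with a linear scan rather than a prefix tree.

At each step the loop tries overlap lengths from longest to shortest. At the first length with any matching read, it extends only by the bases on which all matching reads agree. It stops when those reads disagree, which is a simple stand-in for stringent handling of repeats.

```python
import os


def extend_contig(seed, reads, min_overlap):
    """Greedily extend `seed` at its 3' end using the longest read overlaps.

    Hypothetical sketch: reads are same-strand ACGT strings, and every read
    that contributes to an extension is marked as used.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contig = seed
    unused = set(reads)
    unused.discard(seed)
    while True:
        longest_read = max((len(r) for r in unused), default=0)
        # An overlap of k bases must leave at least one overhanging base.
        max_k = min(len(contig), longest_read - 1)
        for k in range(max_k, min_overlap - 1, -1):
            suffix = contig[-k:]  # 3'-most k-mer of the growing contig
            hits = [r for r in unused if len(r) > k and r.startswith(suffix)]
            if not hits:
                continue  # no read overlaps by k bases; try a shorter k-mer
            # Extend only by bases that every overlapping read agrees on.
            agreed = os.path.commonprefix([r[k:] for r in hits])
            if not agreed:
                return contig  # conflicting extensions: stop (possible repeat)
            contig += agreed
            unused.difference_update(hits)
            break
        else:
            return contig  # no overlap of at least min_overlap bases remains
```

For example, `extend_contig("ATGCGTAC", ["ATGCGTAC", "CGTACCTG", "ACCTGAAG"], 4)` rebuilds `"ATGCGTACCTGAAG"`. If a conflicting read such as `"CGTACGGA"` is added alongside `"CGTACCTG"`, the loop stops at the seed instead of guessing.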
Details
- License: GPL-2.0
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Perl
- Added: 1/13/2017
- Last Updated: 11/25/2024
Publications
Warren RL, Sutton GG, Jones SJM, Holt RA. Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 2007;23(4):500-501. doi:10.1093/bioinformatics/btl629. PMID:17158514. PMCID:PMC7109930.