LightAssembler

LightAssembler assembles genomes from next-generation sequencing (NGS) reads using cache-oblivious Bloom filters and lightweight graph simplification to enable resource-efficient de novo assembly.


Key Features:

  • Cache-Oblivious Bloom Filters: Employs a pair of cache-oblivious Bloom filters. One holds a uniform sample of g-spaced sequenced k-mers; the other holds k-mers classified as likely correct by a straightforward statistical test. Together they allow efficient handling of redundant nodes and branching edges in assembly graphs.
  • Resource Efficiency: Minimizes memory consumption while preserving assembly accuracy and contiguity, enabling assembly of large genomes, such as mammalian genomes, on standard desktop machines.
  • Graph Traversal and Simplification: Implements lightweight graph traversal and simplification modules to manage complex assembly graphs produced from error-prone NGS reads and genomic repeats.
  • Adaptability to Gap Sizes: Operates as a gap-based sequence assembler and maintains consistent assembly size and genome coverage across different gap sizes.
  • Efficient Assembly Graph Encoding: Encodes assembly graphs compactly in memory by reducing redundancy.
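
To illustrate how an assembly graph can be traversed without storing explicit edges, here is a minimal C++ sketch. It is not taken from LightAssembler's source. It assumes the graph is implicit: a k-mer's successors are the four possible one-base extensions that pass a membership test. The names Membership, successors, and extendRight are hypothetical, the membership test stands in for a query against the filter of likely-correct k-mers, and the sketch ignores reverse complements, predecessors, and graph simplification.

```cpp
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a Bloom filter query on likely-correct k-mers.
using Membership = std::function<bool(const std::string&)>;

// Return the successors of `kmer` (length >= 1): drop its first base,
// append each of A/C/G/T, and keep the candidates that pass the test.
std::vector<std::string> successors(const std::string& kmer,
                                    const Membership& present) {
    std::vector<std::string> out;
    const std::string suffix = kmer.substr(1);
    const std::string bases = "ACGT";
    for (char base : bases) {
        std::string next = suffix + base;
        if (present(next)) out.push_back(next);
    }
    return out;
}

// Extend a contig to the right from `seed` while exactly one successor
// exists; stop at a dead end, a branch, or a k-mer already visited.
std::string extendRight(const std::string& seed, const Membership& present) {
    std::string contig = seed;
    std::string current = seed;
    std::unordered_set<std::string> visited{seed};
    while (true) {
        std::vector<std::string> next = successors(current, present);
        if (next.size() != 1) break;                 // dead end or branch
        if (!visited.insert(next[0]).second) break;  // cycle guard
        contig.push_back(next[0].back());
        current = next[0];
    }
    return contig;
}
```

With the k-mer set {ACG, CGT, GTA, GTC} as the membership test, extending from ACG adds one base and then stops, because CGT has two successors (GTA and GTC).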

Scientific Applications:

  • De novo assembly of large genomes: Assembles large eukaryotic genomes, including mammalian genomes, with limited computational resources.
  • Assembly from error-prone NGS reads: Handles complex assembly graphs arising from sequencing errors and genomic repeats via traversal and simplification modules.
  • Benchmarking and validation: Has been validated using GAGE and Assemblathon benchmark datasets.

Methodology:

LightAssembler operates as a gap-based sequence assembler. It uses a pair of cache-oblivious Bloom filters: one holds a uniform sample of g-spaced k-mers, and the other stores k-mers deemed likely correct by a straightforward statistical test. Lightweight graph traversal and simplification modules then process the assembly graph, which is encoded compactly in memory by reducing redundancy.
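
The two-filter scheme can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not LightAssembler's implementation. The BloomFilter class is a plain filter (not cache-oblivious). The names sampleKmers and collectTrusted are hypothetical. "g-spaced" is assumed to mean taking every g-th k-mer of each read. The membership check in the second pass is only a placeholder for the statistical test, which is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Plain Bloom filter for illustration only (not cache-oblivious).
class BloomFilter {
public:
    BloomFilter(std::size_t bits, int hashes)
        : bits_(bits, false), hashes_(hashes) {
        assert(bits > 0 && hashes > 0);
    }

    void insert(const std::string& kmer) {
        for (int i = 0; i < hashes_; ++i) bits_[index(kmer, i)] = true;
    }

    // May return false positives, never false negatives.
    bool contains(const std::string& kmer) const {
        for (int i = 0; i < hashes_; ++i)
            if (!bits_[index(kmer, i)]) return false;
        return true;
    }

private:
    // Double hashing: h1 + i * h2, reduced modulo the table size.
    std::size_t index(const std::string& kmer, int i) const {
        std::size_t h1 = std::hash<std::string>{}(kmer);
        std::size_t h2 = std::hash<std::string>{}(kmer + "#") | 1;
        return (h1 + static_cast<std::size_t>(i) * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    int hashes_;
};

// Pass 1 (assumed sampling rule): insert every g-th k-mer of each read.
void sampleKmers(const std::vector<std::string>& reads, std::size_t k,
                 std::size_t g, BloomFilter& sample) {
    assert(k > 0 && g > 0);
    for (const auto& read : reads)
        for (std::size_t pos = 0; pos + k <= read.size(); pos += g)
            sample.insert(read.substr(pos, k));
}

// Pass 2 (placeholder test): a k-mer found in the sample filter is
// treated as likely correct and copied into the trusted filter.
void collectTrusted(const std::vector<std::string>& reads, std::size_t k,
                    const BloomFilter& sample, BloomFilter& trusted) {
    assert(k > 0);
    for (const auto& read : reads)
        for (std::size_t pos = 0; pos + k <= read.size(); ++pos) {
            std::string kmer = read.substr(pos, k);
            if (sample.contains(kmer)) trusted.insert(kmer);
        }
}
```

The reasoning behind the placeholder is that a k-mer occurring in many reads is more likely to be hit by the sample at least once than a k-mer that occurs only once because of a sequencing error. A real test would need to account for coverage and for Bloom filter false positives.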

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C++
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

El-Metwally S, Zakaria M, Hamza T. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads. Bioinformatics. 2016;32(21):3215-3223. doi:10.1093/bioinformatics/btw470. PMID:27412092.
