ABySS

ABySS performs de novo genome assembly from short-read sequencing data, producing draft and scaffolded assemblies for genomic variation analysis and reference genome construction.


Key Features:

  • Parallelized Assembly: Uses the Message Passing Interface (MPI), together with algorithmic improvements, to distribute the assembly across compute nodes and reduce the per-node memory needed to assemble large short-read datasets.
  • Probabilistic Data Structures: Represents de Bruijn graphs using Bloom filters to minimize memory footprint during assembly.
  • Memory Efficiency and Performance: In a published human genome assembly, ABySS 2.0 achieved a scaffold NG50 of 3.5 Mbp using less than 35 GB of RAM.
  • Integration with Long-Range Data: Incorporates long-range data from BioNano Genomics optical maps and 10x Genomics Chromium linked reads to improve scaffold contiguity, reaching a scaffold NG50 of up to 42 Mbp.
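
The NG50 figures quoted above are a contiguity metric: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the estimated genome size. The following is a minimal sketch of that calculation, not code taken from ABySS; the function name `ng50` and the toy lengths are hypothetical.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths.

    Sequences are visited from longest to shortest; the NG50 is the length
    of the sequence at which the running total first reaches half of
    genome_size. Returns 0 if the assembly covers less than half the genome.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0


# Toy scaffold lengths (hypothetical values, arbitrary units).
scaffolds = [80, 70, 50, 40, 30, 20]

# NG50 uses the estimated genome size as the denominator.
print(ng50(scaffolds, genome_size=400))

# Passing the assembly's own total length instead gives the N50.
print(ng50(scaffolds, genome_size=sum(scaffolds)))
```

Because NG50 is normalized by genome size rather than assembly size, it allows assemblies of different total length to be compared on the same footing.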

Scientific Applications:

  • Genomic Variation Analysis: Provides assembled sequences that enable detailed analysis of genomic variation between species and within individuals.
  • Reference Genome Construction: Generates high-quality draft genomes suitable for constructing reference assemblies for comparative genomics.
  • Discovery of Novel Sequences: Identifies polymorphic and novel sequences absent from existing human reference assemblies.

Methodology:

ABySS builds and traverses a de Bruijn graph of the k-mers found in the reads. The original assembler distributes the graph across compute nodes with MPI, while ABySS 2.0 represents the graph with a Bloom filter to reduce memory use.
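
To illustrate the idea of a de Bruijn graph held in a Bloom filter, here is a simplified sketch rather than the ABySS implementation. Edges are never stored: the neighbors of a k-mer are found by testing its four possible one-base extensions for membership. The class and function names are hypothetical, and the sketch assumes error-free reads, ignores reverse complements, and does nothing to filter Bloom filter false positives.

```python
import hashlib


class BloomFilter:
    """A fixed-size bit array queried with several salted hashes."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One bit position per hash function; the salt varies the hash.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def successors(bloom, kmer):
    """Return the k-mers reachable from kmer by appending one base."""
    candidates = (kmer[1:] + base for base in "ACGT")
    return [candidate for candidate in candidates if candidate in bloom]


def extend_right(bloom, seed, max_steps=1000):
    """Extend seed to the right while the path is unambiguous."""
    contig, current, seen = seed, seed, {seed}
    for _ in range(max_steps):
        nxt = successors(bloom, current)
        if len(nxt) != 1 or nxt[0] in seen:
            break  # dead end, branch, or cycle
        current = nxt[0]
        seen.add(current)
        contig += current[-1]
    return contig


# Toy overlapping reads (hypothetical data).
reads = ["ACGTTGCA", "GTTGCATC"]
k = 4
bloom = BloomFilter(num_bits=1 << 16, num_hashes=3)
for read in reads:
    for kmer in kmers(read, k):
        bloom.add(kmer)

print(extend_right(bloom, "ACGT"))
```

With a filter this sparse a false positive is extremely unlikely, so the walk should reconstruct the merged sequence of the two reads. The memory saving comes from storing only a few bits per k-mer instead of the k-mers themselves, at the cost of occasional spurious branches that a real assembler has to detect and prune.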

Details

License: GPL-3.0
Maturity: Mature
Cost: Free of charge
Tool Type: Command-line tool
Operating Systems: Linux, Mac
Programming Languages: C++
Added: 3/16/2016
Last Updated: 11/24/2024

Publications

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol İ. ABySS: A parallel assembler for short read sequence data. Genome Research. 2009;19(6):1117-1123. doi:10.1101/gr.089532.108. PMID:19251739. PMCID:PMC2694472.

Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol İ. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Research. 2017;27(5):768-777. doi:10.1101/gr.214346.116. PMID:28232478. PMCID:PMC5411771.

Funding: National Institutes of Health, R01HG007182

Related Tools

rresolver
Relation: usedBy