ABySS
ABySS is a de novo genome assembler for short-read sequencing data. It produces draft and scaffolded assemblies for genomic variation analysis and reference genome construction.
Key Features:
- Parallelized Assembly: Distributes the de Bruijn graph across compute nodes using the Message Passing Interface (MPI), so that large short-read datasets can be assembled without holding the whole graph in the memory of a single machine.
- Probabilistic Data Structures: Represents de Bruijn graphs using Bloom filters to minimize memory footprint during assembly.
- Memory Efficiency and Performance: ABySS 2.0 assembled a human genome to a scaffold NG50 of 3.5 Mbp using less than 35 GB of RAM.
- Integration with Long-Range Data: Incorporates long-range data from BioNano Genomics and 10x Genomics' Chromium to improve scaffold contiguity, raising the scaffold NG50 to 42 Mbp.
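The NG50 figures quoted above are the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the estimated genome size. The following is a minimal sketch of that calculation in Python. The function name `ng50` and the toy lengths are illustrative assumptions and are not part of ABySS.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of an assembly, or None if it cannot be reached.

    lengths     -- iterable of scaffold (or contig) lengths in bp
    genome_size -- estimated size of the genome in bp
    """
    target = genome_size / 2
    running_total = 0
    # Walk the scaffolds from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    # The assembly covers less than half of the estimated genome.
    return None


if __name__ == "__main__":
    # Hypothetical scaffold lengths for a genome estimated at 200 bp.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so assemblies of different total length can be compared on the same scale.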
Scientific Applications:
- Genomic Variation Analysis: Provides assembled sequences that enable detailed analysis of genomic variation between species and within individuals.
- Reference Genome Construction: Generates high-quality draft genomes suitable for constructing reference assemblies for comparative genomics.
- Discovery of Novel Sequences: Identifies polymorphic and novel sequences absent from existing human reference assemblies.
Methodology:
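Builds a de Bruijn graph from the k-mers of the reads and traverses it to produce contigs, which are then scaffolded. The original ABySS distributes the graph across compute nodes with MPI; ABySS 2.0 instead represents the graph with a Bloom filter, which reduces memory use enough for large genomes to be assembled on a single machine.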
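The idea of a de Bruijn graph held in a Bloom filter can be sketched as follows: k-mers from the reads are inserted into a bit array, and the graph is never stored explicitly. The neighbours of a k-mer are found by querying the filter for each of its four possible one-base extensions. This is a simplified illustration, not ABySS's implementation: the class and function names are hypothetical, the hash is a generic one from the Python standard library, and reverse complements, sequencing errors, and false-positive handling are all ignored.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over strings (illustrative, not ABySS's own)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every k-mer of seq in order."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def successors(bloom, kmer):
    """Return the k-mers in the filter that overlap kmer by k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


def extend_right(bloom, start, max_steps=1000):
    """Extend start to the right while exactly one unvisited successor exists."""
    contig, current, seen = start, start, {start}
    for _ in range(max_steps):
        nxt = successors(bloom, current)
        if len(nxt) != 1 or nxt[0] in seen:
            break  # dead end, branch, or cycle
        current = nxt[0]
        seen.add(current)
        contig += current[-1]
    return contig


if __name__ == "__main__":
    k = 4
    bloom = BloomFilter(num_bits=1 << 20, num_hashes=3)
    for read in ["ACGTTGCA", "GTTGCATC"]:  # two overlapping toy reads
        for kmer in kmers(read, k):
            bloom.add(kmer)
    print(extend_right(bloom, "ACGT"))
```

Because a Bloom filter can report k-mers that were never inserted, a real assembler needs additional checks to avoid following false branches; this sketch simply uses a filter that is very large relative to the number of k-mers so that such collisions are unlikely.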
Details
- License: GPL-3.0
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: C++
- Added: 3/16/2016
- Last Updated: 11/24/2024
Publications
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol İ. ABySS: A parallel assembler for short read sequence data. Genome Research. 2009;19(6):1117-1123. doi:10.1101/gr.089532.108. PMID:19251739. PMCID:PMC2694472.
Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Research. 2017;27(5):768-777. doi:10.1101/gr.214346.116. PMID:28232478. PMCID:PMC5411771.