PASHA

PASHA assembles large genomes from short-read sequencing data using de Bruijn graph methods and parallel computation to enable scalable de novo assembly.


Key Features:

  • De Bruijn graph-based assembly: Implements de Bruijn graph methodologies for constructing genome assemblies from short reads.
  • Parallelization: Employs parallel computing techniques to improve throughput and resource utilization.
  • Hybrid computing architecture: Runs on a hybrid of shared-memory multi-core CPUs and distributed-memory compute clusters.
  • Scalability for large genomes: Optimized for large genome datasets, including human genome assemblies.
  • Support for paired-end data: Demonstrated performance on real paired-end short-read datasets.
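
The de Bruijn graph idea can be illustrated with a toy Python sketch. It is not PASHA's implementation. Each distinct k-mer from the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and maximal non-branching paths are spelled out as contigs. The function names build_de_bruijn and simple_contigs are hypothetical. The sketch ignores reverse complements, sequencing errors, and isolated cycles, all of which a real assembler must handle.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Hypothetical helper: map each (k-1)-mer node to its successor nodes.

    Every distinct k-mer in the reads becomes one edge prefix -> suffix.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def simple_contigs(graph):
    """Hypothetical helper: spell maximal non-branching paths (unitigs)."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1

    def one_in_one_out(node):
        return indeg[node] == 1 and len(graph.get(node, [])) == 1

    contigs = []
    for node in list(graph):
        if one_in_one_out(node):
            continue  # interior node; it is reached from a path start
        for nxt in graph[node]:
            seq = node + nxt[-1]
            # Extend while the path stays unbranched.
            while one_in_one_out(nxt):
                nxt = graph[nxt][0]
                seq += nxt[-1]
            contigs.append(seq)
    return contigs
```

For example, simple_contigs(build_de_bruijn(["ACGTTG", "GTTGCA"], 4)) merges the two overlapping reads into the single contig "ACGTTGCA".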

Scientific Applications:

  • High-quality de novo assembly: Produces more contiguous assemblies than Velvet, ABySS, and SOAPdenovo on the tested datasets.
  • Performance metrics: Reported results include an NG50 contig size of 503 bp, a longest correct contig of 18,252 bp, and an NG50 scaffold size of 2,294 bp.
  • Human genome assembly: Capable of completing a human genome assembly in approximately 21 hours on modest compute resources.
  • Adaptation to growing NGS datasets: The parallelized approach scales to the increasing data volumes produced by next-generation sequencing (NGS).
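
NG50, one of the metrics reported above, can be computed with a minimal Python sketch using the standard definition. Sort contigs from longest to shortest. NG50 is the length of the contig at which the running total first reaches half of the (estimated) genome size. The function name ng50 is illustrative, and the example lengths below are made up rather than taken from the PASHA results.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if the contigs never cover half the genome."""
    half = genome_size / 2
    total = 0
    # Accumulate contig lengths from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0
```

For example, ng50([100, 80, 50, 30, 20], 300) returns 80, because 100 + 80 = 180 is the first running total that reaches 150. Unlike N50, NG50 uses the genome size instead of the total assembly length, so assemblies of different total sizes can be compared on the same footing.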

Methodology:

Combines de Bruijn graph-based assembly with parallel computing on a hybrid architecture of shared-memory multi-core CPUs and distributed-memory compute clusters.
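
A common way to parallelize de Bruijn graph construction across distributed memory is to give each k-mer a single owner, chosen by a deterministic hash. This Python sketch illustrates that general technique under that assumption. It is not a description of PASHA's actual partitioning scheme. The names owner, partition_kmers, and count_partitioned are hypothetical. A thread pool stands in for cluster processes here, so the sketch demonstrates correctness of the partitioning, not speedup.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def owner(kmer, n_parts):
    # Deterministic hash (CRC32), so every worker agrees on a k-mer's owner.
    # Python's built-in hash() is randomized per process and would not work.
    return zlib.crc32(kmer.encode()) % n_parts

def partition_kmers(reads, k, n_parts):
    """Scatter step: route each k-mer occurrence to its owning partition."""
    parts = [[] for _ in range(n_parts)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            parts[owner(kmer, n_parts)].append(kmer)
    return parts

def count_partitioned(reads, k, n_parts=4):
    """Count k-mers, with each partition processed by its own worker."""
    parts = partition_kmers(reads, k, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial = list(pool.map(Counter, parts))
    merged = Counter()
    for counts in partial:
        merged.update(counts)  # partitions are disjoint, so this is a plain union
    return merged
```

Because every occurrence of a given k-mer lands in the same partition, each worker can count or build its share of the graph without communicating with the others. On a real cluster, the scatter step would be an all-to-all message exchange, and the per-partition work could be split further across the cores of each node.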

Details

Tool Type:
command-line tool
Operating Systems:
Linux
Added:
12/18/2017
Last Updated:
11/25/2024

Publications

Liu Y, Schmidt B, Maskell DL. Parallelized short read assembly of large genomes using de Bruijn graphs. BMC Bioinformatics. 2011;12(1). doi:10.1186/1471-2105-12-354. PMID:21867511. PMCID:PMC3167803.
