PASHA
PASHA assembles large genomes from short-read sequencing data using de Bruijn graph methods and parallel computation to enable scalable de novo assembly.
Key Features:
- De Bruijn graph-based assembly: Builds genome assemblies from short reads using a de Bruijn graph.
- Parallelization: Runs assembly work in parallel to improve throughput and resource utilization.
- Hybrid computing architecture: Runs on hybrid architectures that combine shared-memory multi-core CPUs with distributed-memory compute clusters.
- Scalability for large genomes: Scales to large genome datasets, as demonstrated with human genome assembly.
- Support for paired-end data: Demonstrated performance on real paired-end short-read datasets.
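The de Bruijn graph approach named above can be illustrated with a minimal Python sketch. This is a teaching example under simplifying assumptions, not PASHA's implementation: the function names `build_de_bruijn` and `walk_unbranched` are hypothetical, reads are assumed error-free, and reverse complements, coverage, and in-degree checks are ignored. Each k-mer in a read contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and following a chain of nodes with a single successor spells out a contig.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the distinct (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            if suffix not in graph[prefix]:
                graph[prefix].append(suffix)
    return dict(graph)


def walk_unbranched(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    visited = {start}  # guards against looping forever on a cycle
    while len(graph.get(node, [])) == 1 and graph[node][0] not in visited:
        node = graph[node][0]
        visited.add(node)
        contig += node[-1]  # each step adds one new base
    return contig


# Toy overlapping reads (hypothetical data).
reads = ["ACGTC", "CGTCA", "GTCAG"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ACG")
print(graph)
print(contig)
```

Production assemblers add error correction, graph simplification, and scaffolding on top of this basic structure.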
Scientific Applications:
- High-quality de novo assembly: Produces more contiguous assemblies than Velvet, ABySS, and SOAPdenovo on the tested datasets.
- Performance metrics: Reported results for human genome assembly include an NG50 contig size of 503 bp, a longest correct contig of 18,252 bp, and an NG50 scaffold size of 2,294 bp.
- Human genome assembly: Completes a human genome assembly in approximately 21 hours on modest compute resources.
- Adaptation to growing NGS datasets: Parallelized approach enables handling of increasing next-generation sequencing data volumes.
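For readers unfamiliar with the NG50 metric cited above, the following Python sketch shows how it is commonly computed. NG50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the genome size (N50 uses half of the total assembly length instead). The function name `ng50` and the example lengths are hypothetical, and this is not the evaluation script used in the publication.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if the contigs cover less than half the genome."""
    half = genome_size / 2
    covered = 0
    # Accumulate contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Hypothetical contig lengths against a hypothetical 400-base genome.
lengths = [80, 70, 50, 40, 30, 20]
result = ng50(lengths, genome_size=400)
print(result)
```

The same function applies to scaffold lengths when computing an NG50 scaffold size.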
Methodology:
Combines de Bruijn graph-based assembly with parallel computing on hybrid architectures made up of shared-memory multi-core CPUs and distributed-memory compute clusters.
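One common way to spread de Bruijn graph work across workers is to assign every k-mer to a partition by hashing it, then process the partitions independently. The Python sketch below illustrates that general idea only; it is an assumption for illustration and does not describe PASHA's actual partitioning or communication scheme. The names `canonical`, `owner`, `partition_kmers`, and `count_partitions` are hypothetical, partitions stand in for cluster nodes, and a thread pool stands in for the cores within a node.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def owner(kmer, n_partitions):
    """Pick a partition with a deterministic hash of the canonical k-mer."""
    return zlib.crc32(canonical(kmer).encode()) % n_partitions


def partition_kmers(reads, k, n_partitions):
    """Route every canonical k-mer of every read to the partition that owns it."""
    buckets = [[] for _ in range(n_partitions)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            buckets[owner(kmer, n_partitions)].append(kmer)
    return buckets


def count_partitions(buckets, n_threads=2):
    """Count k-mers per partition; each partition can be handled independently."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(Counter, buckets))


# Toy reads (hypothetical data).
reads = ["ACGTAC", "GTACGT"]
buckets = partition_kmers(reads, k=3, n_partitions=4)
counts = count_partitions(buckets)
merged = sum(counts, Counter())
print(merged)
```

Because a k-mer and its reverse complement always hash to the same partition, no k-mer is counted in two places, so per-partition results can be merged without reconciliation.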
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Added: 12/18/2017
- Last Updated: 11/25/2024
Publications
Liu Y, Schmidt B, Maskell DL. Parallelized short read assembly of large genomes using de Bruijn graphs. BMC Bioinformatics. 2011;12(1). doi:10.1186/1471-2105-12-354. PMID:21867511. PMCID:PMC3167803.