PANDAseq

PANDAseq assembles overlapping Illumina paired-end reads and applies quality-aware error correction to reconstruct amplicons such as 16S rRNA gene sequences for microbial community analysis.


Key Features:

  • Error Correction: Integrates quality information from Illumina reads to correct mismatches and uncalled bases within paired-end reads.
  • Quality-Based Uncertainty Handling: Uncertain corrections arise mainly in reads with many low-quality bases, and upstream processing already flags such reads.
  • Alignment and Overlap Reconstruction: Aligns paired-end reads and reconstructs overlapping sequences, with optional inclusion of PCR primers in output sequences.
  • Efficiency and Speed: In benchmark comparisons on simulated data with real error masks, assembles reads rapidly while incorporating few errors.
  • Scalability: Scales to process large datasets, including billions of paired-end reads.
  • Increased Sequence Yield: In control library assemblies, achieves a 4–50% increase in the number of assembled sequences compared to naïve assembly methods with negligible loss of high-quality sequences.
  • Benchmarks and Validation: Performance validated using simulated data with real error masks and pooled templates of genomic DNA from known organisms.
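
The quality-aware features above rest on the Phred convention, in which a quality score Q corresponds to an error probability p = 10^(-Q/10). The following is a minimal sketch of that conversion, not PANDAseq code. The function names, the Phred+33 offset, and the threshold of 10 are assumptions chosen for illustration; older Illumina pipelines used a Phred+64 offset instead.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (offset is an assumption)."""
    return [ord(c) - offset for c in qual]


def count_low_quality(qual: str, threshold: int = 10, offset: int = 33) -> int:
    """Count bases whose Phred score falls below an illustrative threshold."""
    return sum(1 for q in decode_quality(qual, offset) if q < threshold)
```

A read with a high count from `count_low_quality` is the kind of read in which a correction is more likely to be uncertain.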

Scientific Applications:

  • Microbial community profiling: Assembly of 16S rRNA gene amplicons to improve accuracy of microbial diversity analyses.
  • Amplicon sequencing error mitigation: Correction of sequencing errors and recovery of additional high-quality sequences for downstream ecological and evolutionary studies.
  • Method validation and benchmarking: Use in control library and simulated-data benchmarks to evaluate assembly accuracy and yield.

Methodology:

PANDAseq aligns each pair of Illumina reads, uses base quality scores to correct mismatches and uncalled bases in the overlap, and reconstructs the full overlapping sequence. PCR primers can optionally be retained in the output. The method was validated on simulated data with real error masks and on pooled genomic DNA templates from known organisms.
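
The steps in this section can be sketched as a toy overlap merger. This is a simplified stand-in, not PANDAseq's algorithm: it scores candidate overlaps by matches minus mismatches, whereas the scoring PANDAseq itself uses is not described here. All function names, the `min_overlap` default, and the tie-breaking rule are assumptions made for illustration.

```python
from typing import List, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_overlap(fwd: str, rev_rc: str, min_overlap: int = 5) -> Optional[int]:
    """Pick the overlap length with the highest (matches - mismatches) score.

    Positions where either base is N count as neither match nor mismatch.
    Returns None when no candidate overlap has a positive score.
    """
    best_len, best_score = None, 0
    for n in range(min_overlap, min(len(fwd), len(rev_rc)) + 1):
        score = 0
        for a, b in zip(fwd[-n:], rev_rc[:n]):
            if "N" in (a, b):
                continue
            score += 1 if a == b else -1
        if score > best_score:
            best_len, best_score = n, score
    return best_len


def merge_pair(fwd: str, fwd_q: List[int], rev: str, rev_q: List[int],
               min_overlap: int = 5) -> Optional[str]:
    """Merge a read pair; in the overlap, keep the base with the higher quality."""
    rev_rc = reverse_complement(rev)
    rev_rc_q = rev_q[::-1]  # qualities reverse along with the read
    n = best_overlap(fwd, rev_rc, min_overlap)
    if n is None:
        return None
    start = len(fwd) - n
    merged = list(fwd[:start])
    for i in range(n):
        a, qa = fwd[start + i], fwd_q[start + i]
        b, qb = rev_rc[i], rev_rc_q[i]
        if a == "N":
            merged.append(b)  # fill an uncalled base from the mate
        elif b == "N":
            merged.append(a)
        else:
            merged.append(a if qa >= qb else b)  # resolve mismatch by quality
    merged.extend(rev_rc[n:])
    return "".join(merged)
```

In this sketch a mismatch in the overlap is resolved in favor of the higher-quality call, and an uncalled base (N) is filled from the mate read; pairs with no positively scoring overlap are returned as `None` rather than forced together.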

Details

License: GPL-3.0
Tool Type: command-line tool
Operating Systems: Mac, Linux, Windows
Programming Languages: C
Added: 5/27/2021
Last Updated: 11/24/2024

Publications

Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics. 2012;13:31. doi:10.1186/1471-2105-13-31. PMID:22333067. PMCID:PMC3471323.
