AfterQC

AfterQC performs quality control, adapter detection and trimming, and error profiling and correction on fastq files to improve the accuracy of next-generation sequencing data.


Key Features:

  • Automated Quality Control and Data Filtering: Processes fastq files within a folder and categorizes reads into 'good', 'bad', and 'QC' output folders per file or pair.
  • Error Profiling and Correction: Profiles and corrects sequencing errors by analyzing overlapping regions in paired-end reads to identify and rectify erroneous bases.
  • Adapter Detection and Trimming: Detects and trims sequencing adapters by analyzing overlaps between paired reads.
  • Sequencing Bubble Detection and Visualization: Detects and visualizes sequencing bubbles, which are artifacts that originate in flowcell lanes.
  • PolyX Filtering: Removes long homopolymer stretches (polyX) indicative of sequencing issues.
  • Automatic Trimming: Trims low-quality bases from read ends to improve overall read integrity.
  • K-mer Based Strand Bias Profiling: Assesses strand bias using k-mer–based profiling.
  • Comprehensive Reporting: Generates detailed HTML reports with interactive figures for each processed file or pair.
  • Batch Processing with Multiprocess Support: Handles single fastq files, paired-end data, or entire folders and supports multiprocess batch execution.
  • Error Rate Estimation and Distribution Profiling: Estimates sequencing error rates and profiles their distribution by analyzing overlapping regions.
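
The polyX filtering and end trimming features listed above can be illustrated with a minimal Python sketch. The function names, the 35-base run threshold, and the Q20 cutoff are assumptions chosen for illustration. They are not AfterQC's own code or defaults, and the tool's actual trimming strategy may differ.

```python
def longest_homopolymer(seq):
    """Return the length of the longest run of one repeated base in seq."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def passes_polyx_filter(seq, max_run=35):
    """Keep a read only if it has no homopolymer run of max_run bases or more.

    max_run=35 is an illustrative threshold, not a documented default.
    """
    return longest_homopolymer(seq) < max_run


def trim_low_quality_ends(seq, quals, min_quality=20):
    """Trim bases below min_quality from both ends of a read.

    quals is a list of integer Phred scores, one per base.
    min_quality=20 is an illustrative cutoff.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    read = "ACGTTTTTTGCA"
    quals = [8, 12, 30, 32, 35, 35, 34, 33, 31, 30, 14, 9]
    print(longest_homopolymer(read))           # longest run of a single base
    print(trim_low_quality_ends(read, quals))  # read with weak ends removed
```

A read that fails the polyX check would go to the 'bad' output, while a trimmed read that still meets the other criteria would go to 'good'.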

Scientific Applications:

  • Clinical Diagnostics: Reduces sequencing errors that can lead to false-positive variant calls in clinical sequencing workflows.
  • Low-Frequency Somatic Mutation Detection: Enhances detection accuracy for low-frequency somatic variants by correcting sequencing errors.
  • Variant Calling and Genomic Analyses: Improves the reliability of downstream variant calling and other genomic analyses through error correction and quality filtering.
  • Sequencing Platform Error Characterization: Profiles error distributions and platform-dependent patterns to inform optimization of sequencing protocols.

Methodology:

AfterQC analyzes the overlapping regions of paired-end reads to detect and correct base errors, detect adapters, and estimate error rates and their distribution. It also applies polyX filtering, automatic end trimming, and k-mer–based strand-bias profiling, and it detects and visualizes sequencing bubbles. fastq files are processed in batch with multiprocess support, and the output is sorted into 'good', 'bad', and 'QC' folders accompanied by HTML reports.
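
The overlap analysis described above can be sketched in Python as follows. This is a simplified illustration, not AfterQC's implementation: the function names, the ungapped offset search, the mismatch limit, and the Q30/Q15 quality thresholds are all assumptions. It also assumes read 2 has already been reverse-complemented (with its quality scores reversed) and that the fragment is at least as long as a read.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(read1, read2_rc, min_overlap=10, max_mismatches=2):
    """Find where read2_rc starts inside read1 using an ungapped comparison.

    Returns (offset, overlap_length, mismatches) for the first offset whose
    overlap has at most max_mismatches mismatches, or None if none is found.
    """
    for offset in range(0, len(read1) - min_overlap + 1):
        length = min(len(read1) - offset, len(read2_rc))
        mismatches = sum(
            a != b
            for a, b in zip(read1[offset:offset + length], read2_rc[:length])
        )
        if mismatches <= max_mismatches:
            return offset, length, mismatches
    return None


def correct_overlap(read1, qual1, read2_rc, qual2_rc, offset, length,
                    high_quality=30, low_quality=15):
    """Correct mismatches in the overlap using base qualities.

    A mismatched base is replaced only when it has low quality and the base
    from the other read has high quality (illustrative thresholds).
    Returns the two corrected sequences and the number of corrected bases.
    """
    seq1, seq2 = list(read1), list(read2_rc)
    corrected = 0
    for i in range(length):
        p1, p2 = offset + i, i
        if seq1[p1] == seq2[p2]:
            continue
        if qual1[p1] >= high_quality and qual2_rc[p2] <= low_quality:
            seq2[p2] = seq1[p1]
            corrected += 1
        elif qual2_rc[p2] >= high_quality and qual1[p1] <= low_quality:
            seq1[p1] = seq2[p2]
            corrected += 1
    return "".join(seq1), "".join(seq2), corrected
```

The mismatch count returned by `find_overlap`, accumulated over many read pairs, is the kind of quantity from which an error rate could be estimated. Mismatches where both bases have similar quality are left untouched, because the sketch has no basis for choosing between them.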

Details

License: MIT
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Mac, Linux, Windows
Programming Languages: Python, C++
Added: 3/16/2022
Last Updated: 3/16/2022

Publications

Chen S, Huang T, Zhou Y, Han Y, Xu M, Gu J. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data. BMC Bioinformatics. 2017;18(S3). doi:10.1186/s12859-017-1469-3. PMID:28361673. PMCID:PMC5374548.
