afterqc
AfterQC performs quality control, adapter detection and trimming, and error profiling and correction on fastq files to improve the accuracy of next-generation sequencing data.
Key Features:
- Automated Quality Control and Data Filtering: Processes fastq files within a folder and, for each file or pair, writes reads that pass filtering to a 'good' folder, reads that fail to a 'bad' folder, and reports to a 'QC' folder.
- Error Profiling and Correction: Profiles and corrects sequencing errors by analyzing overlapping regions in paired-end reads to identify and rectify erroneous bases.
- Adapter Detection and Trimming: Detects and trims sequencing adapters by analyzing overlaps between paired reads.
- Sequencing Bubble Detection and Visualization: Identifies and visualizes sequencing bubbles, which are artifacts originating from flowcell lanes.
- PolyX Filtering: Removes long homopolymer stretches (polyX) indicative of sequencing issues.
- Automatic Trimming: Trims low-quality bases from read ends to improve overall read integrity.
- K-mer Based Strand Bias Profiling: Assesses strand bias using k-mer–based profiling.
- Comprehensive Reporting: Generates detailed HTML reports with interactive figures for each processed file or pair.
- Batch Processing with Multiprocess Support: Handles single fastq files, paired-end data, or entire folders and supports multiprocess batch execution.
- Error Rate Estimation and Distribution Profiling: Estimates sequencing error rates and profiles their distribution by analyzing overlapping regions.
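The overlap analysis behind the error correction and error rate features above can be sketched as follows. This is a minimal illustration, not AfterQC's actual implementation: the function names (`find_overlap`, `correct_pair`), the thresholds (`min_overlap`, `max_mismatches`, `min_quality_gap`), and the Phred+33 quality encoding are all assumptions made for the example. It also only handles fragments at least as long as one read, so adapter read-through is out of scope here.

```python
# Hypothetical sketch of overlap-based error correction for one read pair.
# All names and thresholds are illustrative assumptions, not AfterQC's API.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2_rc, min_overlap=10, max_mismatches=2):
    """Return the first offset in r1 where r2_rc aligns with at most
    max_mismatches mismatches, or None if no such offset exists."""
    for offset in range(0, len(r1) - min_overlap + 1):
        overlap_len = min(len(r1) - offset, len(r2_rc))
        mismatches = sum(
            1
            for a, b in zip(r1[offset:offset + overlap_len], r2_rc[:overlap_len])
            if a != b
        )
        if mismatches <= max_mismatches:
            return offset
    return None


def correct_pair(r1, q1, r2, q2, min_quality_gap=10):
    """Correct mismatched bases in the overlap of a read pair.

    Where the two reads disagree, the base with the clearly higher
    quality (assumed Phred+33) replaces the other one.
    Returns (corrected_r1, corrected_r2, number_of_corrections).
    """
    r2_rc = reverse_complement(r2)
    q2_rc = q2[::-1]  # qualities follow the reversed orientation
    offset = find_overlap(r1, r2_rc)
    if offset is None:
        return r1, r2, 0

    r1_bases, r2_bases = list(r1), list(r2_rc)
    overlap_len = min(len(r1) - offset, len(r2_rc))
    corrected = 0
    for i in range(overlap_len):
        a, b = r1_bases[offset + i], r2_bases[i]
        if a == b:
            continue
        qa = ord(q1[offset + i]) - 33
        qb = ord(q2_rc[i]) - 33
        if qa - qb >= min_quality_gap:
            r2_bases[i] = a
            corrected += 1
        elif qb - qa >= min_quality_gap:
            r1_bases[offset + i] = b
            corrected += 1
    # Return read 2 in its original orientation.
    return "".join(r1_bases), reverse_complement("".join(r2_bases)), corrected
```

Under the same assumptions, an error rate estimate would be the number of mismatches seen in overlaps divided by the total number of overlapped bases across all pairs.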
Scientific Applications:
- Clinical Diagnostics: Reduces sequencing errors that can lead to false-positive variant calls in clinical sequencing workflows.
- Low-Frequency Somatic Mutation Detection: Enhances detection accuracy for low-frequency somatic variants by correcting sequencing errors.
- Variant Calling and Genomic Analyses: Improves the reliability of downstream variant calling and other genomic analyses through error correction and quality filtering.
- Sequencing Platform Error Characterization: Profiles error distributions and platform-dependent patterns to inform optimization of sequencing protocols.
Methodology:
Analyzes overlaps in paired-end reads to detect and correct base errors, detect adapters, and estimate error rates and distributions. Applies polyX filtering, automatic end trimming, and k-mer–based strand-bias profiling. Detects and visualizes sequencing bubbles. Processes fastq files in batch with multiprocess support and outputs categorized 'good', 'bad', and 'QC' folders along with HTML reports.
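The per-read filtering steps named above (polyX filtering and end trimming, followed by a 'good' or 'bad' decision) can be illustrated with a short sketch. This is a hedged example, not AfterQC's code: `has_polyx`, `trim_ends`, `classify_read`, and every threshold are hypothetical, and the naive per-read quality cutoff used for trimming may differ from the strategy AfterQC actually applies.

```python
# Hypothetical sketch of polyX filtering and end trimming for a single read.
# Function names and default thresholds are assumptions for illustration.
import re


def has_polyx(seq, min_run=35):
    """Return True if seq contains a run of one base at least min_run long."""
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, seq) is not None


def trim_ends(seq, qual, min_quality=20, offset=33):
    """Strip bases below min_quality from both ends of a read.

    qual is assumed to be a Phred quality string with the given ASCII offset.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return seq[start:end], qual[start:end]


def classify_read(seq, qual, min_run=35, min_length=20):
    """Return ('good' or 'bad', sequence, quality) for one read."""
    if has_polyx(seq, min_run):
        return "bad", seq, qual
    trimmed_seq, trimmed_qual = trim_ends(seq, qual)
    if len(trimmed_seq) < min_length:
        return "bad", trimmed_seq, trimmed_qual
    return "good", trimmed_seq, trimmed_qual
```

In a batch setting, a function like `classify_read` would be applied to every record of each fastq file, with the result deciding which output folder the read is written to.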
Details
- License:
- MIT
- Cost:
- Free of charge
- Tool Type:
- command-line tool
- Operating Systems:
- Mac, Linux, Windows
- Programming Languages:
- Python, C++
- Added:
- 3/16/2022
- Last Updated:
- 3/16/2022
Publications
Chen S, Huang T, Zhou Y, Han Y, Xu M, Gu J. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data. BMC Bioinformatics. 2017;18(S3). doi:10.1186/s12859-017-1469-3. PMID:28361673. PMCID:PMC5374548.