FQStat
FQStat assesses quality metrics of giga-to-tera-base high-throughput DNA and RNA sequencing datasets to provide rapid read- and base-level quality evaluation for downstream analysis.
Key Features:
- Parallel Programming Architecture: Uses an optimized parallel programming architecture for batch processing to perform high-speed quality metric calculations across multiple datasets.
- Platform Independence: Operates across different computing environments without dependence on specific system configurations.
- Automatic Resource Optimization: Automatically determines optimal core and memory allocation per input file based on machine architecture and input data characteristics, balancing core assignment overhead and performance saturation.
- Comprehensive Output Formats: Produces results as an HTML web page, tab-delimited text files, and high-resolution images.
- Detailed Quality Metrics Calculation: Calculates and visualizes read count, read length, quality score, and high-quality base statistics.
- Identification of Low-Quality Data: Detects and flags low-quality sequencing data for removal from downstream analyses.
- Multi-Level Quality Assessment: Computes QC statistics at lane, sample, and experiment levels to pinpoint low-quality subsets without discarding entire samples.
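The read- and base-level metrics listed above can be illustrated with a minimal Python sketch. This is not FQStat's own code: the function name `fastq_metrics`, the Phred+33 encoding, the four-lines-per-record layout, and the Q20 cutoff for a "high-quality" base are all assumptions made for illustration.

```python
import io


def fastq_metrics(handle, hq_threshold=20, phred_offset=33):
    """Compute simple read- and base-level metrics from a FASTQ stream.

    Assumes 4 lines per record and Phred+33 quality encoding.
    hq_threshold is a hypothetical cutoff for a "high-quality" base.
    """
    lengths = []
    total_bases = 0
    hq_bases = 0
    qual_sum = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of stream
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        scores = [ord(c) - phred_offset for c in qual]
        lengths.append(len(seq))
        total_bases += len(scores)
        qual_sum += sum(scores)
        hq_bases += sum(1 for s in scores if s >= hq_threshold)
    n = len(lengths)
    return {
        "read_count": n,
        "mean_read_length": sum(lengths) / n if n else 0.0,
        "min_read_length": min(lengths) if n else 0,
        "max_read_length": max(lengths) if n else 0,
        "mean_base_quality": qual_sum / total_bases if total_bases else 0.0,
        "hq_base_fraction": hq_bases / total_bases if total_bases else 0.0,
    }


# Tiny in-memory example with two reads.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n##I\n")
metrics = fastq_metrics(example)
print(metrics["read_count"], metrics["mean_read_length"])
```

In practice the handle would be an open FASTQ file (or a `gzip.open(path, "rt")` stream) rather than an in-memory string.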
Scientific Applications:
- Preprocessing for downstream analysis: Assess and filter sequencing data to ensure the reliability of downstream biological and clinical analyses.
- High-throughput dataset QC: Perform rapid quality assessment of giga-to-tera-base sequencing datasets, processing multiple datasets simultaneously.
- Experimental troubleshooting and salvage: Localize low-quality data at lane, sample, or experiment levels to guide selective removal or salvage of reliable data.
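The idea of localizing low-quality data at the lane, sample, or experiment level can be sketched as a simple roll-up of per-lane counts. The record fields (`experiment`, `sample`, `lane`, `hq_bases`, `total_bases`), the function name `summarize_levels`, and the 0.8 flagging threshold are hypothetical and are not taken from FQStat.

```python
from collections import defaultdict


def summarize_levels(lane_records, min_hq_fraction=0.8):
    """Roll per-lane base counts up to sample and experiment levels.

    Each record is a dict with hypothetical keys: experiment, sample,
    lane, hq_bases, total_bases. A group is flagged when its fraction of
    high-quality bases falls below min_hq_fraction (an assumed cutoff).
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [hq_bases, total_bases]
    for rec in lane_records:
        keys = [
            ("experiment", rec["experiment"]),
            ("sample", rec["experiment"], rec["sample"]),
            ("lane", rec["experiment"], rec["sample"], rec["lane"]),
        ]
        for key in keys:
            totals[key][0] += rec["hq_bases"]
            totals[key][1] += rec["total_bases"]

    summary = {}
    for key, (hq, total) in totals.items():
        fraction = hq / total if total else 0.0
        summary[key] = {
            "hq_fraction": fraction,
            "flagged": fraction < min_hq_fraction,
        }
    return summary


records = [
    {"experiment": "exp1", "sample": "S1", "lane": "L1", "hq_bases": 90, "total_bases": 100},
    {"experiment": "exp1", "sample": "S1", "lane": "L2", "hq_bases": 50, "total_bases": 100},
    {"experiment": "exp1", "sample": "S2", "lane": "L1", "hq_bases": 95, "total_bases": 100},
]
for key, stats in sorted(summarize_levels(records).items()):
    print(key, stats)
```

With these made-up numbers, only lane L2 of sample S1 is poor, so that lane could be dropped while the sample's other lane is kept, which is the kind of selective salvage described above.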
Methodology:
FQStat uses an optimized parallel programming architecture for batch processing. It automatically determines the optimal core and memory allocation for each file from the machine architecture and the input characteristics, and it processes multiple datasets simultaneously and independently. For each dataset it calculates and visualizes read count, read length, quality score, and high-quality base statistics. In comparisons with similar parallel QC tools, FQStat showed run-time improvements.
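FQStat's actual allocation heuristic is not described in this entry. As a rough illustration of per-file core assignment with a saturation cap, the sketch below splits a core budget among files in proportion to file size. The names `allocate_cores` and `run_batch`, the size-proportional rule, and the default cap of 8 cores per file are assumptions, not FQStat's method.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def allocate_cores(file_sizes, total_cores=None, max_cores_per_file=8):
    """Assign cores to files in proportion to their size.

    file_sizes maps a file name to its size in bytes. Every file gets at
    least one core, and no file gets more than max_cores_per_file, a
    hypothetical stand-in for the point where extra cores stop helping.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    total_size = sum(file_sizes.values())
    allocation = {}
    for name, size in file_sizes.items():
        share = (total_cores * size) // total_size if total_size else 1
        allocation[name] = max(1, min(max_cores_per_file, share))
    return allocation


def run_batch(paths, worker, max_workers=None):
    """Process files independently, one worker call per file.

    worker must be a picklable top-level function that takes a path.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(worker, paths)))


sizes = {"a.fastq": 800, "b.fastq": 100, "c.fastq": 100}
print(allocate_cores(sizes, total_cores=10, max_cores_per_file=4))
```

Because every file gets at least one core, the allocations can add up to more than the budget when there are many small files; a fuller scheduler would queue the excess files rather than oversubscribe the machine.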
Details
- Tool Type: command-line tool, desktop application
- Programming Languages: Python
- Added: 11/14/2019
- Last Updated: 12/29/2020
Publications
Chanumolu SK, Albahrani M, Otu HH. FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics. BMC Bioinformatics. 2019;20(1). doi:10.1186/s12859-019-3015-y. PMID:31416440. PMCID:PMC6694608.