FQStat

FQStat assesses quality metrics of giga-to-tera-base high-throughput DNA and RNA sequencing datasets to provide rapid read- and base-level quality evaluation for downstream analysis.


Key Features:

  • Parallel Programming Architecture: Uses an optimized parallel architecture to batch-process multiple datasets and compute quality metrics at high speed.
  • Platform Independence: Operates across different computing environments without dependence on specific system configurations.
  • Automatic Resource Optimization: Automatically determines the core and memory allocation for each input file from the machine architecture and the characteristics of the input data, balancing the overhead of assigning additional cores against the point at which performance saturates.
  • Comprehensive Output Formats: Produces results as an HTML web page, tab-delimited text files, and high-resolution images.
  • Detailed Quality Metrics Calculation: Calculates and visualizes read count, read length, quality score, and high-quality base statistics.
  • Identification of Low-Quality Data: Detects and flags low-quality sequencing data for removal from downstream analyses.
  • Multi-Level Quality Assessment: Computes QC statistics at lane, sample, and experiment levels to pinpoint low-quality subsets without discarding entire samples.
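
The read- and base-level metrics listed above can be illustrated with a minimal Python sketch. This is not FQStat's code: the function names, the Phred+33 quality encoding, and the Q30 cutoff for a "high-quality" base are assumptions made for the example.

```python
from statistics import mean

PHRED_OFFSET = 33      # assumption: Sanger / Illumina 1.8+ quality encoding
HIGH_QUALITY_MIN = 30  # assumption: a "high-quality" base has Q >= 30


def fastq_records(lines):
    """Yield (sequence, quality_string) pairs from 4-line FASTQ records."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 1], lines[i + 3]


def summarize(lines):
    """Compute read count, mean read length, mean per-read quality,
    and the fraction of bases at or above HIGH_QUALITY_MIN.
    Expects at least one record."""
    lengths, read_means = [], []
    total_bases = high_quality_bases = 0
    for seq, qual in fastq_records(lines):
        scores = [ord(c) - PHRED_OFFSET for c in qual]
        lengths.append(len(seq))
        read_means.append(mean(scores))
        total_bases += len(scores)
        high_quality_bases += sum(s >= HIGH_QUALITY_MIN for s in scores)
    return {
        "read_count": len(lengths),
        "mean_read_length": mean(lengths),
        "mean_read_quality": mean(read_means),
        "high_quality_fraction": high_quality_bases / total_bases,
    }


# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
EXAMPLE = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGTAC", "+", "II##II",
]

if __name__ == "__main__":
    print(summarize(EXAMPLE))
```

For the two toy reads, 8 of the 10 bases reach Q30, so the high-quality fraction is 0.8.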

Scientific Applications:

  • Preprocessing for downstream analysis: Assess and filter sequencing data to ensure the reliability of downstream biological and clinical analyses.
  • High-throughput dataset QC: Perform rapid quality assessment across giga-to-tera-base sequencing datasets and multiple datasets simultaneously.
  • Experimental troubleshooting and salvage: Localize low-quality data at lane, sample, or experiment levels to guide selective removal or salvage of reliable data.
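
As a rough illustration of localizing low-quality data, the sketch below rolls lane-level counts up to sample and experiment levels and flags individual lanes. The record layout, the `roll_up` name, and the 0.80 threshold are hypothetical choices for this example, not FQStat's actual criteria.

```python
from collections import defaultdict

MIN_HQ_FRACTION = 0.80  # assumption: lanes below this fraction are flagged


def roll_up(lane_stats):
    """Aggregate lane-level base counts to sample and experiment levels.

    lane_stats: list of dicts with keys sample, lane, bases, hq_bases.
    Returns (per_sample, experiment, flagged_lanes).
    """
    per_sample = defaultdict(lambda: {"bases": 0, "hq_bases": 0})
    experiment = {"bases": 0, "hq_bases": 0}
    flagged = []
    for row in lane_stats:
        # Flag at the finest level so a bad lane does not condemn its sample.
        if row["hq_bases"] / row["bases"] < MIN_HQ_FRACTION:
            flagged.append((row["sample"], row["lane"]))
        for level in (per_sample[row["sample"]], experiment):
            level["bases"] += row["bases"]
            level["hq_bases"] += row["hq_bases"]
    return dict(per_sample), experiment, flagged


LANES = [
    {"sample": "S1", "lane": "L1", "bases": 1000, "hq_bases": 950},
    {"sample": "S1", "lane": "L2", "bases": 1000, "hq_bases": 600},
    {"sample": "S2", "lane": "L1", "bases": 2000, "hq_bases": 1900},
]

if __name__ == "__main__":
    samples, experiment, flagged = roll_up(LANES)
    print(samples, experiment, flagged)
```

Here only lane L2 of sample S1 is flagged, so the remaining lane of S1 could still be kept for downstream analysis.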

Methodology:

FQStat applies an optimized parallel programming architecture for batch processing. It automatically determines the core and memory allocation for each file from the machine architecture and the input characteristics, and it processes multiple datasets simultaneously and independently. For each dataset it calculates and visualizes read count, read length, quality score, and high-quality base statistics. In comparisons with similar parallel QC tools, FQStat showed shorter run times.
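
The idea of per-file core allocation can be sketched as follows. This is a simple illustrative heuristic, not the algorithm FQStat uses: the `allocate_cores` function, the one-core-per-gigabyte rule, and the per-file cap standing in for performance saturation are all assumptions.

```python
import math


def allocate_cores(file_sizes, total_cores,
                   bytes_per_core=10**9, max_cores_per_file=8):
    """Return {file_name: cores} for a batch of input files.

    Each file first requests one core per bytes_per_core of input
    (rounded up), capped at max_cores_per_file because extra cores
    stop helping past some point. Requests are then trimmed, largest
    first, until the plan fits within total_cores.
    """
    plan = {
        name: min(max_cores_per_file, max(1, math.ceil(size / bytes_per_core)))
        for name, size in file_sizes.items()
    }
    while sum(plan.values()) > total_cores:
        name = max(plan, key=plan.get)
        if plan[name] == 1:
            break  # every file is at one core; surplus files must queue
        plan[name] -= 1
    return plan


if __name__ == "__main__":
    sizes = {
        "a.fastq": 12 * 10**9,
        "b.fastq": 3 * 10**9,
        "c.fastq": 2 * 10**8,
    }
    print(allocate_cores(sizes, total_cores=8))
```

On an 8-core machine the three example files receive 4, 3, and 1 cores respectively. A plan like this could then drive a process pool in which each file is handled independently.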

Details

Tool Type: command-line tool, desktop application
Programming Languages: Python
Added: 11/14/2019
Last Updated: 12/29/2020

Publications

Chanumolu SK, Albahrani M, Otu HH. FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics. BMC Bioinformatics. 2019;20(1). doi:10.1186/s12859-019-3015-y. PMID:31416440. PMCID:PMC6694608.

Funding: National Institutes of Health, grant R21LM012759