ClinQC

ClinQC is a tool for quality control and cleaning of Sanger and next-generation sequencing (Illumina, 454, Ion Torrent) data. It filters, trims, and standardizes reads, producing high-quality output with Sanger-encoded quality scores for mutation identification in clinical genomics, including heterogeneous diseases such as cancer.


Key Features:

  • Integration of Sanger and NGS data: Processes raw reads from Sanger sequencing and NGS platforms including Illumina, 454, and Ion Torrent.
  • Format conversion and standardization: Converts input read files into FASTQ format to ensure compatibility across analyses.
  • Adapter, primer, and contaminant removal: Removes adapters, PCR primers, and contaminating sequences to reduce technical artifacts.
  • Low-quality read filtering: Filters out low-quality sequences to improve downstream sensitivity and specificity for mutation detection.
  • Barcode demultiplexing and duplicate removal: Splits bar-coded samples and removes duplicate reads to support accurate sample-level analysis.
  • Scalability and parallelization: Implemented in Python and leverages multiprocessing to process hundreds to thousands of samples in a single run.
  • Comprehensive QC reporting: Generates detailed quality control reports documenting processing steps and outcomes.
  • Output compatibility: Produces high-quality reads encoded with Sanger quality scores ready for downstream mutation screening.
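As an illustration of the quality filtering and Sanger-encoded output described above, the following is a minimal sketch and not ClinQC's own code. It assumes the input quality string uses the Phred+64 offset (as in some older Illumina pipelines) and does not handle Solexa-scaled scores. The function names and the default threshold of 20 are hypothetical choices made for this example.

```python
def illumina64_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger encoding)."""
    scores = [ord(ch) - 64 for ch in quality]
    if any(score < 0 for score in scores):
        # Characters below '@' cannot be valid Phred+64 scores.
        raise ValueError("quality string is not Phred+64 encoded")
    return "".join(chr(score + 33) for score in scores)


def mean_quality(sanger_quality: str) -> float:
    """Return the mean Phred score of a Sanger-encoded quality string."""
    if not sanger_quality:
        return 0.0
    return sum(ord(ch) - 33 for ch in sanger_quality) / len(sanger_quality)


def passes_filter(sanger_quality: str, min_mean: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score reaches the threshold.

    The default of 20 is an assumed example value, not a ClinQC default.
    """
    return mean_quality(sanger_quality) >= min_mean
```

A read would typically be re-encoded first and filtered afterwards, for example passes_filter(illumina64_to_sanger(quality)).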

Scientific Applications:

  • Clinical variant detection: Facilitates identification and confirmation of disease-causing mutations for diagnostic and therapeutic applications.
  • Cancer genomics: Improves mutation screening and data quality in studies of heterogeneous diseases such as cancer.
  • Cross-platform data harmonization: Harmonizes Sanger and NGS datasets (Illumina, 454, Ion Torrent) for comparative analyses and pooled studies.
  • Large-cohort sequencing studies: Enables processing of hundreds to thousands of samples for large-scale clinical research and mutation screening projects.

Methodology:

The workflow converts input files to FASTQ, removes adapters and PCR primers, filters out low-quality reads and contaminants, splits bar-coded samples, and removes duplicates. It outputs standardized high-quality reads with Sanger-encoded quality scores and generates QC reports. ClinQC is implemented in Python with multiprocessing.
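The per-sample cleaning and parallel processing described above can be sketched roughly as follows. This is an illustrative outline only, not ClinQC's implementation: the adapter sequence, the minimum length, the exact-match trimming, the sequence-identity definition of a duplicate, and all function names are assumptions made for this example. Reads are represented as in-memory (id, sequence, quality) tuples rather than FASTQ files.

```python
from multiprocessing import Pool

ADAPTER = "AGATCGGAAGAGC"  # assumed example adapter, not a ClinQC default
MIN_LENGTH = 5             # assumed minimum read length after trimming


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read at the first exact adapter match (3' trimming)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def clean_sample(item):
    """Trim, length-filter, and deduplicate the reads of one sample."""
    name, reads = item
    seen = set()
    kept = []
    for read_id, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual)
        # Drop reads that are too short or whose sequence was already seen.
        if len(seq) < MIN_LENGTH or seq in seen:
            continue
        seen.add(seq)
        kept.append((read_id, seq, qual))
    return name, kept


def clean_all(samples, workers=2):
    """Clean every sample in a separate worker process."""
    with Pool(processes=workers) as pool:
        return dict(pool.map(clean_sample, sorted(samples.items())))


if __name__ == "__main__":
    samples = {
        "sample_A": [
            ("r1", "ACGTACGTAGATCGGAAGAGCTT", "I" * 23),
            ("r2", "ACGTACGT", "I" * 8),
        ],
        "sample_B": [("r1", "TTGGCCAATT", "I" * 10)],
    }
    for name, reads in clean_all(samples).items():
        print(name, len(reads))
```

Processing each sample independently is what makes the work easy to spread across processes: no state is shared between samples, so the pool only has to map one function over the sample list.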

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows, Mac
Programming Languages: Java, Perl, Python
Added: 4/27/2018
Last Updated: 12/10/2018

Publications

Pandey RV, Pabinger S, Kriegner A, Weinhäusel A. ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research. BMC Bioinformatics. 2016;17(1). doi:10.1186/s12859-016-0915-y. PMID:26830926. PMCID:PMC4735967.