ClinQC
ClinQC performs quality control, filtering, trimming, and format standardization of Sanger and next-generation sequencing (Illumina, 454, Ion Torrent) data to produce Sanger-encoded high-quality reads for mutation identification in clinical genomics, including heterogeneous diseases such as cancer.
Key Features:
- Integration of Sanger and NGS data: Processes raw reads from Sanger sequencing and NGS platforms including Illumina, 454, and Ion Torrent.
- Format conversion and standardization: Converts input read files into FASTQ format to ensure compatibility across analyses.
- Adapter, primer, and contaminant removal: Removes adapters, PCR primers, and contaminating sequences to reduce technical artifacts.
- Low-quality read filtering: Filters out low-quality sequences to improve downstream sensitivity and specificity for mutation detection.
- Barcode demultiplexing and duplicate removal: Splits bar-coded samples and removes duplicate reads to support accurate sample-level analysis.
- Scalability and parallelization: Implemented in Python with multiprocessing, allowing hundreds to thousands of samples to be processed in a single run.
- Comprehensive QC reporting: Generates detailed quality control reports documenting processing steps and outcomes.
- Output compatibility: Produces high-quality reads encoded with Sanger quality scores ready for downstream mutation screening.
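To illustrate what "Sanger-encoded" output and low-quality filtering mean in practice, here is a minimal Python sketch. It is not taken from ClinQC's source code. The function names, the Phred+64 input offset (as used by older Illumina pipelines), and the mean-quality threshold of 20 are assumptions chosen for illustration only.

```python
def to_sanger_quality(qual: str, offset_in: int = 64) -> str:
    """Re-encode a FASTQ quality string to Sanger encoding (Phred+33).

    offset_in is the ASCII offset of the input; 64 is assumed here
    (older Illumina encoding). This is an illustrative default only.
    """
    scores = [ord(ch) - offset_in for ch in qual]
    if any(q < 0 for q in scores):
        raise ValueError("quality character below the assumed input offset")
    return "".join(chr(q + 33) for q in scores)


def mean_phred(qual_sanger: str) -> float:
    """Mean Phred score of a Sanger-encoded quality string (0.0 if empty)."""
    if not qual_sanger:
        return 0.0
    return sum(ord(ch) - 33 for ch in qual_sanger) / len(qual_sanger)


def passes_quality(qual_sanger: str, min_mean: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score reaches the threshold.

    The threshold of 20 is a hypothetical value, not a ClinQC default.
    """
    return mean_phred(qual_sanger) >= min_mean
```

For example, `to_sanger_quality("hhhh")` re-encodes four Phred+64 scores of 40 as `"IIII"`, which `passes_quality` accepts, while a Sanger string such as `"####"` (Phred 2) is rejected.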
Scientific Applications:
- Clinical variant detection: Facilitates identification and confirmation of disease-causing mutations for diagnostic and therapeutic applications.
- Cancer genomics: Improves mutation screening and data quality in studies of heterogeneous diseases such as cancer.
- Cross-platform data harmonization: Harmonizes Sanger and NGS datasets (Illumina, 454, Ion Torrent) for comparative analyses and pooled studies.
- Large-cohort sequencing studies: Enables processing of hundreds to thousands of samples for large-scale clinical research and mutation screening projects.
Methodology:
- Conversion of input files to FASTQ.
- Removal of adapters and PCR primers.
- Filtering of low-quality reads and contaminants.
- Splitting of bar-coded samples and removal of duplicates.
- Output of standardized high-quality reads with Sanger-encoded quality scores, and generation of QC reports.
- Implemented in Python with multiprocessing.
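The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not ClinQC's actual implementation: all function names are hypothetical, reads are plain `(name, sequence, sanger_quality)` tuples, adapters and barcodes are matched exactly, duplicates are detected by identical sequence, and the sketch splits reads by barcode first so that each sample can be cleaned in a separate worker process.

```python
from collections import defaultdict
from functools import partial
from multiprocessing import Pool


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the 3' adapter."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def mean_quality(qual):
    """Mean Phred score of a Sanger-encoded (Phred+33) quality string."""
    return sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0


def split_by_barcode(reads, barcodes):
    """Group reads by exact 5' barcode match and strip the barcode.

    barcodes maps barcode sequence -> sample name; unmatched reads are dropped.
    """
    by_sample = defaultdict(list)
    for name, seq, qual in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                by_sample[sample].append((name, seq[n:], qual[n:]))
                break
    return dict(by_sample)


def clean_reads(reads, adapter, min_mean_q=20.0):
    """Trim adapters, drop low-quality reads, and remove duplicate sequences."""
    seen = set()
    kept = []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if mean_quality(qual) < min_mean_q:
            continue  # low-quality (or empty) read
        if seq in seen:
            continue  # duplicate of an earlier read
        seen.add(seq)
        kept.append((name, seq, qual))
    return kept


def clean_all_samples(samples, adapter, workers=2):
    """Clean each sample's reads in a separate worker process."""
    with Pool(processes=workers) as pool:
        cleaned = pool.map(partial(clean_reads, adapter=adapter),
                           list(samples.values()))
    return dict(zip(samples.keys(), cleaned))
```

A typical call would first run `split_by_barcode(reads, {"ACGT": "s1"})` and then pass the result to `clean_all_samples(samples, adapter="AGATC")` from within an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn worker processes.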
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Java, Perl, Python
- Added: 4/27/2018
- Last Updated: 12/10/2018
Publications
Pandey RV, Pabinger S, Kriegner A, Weinhäusel A. ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research. BMC Bioinformatics. 2016;17(1). doi:10.1186/s12859-016-0915-y. PMID:26830926. PMCID:PMC4735967.