BIGpre

BIGpre evaluates and preprocesses next-generation sequencing (NGS) data to assess read-level quality and prepare datasets for downstream analyses such as genome assembly, variant calling, and transcriptome profiling.


Key Features:

  • Platform Compatibility: Supports Illumina and 454 sequencing platforms.
  • Read-level Quality Metrics: Evaluates correlation between forward and reverse reads, analyzes read GC-content distribution, and assesses base quality and ambiguous base calls (N).
  • Duplicate Read Management: Detects and removes duplicate reads while accounting for sequencing errors.
  • Quality Trimming: Trims low-quality reads from raw data.
  • Efficient Processing: Processes hundreds of millions of reads within minutes to provide rapid diagnostic information.
  • Graphical and Tabular Summaries: Generates tabular and graphical summaries using the R statistics package.
  • Implementation: Written primarily in Perl.
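
The quality-trimming feature can be illustrated with a short sketch. This is not BIGpre's code (BIGpre is written in Perl, and its exact trimming rule is not described here). It is a minimal Python example that assumes Phred+33 quality encoding and a simple rule: remove bases from the 3' end until one meets a minimum quality. The function names, the threshold of 20, and the offset are assumptions chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Convert an ASCII-encoded quality string to integer Phred scores.

    The offset of 33 (Sanger / Illumina 1.8+ encoding) is an assumption;
    older Illumina pipelines used an offset of 64.
    """
    return [ord(ch) - offset for ch in quality]


def trim_low_quality_tail(sequence, quality, min_quality=20, offset=33):
    """Trim bases from the 3' end while their Phred score is below min_quality.

    Returns the trimmed (sequence, quality) pair. A read whose bases all
    fall below the threshold is trimmed to empty strings.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")

    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk backwards from the 3' end until a base passes the threshold.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under the +33 offset.
    print(trim_low_quality_tail("ACGTAC", "IIII##"))
```

A tail-only rule keeps the sketch short; sliding-window or whole-read filters are common alternatives and may be closer to what a given tool does.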

Scientific Applications:

  • NGS Quality Control: Provides immediate and comprehensive assessment of read-level quality metrics for NGS datasets.
  • Genome Assembly: Improves input data quality for genome assembly by removing duplicates and trimming low-quality reads.
  • Variant Calling: Supports variant calling by assessing base quality and removing duplicate reads that could bias calls.
  • Transcriptome Profiling: Enhances transcriptome analyses by evaluating read quality metrics and trimming low-quality reads.

Methodology:

BIGpre evaluates forward/reverse read correlation, computes read GC-content distributions, and assesses base quality and ambiguous base calls (N). It detects and removes duplicate reads with error-aware filtering, performs quality trimming, and produces tabular and graphical summaries via R. The software is implemented primarily in Perl and processes hundreds of millions of reads within minutes.
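
Some of the steps above can be sketched in a few lines of Python. The example below is illustrative only and does not reproduce BIGpre's Perl implementation. It assumes reads are plain strings, that "error-aware" duplicate detection can be approximated by a Hamming-distance tolerance between equal-length reads, and that GC content is computed over called (non-N) bases. All function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import Counter


def gc_fraction(read):
    """Fraction of G/C among called bases (A, C, G, T); N is ignored."""
    called = [base for base in read.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def n_fraction(read):
    """Fraction of ambiguous (N) base calls in a read."""
    return read.upper().count("N") / len(read) if read else 0.0


def gc_histogram(reads, bins=10):
    """Count reads per GC-content bin; a GC fraction of 1.0 goes in the last bin."""
    counts = Counter()
    for read in reads:
        index = min(int(gc_fraction(read) * bins), bins - 1)
        counts[index] += 1
    return [counts[i] for i in range(bins)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_near_duplicates(reads, max_mismatches=1):
    """Keep the first read of each group of near-identical reads.

    Two reads count as duplicates when they have the same length and
    differ at no more than max_mismatches positions (an assumed stand-in
    for tolerating sequencing errors). This pairwise comparison is
    quadratic and meant only to show the idea, not to scale.
    """
    kept = []
    for read in reads:
        is_duplicate = any(
            len(read) == len(other) and hamming(read, other) <= max_mismatches
            for other in kept
        )
        if not is_duplicate:
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TTTT", "ACGT"]
    print(remove_near_duplicates(reads))
    print(gc_histogram(reads, bins=4))
```

A tool that handles hundreds of millions of reads would need hashing or indexing rather than pairwise comparison; the quadratic loop is used here only because it makes the mismatch tolerance explicit.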

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Perl
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Zhang T, Luo Y, Liu K, Pan L, Zhang B, Yu J, Hu S. BIGpre: A Quality Assessment Package for Next-Generation Sequencing Data. Genomics, Proteomics & Bioinformatics. 2011;9(6):238-244. doi:10.1016/s1672-0229(11)60027-2. PMID:22289480. PMCID:PMC5054156.

Funding:

  • National Natural Science Foundation of China: 30900825, 31000561
  • Knowledge Innovation Program of the Chinese Academy of Sciences: KSCX2-EW-R-01-04
