preseq

preseq uses initial shallow sequencing data to predict the number of distinct reads and the expected genome coverage that additional sequencing would yield, helping guide how much sequencing depth to allocate.


Key Features:

  • Predictive modeling: Uses a non-parametric empirical Bayes Poisson model to estimate coverage gains and the number of distinct reads from deeper sequencing.
  • Shallow-read analysis: Derives initial coverage and complexity statistics from shallow sequencing reads mapped to a reference genome.
  • Sequencing-depth optimization: Estimates the marginal utility of additional sequencing to support cost-effective selection of sequencing depth.
  • Library screening and comparison: Compares multiple libraries to identify high-complexity samples and flag low-complexity libraries unlikely to yield substantial new information upon deeper sequencing.
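
The shallow-read analysis step can be sketched as building a duplicate-count histogram, where n_j is the number of distinct reads observed exactly j times. The sketch below assumes, for illustration only, that mapped reads are given as hypothetical (chromosome, start, strand) tuples and that reads sharing all three fields are copies of the same molecule. It is not preseq's own code or input format.

```python
from collections import Counter

def counts_histogram(read_keys):
    """Map j -> n_j, the number of distinct reads observed exactly j times."""
    copies = Counter(read_keys)  # hypothetical read key -> number of copies seen
    return dict(sorted(Counter(copies.values()).items()))

def summarize(hist):
    """Return (total reads, distinct reads) implied by a counts histogram."""
    total = sum(j * n for j, n in hist.items())
    distinct = sum(hist.values())
    return total, distinct

# Hypothetical mapped reads as (chromosome, start, strand) tuples.
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "-"),
         ("chr2", 40, "+"), ("chr2", 40, "+"), ("chr2", 40, "+")]
hist = counts_histogram(reads)
print(hist)             # {1: 1, 2: 1, 3: 1}
print(summarize(hist))  # (6, 3)
```

A histogram dominated by large j (many copies of few molecules) corresponds to the low-complexity libraries described above.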

Scientific Applications:

  • Genomics: Forecasts coverage statistics to plan sequencing strategies in genomics research.
  • Single-cell DNA sequencing: Predicts genome coverage from early, shallow data in settings where limited starting material and variability in library preparation affect outcomes.
  • Detection of genetic variation: Informs experimental design for detecting genetic variation that may be difficult to resolve with bulk sequencing alone.
  • Library and protocol evaluation: Enables systematic comparison of sequencing libraries and strategies to decide whether additional sequencing is likely to be beneficial.
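
One simple way to compare libraries is the expected number of new distinct reads gained per additional read sequenced. The sketch below computes this from each library's counts histogram using the classical Good-Toulmin series at t = 0.5 (adding half the current depth, inside the range where the raw series is reliable) and flags libraries below a hypothetical cut-off, MIN_YIELD. The metric, the threshold and the example histograms are illustrative assumptions, not preseq outputs.

```python
def good_toulmin_new_distinct(hist, t):
    """Good-Toulmin estimate of new distinct reads from t * (current reads) more reads."""
    return sum((-1) ** (j + 1) * t ** j * n for j, n in hist.items())

def yield_per_extra_read(hist, t=0.5):
    """Expected new distinct reads per additional read sequenced (t > 0)."""
    total_reads = sum(j * n for j, n in hist.items())
    return good_toulmin_new_distinct(hist, t) / (t * total_reads)

# Hypothetical counts histograms {j: n_j}; both libraries contain 115 reads.
libraries = {
    "libA": {1: 60, 2: 20, 3: 5},
    "libB": {1: 10, 2: 15, 5: 15},
}
MIN_YIELD = 0.3  # hypothetical cut-off for "worth sequencing deeper"

for name, hist in libraries.items():
    y = yield_per_extra_read(hist)
    verdict = "deeper sequencing worthwhile" if y >= MIN_YIELD else "low complexity"
    print(name, round(y, 3), verdict)
# libA 0.446 deeper sequencing worthwhile
# libB 0.03 low complexity
```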

Methodology:

Analyzes shallow sequencing reads mapped to a reference genome to derive initial coverage and complexity statistics, then applies a non-parametric empirical Bayes Poisson framework to extrapolate expected distinct reads and genome coverage at increased sequencing depths.
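
The empirical Bayes idea can be illustrated with the classical Good-Toulmin estimator. Treating each molecule's read count as Poisson with an unknown, molecule-specific rate, the expected number of additional distinct reads from sequencing a further t times the current number of reads is Δ(t) = Σ_j (-1)^(j+1) t^j n_j, where n_j is the number of distinct reads seen exactly j times. Interpolation to a smaller depth follows directly from the histogram: if each observed read is kept independently with probability p, a read seen j times survives with probability 1 - (1 - p)^j. The sketch below implements both formulas; the function names are hypothetical, and it is not preseq's implementation. The raw series is only reliable for t ≤ 1 and diverges beyond that, which the published method addresses with a rational-function approximation to the series.

```python
def good_toulmin_new_distinct(hist, t):
    """Expected new distinct reads after adding t * (current reads); reliable for t <= 1."""
    return sum((-1) ** (j + 1) * t ** j * n for j, n in hist.items())

def expected_distinct_subsample(hist, p):
    """Expected distinct reads if each observed read is kept independently with probability p."""
    return sum(n * (1 - (1 - p) ** j) for j, n in hist.items())

# Hypothetical counts histogram {j: n_j}: 115 reads, 85 distinct.
hist = {1: 60, 2: 20, 3: 5}
print(good_toulmin_new_distinct(hist, 0.5))    # 25.625
print(expected_distinct_subsample(hist, 0.5))  # 49.375
```

For this histogram, sequencing 1.5 times the current depth is expected to yield about 85 + 25.6 ≈ 110.6 distinct reads, while halving the data would leave about 49.4.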

Details

Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: Java, C++, Perl, C
Added: 8/3/2017
Last Updated: 11/24/2024

Publications

Daley T, Smith AD. Modeling genome coverage in single-cell sequencing. Bioinformatics. 2014;30(22):3159-3165. doi:10.1093/bioinformatics/btu540. PMID:25107873. PMCID:PMC4221128.
