preseq
preseq uses data from an initial shallow sequencing run to predict how many distinct reads, and how much genome coverage, additional sequencing would yield. These predictions guide how sequencing depth is allocated.
Key Features:
- Predictive modeling: Uses a non-parametric empirical Bayes Poisson model to estimate coverage gains and the number of distinct reads from deeper sequencing.
- Shallow-read analysis: Derives initial coverage and complexity statistics from shallow sequencing reads mapped to a reference genome.
- Sequencing-depth optimization: Estimates the marginal utility of additional sequencing to support cost-effective selection of sequencing depth.
- Library screening and comparison: Compares multiple libraries to identify high-complexity samples and flag low-complexity libraries unlikely to yield substantial new information upon deeper sequencing.
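The library-screening idea above can be illustrated with a minimal Python sketch. It is not part of preseq and does not use its model: it only summarizes the complexity observed in each shallow library and ranks the libraries. The function names and the choice of read key (for example a `(chrom, start, strand)` tuple) are assumptions made for illustration.

```python
from collections import Counter


def complexity_summary(read_keys):
    """Summarize the observed complexity of one shallow library.

    read_keys: iterable of hashable identifiers, one per mapped read.
    Reads that share a key are treated as duplicates of one molecule
    (an assumption of this sketch, not a preseq rule).
    """
    per_molecule = Counter(read_keys)
    total = sum(per_molecule.values())
    distinct = len(per_molecule)
    singletons = sum(1 for count in per_molecule.values() if count == 1)
    return {
        "total_reads": total,
        "distinct_reads": distinct,
        # Fraction of reads that are distinct; low values suggest saturation.
        "distinct_fraction": distinct / total if total else 0.0,
        # Fraction of distinct reads seen exactly once; high values suggest
        # that many molecules remain unsampled.
        "singleton_fraction": singletons / distinct if distinct else 0.0,
    }


def rank_libraries(libraries):
    """Return library names sorted by distinct fraction, highest first.

    libraries: dict mapping a library name to its iterable of read keys.
    """
    summaries = {name: complexity_summary(keys) for name, keys in libraries.items()}
    return sorted(
        summaries,
        key=lambda name: summaries[name]["distinct_fraction"],
        reverse=True,
    )
```

Observed fractions like these are only comparable between libraries sequenced to similar depth; predicting yield at greater depth requires an extrapolation model.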
Scientific Applications:
- Genomics: Forecasts coverage statistics to plan sequencing strategies in genomics research.
- Single-cell DNA sequencing: Predicts genome coverage from early data where limited starting material and library-prep variability affect outcomes.
- Detection of genetic variation: Informs experiment design for detecting genetic variation that may be difficult to resolve with bulk sequencing alone.
- Library and protocol evaluation: Enables systematic comparison of sequencing libraries and strategies to decide whether additional sequencing is likely to be beneficial.
Methodology:
preseq analyzes shallow sequencing reads mapped to a reference genome to derive initial coverage and complexity statistics. It then applies a non-parametric empirical Bayes Poisson framework to extrapolate the expected number of distinct reads and the expected genome coverage at greater sequencing depths.
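As a rough illustration of extrapolating from shallow data, the Python sketch below builds the duplicate-counts histogram and applies the classical Good-Toulmin series, which estimates the number of new distinct reads from sequencing t times as many additional reads. This is a simpler textbook estimator, not preseq's implementation; it is only reliable for 0 <= t <= 1 (up to doubling the depth) and does not cover genome coverage. All function names are hypothetical.

```python
from collections import Counter


def counts_histogram(read_keys):
    """Return {j: n_j}, where n_j is the number of distinct reads seen exactly j times."""
    per_molecule = Counter(read_keys)
    return dict(Counter(per_molecule.values()))


def good_toulmin_new_distinct(hist, t):
    """Expected number of new distinct reads after sequencing t times more reads.

    Evaluates the alternating series sum_j (-1)^(j+1) * t^j * n_j.
    The series is unstable for t > 1, so this sketch rejects such values.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("this sketch only supports 0 <= t <= 1")
    return sum(((-1) ** (j + 1)) * (t ** j) * n_j for j, n_j in hist.items())


def predicted_distinct(read_keys, t):
    """Observed distinct reads plus the Good-Toulmin estimate of new ones."""
    hist = counts_histogram(read_keys)
    observed = sum(hist.values())
    return observed + good_toulmin_new_distinct(hist, t)
```

For example, `predicted_distinct(reads, 0.5)` estimates the number of distinct reads expected after sequencing 50% more reads than the initial run, given one key per mapped read.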
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: Java, C++, Perl, C
- Added: 8/3/2017
- Last Updated: 11/24/2024
Publications
Daley T, Smith AD. Modeling genome coverage in single-cell sequencing. Bioinformatics. 2014;30(22):3159-3165. doi:10.1093/bioinformatics/btu540. PMID:25107873. PMCID:PMC4221128.