PennSeq

PennSeq estimates isoform-specific gene expression from RNA sequencing (RNA-Seq) data using a non-parametric statistical model that accounts for isoform-specific, non-uniform read distributions and sequencing biases.


Key Features:

  • Non-parametric modeling: Employs a non-parametric statistical approach that avoids parametric assumptions about read distributions.
  • Isoform-specific read distributions: Allows each isoform to have its own non-uniform read distribution.
  • Bias accommodation: Accounts for hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias, and other unknown factors.
  • Empirical fragment sampling probabilities: Estimates the probability that a fragment is sampled from particular regions based on aligned RNA-Seq data.
  • Validation datasets: Evaluated on simulated datasets with known ground truth and on real Illumina RNA-Seq datasets accompanied by qRT-PCR measurements.
  • Performance advantage: Reported by its authors to estimate isoform expression more accurately than the existing methods it was compared against, particularly when read coverage is severely non-uniform.
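
As a rough illustration of the empirical fragment sampling probabilities listed above, the Python sketch below bins fragment start positions along a single isoform and normalizes the counts into probabilities. It is not PennSeq's implementation (the tool itself is written in Perl); the function name `empirical_sampling_probs`, the fixed number of bins, and the pseudocount are assumptions made only for this example.

```python
def empirical_sampling_probs(starts, isoform_length, n_bins=10, pseudocount=1.0):
    """Histogram estimate of where fragments start along one isoform.

    starts          -- 0-based fragment start positions on the isoform (hypothetical input)
    isoform_length  -- isoform length in bases
    n_bins          -- number of equal-width regions (an assumption of this sketch)
    pseudocount     -- added to every bin so that empty regions keep a nonzero probability
    """
    if isoform_length <= 0 or n_bins <= 0:
        raise ValueError("isoform_length and n_bins must be positive")

    counts = [float(pseudocount)] * n_bins
    for s in starts:
        if not 0 <= s < isoform_length:
            continue  # ignore positions that fall outside the isoform
        bin_index = min(s * n_bins // isoform_length, n_bins - 1)
        counts[bin_index] += 1.0

    total = sum(counts)
    if total == 0.0:
        # No data and no pseudocount: fall back to a uniform distribution.
        return [1.0 / n_bins] * n_bins
    return [c / total for c in counts]
```

For example, `empirical_sampling_probs([0, 5, 10, 95, 99, 99], 100, n_bins=4, pseudocount=0)` places three fragments in the first quarter of the isoform and three in the last, so the two middle regions receive probability zero. A nonzero pseudocount avoids such hard zeros when coverage is sparse.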

Scientific Applications:

  • Isoform-level quantification: Estimating isoform-specific gene expression from RNA-Seq data.
  • Bias-sensitive analyses: Improving accuracy of expression estimates in studies affected by sequencing and sample biases.
  • Disease-associated gene studies: Supporting the identification of genes associated with disease susceptibility by improving the accuracy of isoform quantification.
  • Method benchmarking: Benchmarking expression-estimation methods using simulated ground truth and qRT-PCR validation.
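
One common way to carry out the kind of benchmarking described above is to correlate a method's estimates with reference values (simulated ground truth or qRT-PCR) on a log scale. The sketch below assumes that approach; the function names, the log2 transform, and the offset of 1 are illustrative choices for this example and are not taken from the PennSeq publication.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def log_expression_correlation(estimated, reference, offset=1.0):
    """Correlate log2(value + offset) of estimated vs. reference expression.

    The offset (an assumption of this sketch) keeps zero expression values finite.
    """
    log_est = [math.log2(v + offset) for v in estimated]
    log_ref = [math.log2(v + offset) for v in reference]
    return pearson(log_est, log_ref)
```

Calling `log_expression_correlation(estimates, truth)` for each method on the same set of isoforms gives one comparable score per method; higher values indicate closer agreement with the reference.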

Methodology:

PennSeq models isoform-specific, non-uniform read distributions non-parametrically: rather than assuming a fixed form for the read distribution, it estimates empirical fragment sampling probabilities from the aligned RNA-Seq data. Its performance was evaluated on simulated datasets and on Illumina RNA-Seq datasets with qRT-PCR measurements.
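
To show how per-isoform sampling probabilities can feed into abundance estimation, the sketch below runs a generic expectation-maximization (EM) loop for a mixture model over isoforms. It illustrates the general technique only and should not be read as PennSeq's actual algorithm; the function name `em_isoform_fractions` and the input layout (one dictionary per fragment, mapping an isoform index to the probability of that fragment given the isoform) are assumptions made for this example.

```python
def em_isoform_fractions(frag_probs, n_isoforms, n_iter=200, tol=1e-9):
    """Estimate the fraction of fragments originating from each isoform.

    frag_probs -- list with one dict per fragment: {isoform_index: P(fragment | isoform)}
                  (hypothetical input; the probabilities could come from an
                  empirical, isoform-specific read distribution)
    Returns a list of mixing proportions that sums to 1.
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        # E-step: split each fragment across its compatible isoforms.
        for probs in frag_probs:
            denom = sum(theta[i] * p for i, p in probs.items())
            if denom == 0.0:
                continue  # fragment is incompatible with every isoform
            for i, p in probs.items():
                expected[i] += theta[i] * p / denom
        total = sum(expected)
        if total == 0.0:
            break  # nothing to learn from; keep the current estimate
        # M-step: renormalize the expected fragment counts.
        new_theta = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta
```

The returned values are fragment fractions, not expression levels: converting them to a length-normalized expression measure would require dividing by an effective isoform length, which this sketch leaves out.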

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows
Programming Languages: Perl
Added: 12/18/2017
Last Updated: 11/24/2024

Publications

Hu Y, Liu Y, Mao X, Jia C, Ferguson JF, Xue C, Reilly MP, Li H, Li M. PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution. Nucleic Acids Research. 2013;42(3):e20. doi:10.1093/nar/gkt1304. PMID:24362841. PMCID:PMC3919567.