RSEM
RSEM estimates gene and isoform expression from RNA-Seq reads using a generative statistical model that accounts for read mapping uncertainty.
Key Features:
- Handling Read Mapping Uncertainty: Uses a probabilistic generative model to incorporate reads that map to multiple genes or isoforms rather than discarding ambiguous reads or assigning them heuristically.
- Gene Expression Estimation: Reports gene-level expression as the sum of the estimated expression levels of the gene's isoforms.
- Modeling Non-Uniform Read Distributions: Models non-uniform read distributions across transcripts to improve accuracy of expression estimates.
- Optimal Read Length Determination: Simulations parameterized with real RNA-Seq data indicate that read lengths of 20–25 bases are optimal for gene-level expression estimation in mouse and maize when sequencing throughput is fixed.
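The fixed-throughput condition in the last feature implies a trade-off between read length and read count. The sketch below is an illustration only. It assumes throughput is measured as a total number of sequenced bases, and the budget value and the `reads_for_budget` helper are hypothetical, not part of RSEM.

```python
def reads_for_budget(total_bases: int, read_length: int) -> int:
    """Number of whole reads obtainable from a fixed base budget."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return total_bases // read_length


# Hypothetical throughput of one billion bases (an assumed value).
BUDGET = 1_000_000_000

for length in (20, 25, 50, 100):
    n_reads = reads_for_budget(BUDGET, length)
    print(f"{length:>3} bases/read -> {n_reads:,} reads")
```

Shorter reads give more reads for the same budget, but each read is more likely to map ambiguously. That is the trade-off the simulations evaluate.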
Scientific Applications:
- Transcriptomics studies: Quantifies isoform and gene expression from RNA-Seq datasets for transcriptome characterization.
- Differential expression analysis: Provides expression estimates suitable for comparing gene and isoform abundance across conditions.
- Functional genomics: Supplies quantified expression inputs for downstream functional and regulatory analyses across organisms including mouse and maize.
Methodology:
RSEM implements a generative statistical model of RNA-Seq reads and uses an expectation-maximization (EM) algorithm to infer isoform and gene expression. The model accounts for non-uniform read distributions across transcripts. Read-length effects were evaluated with simulations parameterized with real RNA-Seq data.
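The EM idea can be illustrated with a toy example. This is a minimal sketch under simplifying assumptions, not RSEM's actual implementation: read start positions are taken to be uniform along each isoform, there is no error or quality model, and the function names and input data are hypothetical.

```python
from collections import defaultdict


def em_read_fractions(compat, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    compat  -- one list per read, holding the isoform indices it aligns to
    lengths -- isoform lengths (uniform start positions are assumed)
    """
    n = len(lengths)
    theta = [1.0 / n] * n  # start from equal read fractions
    for _ in range(n_iter):
        expected = [0.0] * n
        # E-step: split each read across its compatible isoforms in
        # proportion to theta[i] / lengths[i].
        for isoforms in compat:
            weights = [theta[i] / lengths[i] for i in isoforms]
            total = sum(weights)
            for i, w in zip(isoforms, weights):
                expected[i] += w / total
        # M-step: renormalize the expected read counts.
        theta = [e / len(compat) for e in expected]
    return theta


def gene_totals(theta, gene_of):
    """Sum isoform-level estimates into gene-level estimates."""
    totals = defaultdict(float)
    for value, gene in zip(theta, gene_of):
        totals[gene] += value
    return dict(totals)


# Toy data: isoforms 0 and 1 belong to geneA, isoform 2 to geneB.
reads = [[0], [0], [0], [1], [0, 1], [0, 1], [2], [2]]
theta = em_read_fractions(reads, lengths=[1000, 1000, 1000])
genes = gene_totals(theta, ["geneA", "geneA", "geneB"])
print(theta, genes)
```

On this toy input the isoform estimates settle near 0.5625, 0.1875, and 0.25. The two ambiguous reads are therefore split about 3:1 in favor of the isoform with more uniquely mapped reads, rather than being discarded.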
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: C++
- Added: 1/13/2017
- Last Updated: 11/25/2024
Publications
Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2010;26(4):493-500. doi:10.1093/bioinformatics/btp692. PMID:20022975. PMCID:PMC2820677.