SPARSim

SPARSim is an R-based simulator of single-cell RNA sequencing (scRNA-seq) count data. It reproduces the count intensity, variability, and sparsity of real datasets, supporting the development and validation of bioinformatics methods.


Key Features:

  • Gamma-Multivariate Hypergeometric Model: Uses a Gamma-Multivariate Hypergeometric distribution model to generate count data that mimic real scRNA-seq characteristics.
  • Realistic Count Intensity and Variability: Produces simulated datasets matching empirical count intensity and variability observed in scRNA-seq experiments.
  • Sparsity and Zero Distribution: Captures the distribution of zeros across varying expression intensities to reflect scRNA-seq sparsity.
  • Benchmarking against Splat: Has been compared with the Splat simulator and reported to perform comparably or better in replicating real-data characteristics, particularly zero distribution across expression levels.
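
The zero-distribution feature can be checked on any count matrix by grouping genes by mean count and reporting the fraction of zeros in each group. The following is a minimal Python sketch using only the standard library. The function name `zero_fraction_by_intensity`, the cut points, and the toy matrix are hypothetical illustrations and are not part of SPARSim, which is an R package.

```python
from bisect import bisect_right
from statistics import mean


def zero_fraction_by_intensity(counts, bin_edges):
    """Group genes by mean count and return the zero fraction per group.

    counts    : list of per-gene rows, each a list of counts across cells
    bin_edges : ascending cut points on mean count (hypothetical choice);
                a gene lands in bin i, where i is the number of edges
                less than or equal to its mean
    """
    zeros = [0] * (len(bin_edges) + 1)
    total = [0] * (len(bin_edges) + 1)
    for row in counts:
        b = bisect_right(bin_edges, mean(row))  # bin index for this gene
        zeros[b] += sum(1 for c in row if c == 0)
        total[b] += len(row)
    # None marks bins that contain no genes.
    return [z / t if t else None for z, t in zip(zeros, total)]


# Toy genes-by-cells matrix (made-up values, for illustration only).
toy = [
    [0, 0, 1, 0],      # low intensity
    [0, 2, 0, 1],      # low intensity
    [5, 0, 7, 6],      # medium intensity
    [40, 38, 51, 45],  # high intensity
]
fractions = zero_fraction_by_intensity(toy, bin_edges=[1, 10])
print(fractions)  # [0.625, 0.25, 0.0]
```

Running the same summary on a real dataset and on a simulated one gives two curves that can be compared bin by bin.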

Scientific Applications:

  • Method Development: Provides simulated scRNA-seq datasets for testing and refining analytical techniques.
  • Validation Studies: Serves as a benchmark for validating performance of bioinformatics tools and algorithms on controlled count data.
  • Educational Use: Supplies realistic simulated data for training and instructional purposes in single-cell transcriptomics.

Methodology:

SPARSim generates counts from a Gamma-Multivariate Hypergeometric model and explicitly models how the proportion of zeros changes with expression intensity. Its output has been benchmarked against the Splat simulator.
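
The two-stage idea behind a Gamma-Multivariate Hypergeometric model can be sketched in a few lines of standard-library Python. This is an independent illustration, not SPARSim's R code or API. It reads the Gamma step as gene-level expression variability and the hypergeometric step as drawing sequenced molecules without replacement. The function `simulate_cell`, its parameters, the `pool_size` default, and all input values are assumptions made for this example. The `counts=` argument of `random.sample` requires Python 3.9 or later.

```python
import random
from collections import Counter


def simulate_cell(intensity, variability, library_size,
                  pool_size=1_000_000, rng=None):
    """Simulate one cell: Gamma step, then multivariate hypergeometric step.

    intensity    : per-gene mean expression levels, each > 0 (hypothetical)
    variability  : per-gene squared coefficient of variation, each > 0
    library_size : number of molecules sequenced for this cell
    pool_size    : approximate size of the molecule pool (assumption)
    """
    rng = rng or random.Random()
    # Gamma step: shape 1/v and scale m*v give mean m and CV^2 v per gene.
    expr = [rng.gammavariate(1.0 / v, m * v)
            for m, v in zip(intensity, variability)]
    # Scale expression to an integer pool of molecules, one "colour" per gene.
    total = sum(expr)
    pool = [round(e / total * pool_size) for e in expr]
    if library_size > sum(pool):
        raise ValueError("library_size exceeds the molecule pool")
    # Hypergeometric step: sample molecules without replacement from the pool.
    drawn = rng.sample(range(len(pool)), k=library_size, counts=pool)
    tally = Counter(drawn)
    return [tally[g] for g in range(len(pool))]


# Example call with made-up parameters and a fixed seed for repeatability.
rng = random.Random(1)
counts = simulate_cell(
    intensity=[0.5, 5.0, 50.0, 500.0],
    variability=[0.5, 0.3, 0.2, 0.1],
    library_size=2000,
    rng=rng,
)
```

Because molecules are drawn without replacement from a finite pool, the counts of one cell always sum to `library_size`, and weakly expressed genes often receive zero counts. This is one way sparsity can arise without a separate zero-inflation term.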

Details

Programming Languages: R
Added: 1/9/2020
Last Updated: 1/16/2021

Publications

Baruzzo G, Patuzzi I, Di Camillo B. SPARSim single cell: a count data simulator for scRNA-seq data. Bioinformatics. 2020;36(5):1468-1475. doi:10.1093/bioinformatics/btz752. PMID:31598633.
