simuRare

simuRare simulates genomic datasets that contain both rare and common single nucleotide polymorphisms (SNPs). It combines logistic regression-based imputation with resampling of 1000 Genomes Project reference data, so the simulated data preserve linkage disequilibrium and minor allele frequency. The resulting datasets can be used to evaluate statistical methods and to study disease penetrance models.


Key Features:

  • Regression-Based Imputation: Uses logistic regression models with 1000 Genomes Project reference data to predict rare variant presence from nearby common variants in SNP array datasets.
  • Resampling Approach: Generates simulated samples via resampling that maintain linkage disequilibrium (LD) and allele frequency distributions observed in reference data.
  • Preservation of Genomic Properties: Retains sample characteristics such as LD and minor allele frequency to produce realistic genetic structures for method evaluation.
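
The regression-based imputation idea can be illustrated with a small sketch. This is not simuRare's own code (the tool is written in R); it is a minimal Python illustration that assumes a toy reference panel generated at random, and every name in it (fit_logistic, predict_prob, ref_common, ref_rare, array_sample) is hypothetical. A logistic regression is fitted on the reference panel to predict whether a haplotype carries a rare allele from two nearby common SNPs, and the fitted probabilities are then used to draw the rare allele for array samples that were genotyped only at the common SNPs.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent.

    Returns the weights as [intercept, w1, w2, ...].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_prob(w, x):
    """Predicted probability that a haplotype carries the rare allele."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy reference panel (an assumption, not 1000 Genomes data): the rare
# allele occurs mostly on haplotypes carrying the minor allele of SNP 1.
random.seed(0)
ref_common, ref_rare = [], []
for _ in range(500):
    c1 = int(random.random() < 0.3)
    c2 = int(random.random() < 0.4)
    p_rare = 0.15 if c1 == 1 else 0.005
    ref_common.append([c1, c2])
    ref_rare.append(int(random.random() < p_rare))

w = fit_logistic(ref_common, ref_rare)
p_with = predict_prob(w, [1, 0])      # carries the SNP 1 minor allele
p_without = predict_prob(w, [0, 0])   # does not carry it

# Impute the rare allele for array haplotypes typed at common SNPs only.
array_sample = [[1, 0], [0, 1], [0, 0]]
imputed = [int(random.random() < predict_prob(w, x)) for x in array_sample]
```

Drawing the rare allele from the predicted probability, rather than thresholding it, keeps the imputed allele rare on average instead of forcing it to be always present or always absent for a given common-variant background.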

Scientific Applications:

  • Evaluate Statistical Methods: Enables benchmarking and evaluation of statistical and bioinformatics methods under realistic genetic architectures containing rare and common SNPs.
  • Study Disease Mechanisms: Allows imposing known disease penetrance models on simulated data to investigate the impact of rare variants on Mendelian and common (complex) diseases.
  • Enhance Computational Efficiency: Combines regression-based imputation with resampling to provide computationally efficient simulation of large-scale genetic datasets while preserving key genomic properties.
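
As an illustration of imposing a known penetrance model on simulated genotypes, the following sketch assigns case or control status from the number of risk alleles each individual carries. It is a hedged example, not the tool's interface: the function assign_status, the allele frequency of 0.01, and the dominant-model penetrance values are all assumptions chosen for illustration.

```python
import random

def assign_status(genotypes, penetrance, rng):
    """Draw disease status (1 = case, 0 = control) for each individual.

    genotypes: risk-allele counts (0, 1 or 2) at the causal variant.
    penetrance: mapping from allele count to P(disease | count).
    """
    return [int(rng.random() < penetrance[g]) for g in genotypes]

rng = random.Random(42)

# Hypothetical rare causal variant: each individual receives two
# independent alleles, each a risk allele with probability maf.
maf = 0.01
genotypes = [sum(rng.random() < maf for _ in range(2)) for _ in range(20000)]

# Hypothetical dominant model: 5% baseline risk, 40% risk for carriers.
penetrance = {0: 0.05, 1: 0.40, 2: 0.40}
status = assign_status(genotypes, penetrance, rng)

carriers = [s for g, s in zip(genotypes, status) if g > 0]
noncarriers = [s for g, s in zip(genotypes, status) if g == 0]
carrier_rate = sum(carriers) / len(carriers)
noncarrier_rate = sum(noncarriers) / len(noncarriers)
```

Because the penetrance table is known, the true effect of the rare variant is known too, which is what makes such data usable as a benchmark for association methods.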

Methodology:

simuRare fits logistic regression models on 1000 Genomes Project reference data to impute rare variants, resamples to generate simulated samples that preserve linkage disequilibrium and minor allele frequency, and can impose known disease penetrance models on the result.
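
One simple way to see why resampling can preserve linkage disequilibrium and allele frequencies is to resample whole haplotypes with replacement, so that alleles observed together in the reference stay together in the simulated data. The sketch below assumes that scheme for illustration only; simuRare's actual resampling procedure may differ, and the names allele_freq and resample_genotypes and the toy reference panel are hypothetical.

```python
import random

def allele_freq(haplotypes, j):
    """Frequency of allele 1 at SNP j across a list of haplotypes."""
    return sum(h[j] for h in haplotypes) / len(haplotypes)

def resample_genotypes(ref_haplotypes, n_samples, rng):
    """Build diploid genotypes by drawing two whole haplotypes each.

    Haplotypes are drawn with replacement and summed per SNP, so
    alleles that co-occur on a reference haplotype stay together.
    """
    out = []
    for _ in range(n_samples):
        h1 = rng.choice(ref_haplotypes)
        h2 = rng.choice(ref_haplotypes)
        out.append([a + b for a, b in zip(h1, h2)])
    return out

rng = random.Random(1)

# Toy reference panel (an assumption): SNPs 0 and 1 are in perfect LD
# (always identical); SNP 2 is a rare, independent variant.
ref = []
for _ in range(200):
    a = int(rng.random() < 0.2)
    c = int(rng.random() < 0.02)
    ref.append([a, a, c])

sim = resample_genotypes(ref, 2000, rng)

ref_freq = allele_freq(ref, 0)
sim_freq = sum(g[0] for g in sim) / (2 * len(sim))  # two alleles per person
ld_preserved = all(g[0] == g[1] for g in sim)
```

In this toy panel the simulated genotypes at SNPs 0 and 1 are always equal, and the simulated allele frequency stays close to the reference frequency up to sampling noise.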

Details

Tool Type: command-line tool
Operating Systems: Linux, Windows
Programming Languages: R
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Xu Y, Wu Y, Song C, Zhang H. Simulating Realistic Genomic Data With Rare Variants. Genetic Epidemiology. 2012;37(2):163-172. doi:10.1002/gepi.21696. PMID:23161487. PMCID:PMC3543480.
