simuRare
simuRare simulates genomic datasets containing both rare and common single nucleotide polymorphisms (SNPs). It combines logistic regression-based imputation with resampling of 1000 Genomes Project reference data, so that simulated samples preserve the linkage disequilibrium and minor allele frequencies of real data. The simulated datasets are intended for evaluating statistical methods and for studying disease penetrance models.
Key Features:
- Regression-Based Imputation: Uses logistic regression models with 1000 Genomes Project reference data to predict rare variant presence from nearby common variants in SNP array datasets.
- Resampling Approach: Generates simulated samples via resampling that maintain linkage disequilibrium (LD) and allele frequency distributions observed in reference data.
- Preservation of Genomic Properties: Retains sample characteristics such as LD and minor allele frequency to produce realistic genetic structures for method evaluation.
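The regression-based imputation idea can be illustrated with a minimal sketch. This is not simuRare's own code (the tool is written in R); it is a pure-Python toy, assuming a made-up reference panel of haplotypes with two common SNPs and one rare variant. The names `fit_logistic`, `predict_prob`, and `reference` are hypothetical, and the allele frequencies are invented for illustration only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights (intercept first) by batch gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_prob(w, x):
    """Predicted probability that the rare allele is present given common alleles x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy reference panel (assumption): each haplotype is (common1, common2, rare).
# The rare allele sits mostly on the common1=1, common2=0 background.
rng = random.Random(0)
reference = []
for _ in range(400):
    c1 = int(rng.random() < 0.4)
    c2 = int(rng.random() < 0.3)
    p_rare = 0.25 if (c1 == 1 and c2 == 0) else 0.005
    reference.append((c1, c2, int(rng.random() < p_rare)))

X = [[h[0], h[1]] for h in reference]
y = [h[2] for h in reference]
w = fit_logistic(X, y)

# "Impute" the rare allele for array haplotypes that only carry the common SNPs.
p_carrier = predict_prob(w, [1, 0])   # background enriched for the rare allele
p_other = predict_prob(w, [0, 1])     # background where it is almost absent
imputed = int(rng.random() < p_carrier)  # draw the rare allele from its predicted probability
```

Drawing the imputed allele from the predicted probability, rather than thresholding it, keeps the rare variant rare in the imputed data; whether simuRare does exactly this is not stated in this entry.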
Scientific Applications:
- Evaluate Statistical Methods: Enables benchmarking and evaluation of statistical and bioinformatics methods under realistic genetic architectures containing rare and common SNPs.
- Study Disease Mechanisms: Lets users impose known disease penetrance models on simulated data to investigate how rare variants contribute to Mendelian and common (complex) diseases.
- Enhance Computational Efficiency: Combines regression-based imputation with resampling to provide computationally efficient simulation of large-scale genetic datasets while preserving key genomic properties.
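Imposing a penetrance model on simulated genotypes can be sketched as follows. This is an illustrative Python example, not the tool's interface: the penetrance values, the allele frequency, and the names `PENETRANCE`, `assign_status`, and `prevalence` are all assumptions chosen for the demonstration.

```python
import random

# Hypothetical penetrance table: P(disease | number of risk alleles carried).
PENETRANCE = {0: 0.05, 1: 0.20, 2: 0.60}

def assign_status(genotypes, penetrance, rng):
    """Draw case (1) / control (0) status for each genotype from its penetrance."""
    return [int(rng.random() < penetrance[g]) for g in genotypes]

rng = random.Random(1)
maf = 0.02  # assumed minor allele frequency of a rare risk variant

# Simulated genotypes: number of risk alleles (0, 1, or 2) per individual.
genotypes = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(5000)]
status = assign_status(genotypes, PENETRANCE, rng)

def prevalence(g):
    """Observed fraction of cases among individuals with genotype g."""
    group = [s for s, gi in zip(status, genotypes) if gi == g]
    return sum(group) / len(group) if group else float("nan")
```

Because the true penetrance is known by construction, a method under evaluation can be scored on how well it recovers the association between genotype and `status`.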
Methodology:
- Logistic regression models for imputation using 1000 Genomes Project reference data.
- Resampling to generate simulated samples that preserve linkage disequilibrium and minor allele frequency.
- Capability to impose known disease penetrance models.
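One simple way to see why resampling can preserve linkage disequilibrium and minor allele frequency is to resample whole haplotypes with replacement, so that alleles that travel together in the reference stay together in the simulated data. The sketch below assumes this whole-haplotype bootstrap, which may differ from simuRare's actual scheme; the toy `reference` panel and the helper names are hypothetical.

```python
import random

def maf(haps, j):
    """Minor allele frequency at SNP index j across haplotypes."""
    f = sum(h[j] for h in haps) / len(haps)
    return min(f, 1 - f)

def r_squared(haps, i, j):
    """Pairwise LD (r^2) between SNPs i and j, computed from haplotypes."""
    n = len(haps)
    pi = sum(h[i] for h in haps) / n
    pj = sum(h[j] for h in haps) / n
    pij = sum(h[i] * h[j] for h in haps) / n
    denom = pi * (1 - pi) * pj * (1 - pj)
    return (pij - pi * pj) ** 2 / denom if denom > 0 else 0.0

def resample_haplotypes(haps, n, rng):
    """Draw n whole haplotypes with replacement, keeping within-haplotype allele combinations."""
    return [rng.choice(haps) for _ in range(n)]

def pair_into_genotypes(haps):
    """Pair consecutive haplotypes into diploid genotypes (allele counts 0, 1, 2)."""
    return [[a + b for a, b in zip(haps[k], haps[k + 1])]
            for k in range(0, len(haps) - 1, 2)]

# Toy reference panel (assumption): SNPs 0 and 1 are in strong LD, SNP 2 is rare.
rng = random.Random(2)
reference = []
for _ in range(200):
    a = int(rng.random() < 0.3)
    b = a if rng.random() < 0.9 else 1 - a
    c = int(rng.random() < 0.05)
    reference.append((a, b, c))

simulated = resample_haplotypes(reference, 4000, rng)
genotypes = pair_into_genotypes(simulated)

ref_ld, sim_ld = r_squared(reference, 0, 1), r_squared(simulated, 0, 1)
ref_maf, sim_maf = maf(reference, 0), maf(simulated, 0)
```

Comparing `ref_ld` with `sim_ld` and `ref_maf` with `sim_maf` gives a quick check that the simulated sample tracks the reference panel on both properties.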
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Windows
- Programming Languages: R
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Xu Y, Wu Y, Song C, Zhang H. Simulating Realistic Genomic Data With Rare Variants. Genetic Epidemiology. 2012;37(2):163-172. doi:10.1002/gepi.21696. PMID:23161487. PMCID:PMC3543480.