SNPRelate
SNPRelate is an R package for high-performance principal component analysis (PCA) and identity-by-descent (IBD) relatedness analysis of large SNP datasets, such as those used in genome-wide association studies (GWAS).
Key Features:
- Optimized Data Format: Uses the CoreArray Genomic Data Structure (GDS) binary format, which stores each SNP genotype in two bits for compact storage and fast I/O.
- Accelerated Computations: Implements highly optimized algorithms in C/C++ to perform principal component analysis (PCA) and identity-by-descent (IBD) relatedness analysis efficiently.
- Multi-Core Processing: Exploits multi-core symmetric multiprocessing. Reported benchmarks show the uniprocessor PCA and IBD implementations running ~8–50× faster than EIGENSTRAT (v3.0) and PLINK (v1.07), respectively, with speedups of up to ~30–300× when using eight cores.
- Scalability: Scales to tens of thousands of samples and millions of SNPs, demonstrated by PCA on 55,324 subjects from the "Gene-Environment Association Studies" consortium.
Scientific Applications:
- Genome-wide association studies (GWAS): Enables large-scale SNP data analysis to identify genetic variants associated with diseases and traits.
- Population stratification correction: Provides PCA to detect and correct population structure in GWAS analyses.
- Relatedness and cryptic relatedness detection: Uses identity-by-descent measures to assess pairwise relatedness among samples.
- Large-scale population genomics: Supports analysis of cohorts comprising tens of thousands of individuals with millions of variants.
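The PCA-based detection of population structure listed above can be sketched on toy data in pure Python. This is a minimal illustration, assuming the allele-frequency normalization commonly used in EIGENSTRAT-style PCA and a plain power iteration for the leading component. It is not SNPRelate's optimized C/C++ implementation, and every function name below is hypothetical.

```python
import math


def standardize(genotypes):
    """Center and scale each SNP column of a samples-by-SNPs matrix.

    Assumes genotype codes 0/1/2 (allele counts) and no missing values.
    Monomorphic SNPs are skipped because their variance is zero.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # estimated allele frequency
        if p in (0.0, 1.0):
            continue
        scale = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / scale for g in col])
    return columns


def relationship_matrix(columns, n):
    """Average per-SNP outer products into an n-by-n sample covariance."""
    m = len(columns)
    return [[sum(c[a] * c[b] for c in columns) / m for b in range(n)]
            for a in range(n)]


def leading_component(matrix, iterations=200):
    """Approximate the top eigenvector by power iteration."""
    n = len(matrix)
    # Non-uniform start: a constant vector is annihilated by centered data.
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: six samples by four SNPs, drawn from two made-up groups.
genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 2, 2],
    [0, 0, 2, 1],
    [0, 1, 2, 2],
]
columns = standardize(genotypes)
cov = relationship_matrix(columns, len(genotypes))
pc1 = leading_component(cov)
```

On this toy matrix the first three samples receive one sign on `pc1` and the last three the opposite sign, which is the kind of separation used to flag population stratification before association testing.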
Methodology:
Stores genotypes in the CoreArray GDS binary format at two bits per SNP genotype, runs optimized C/C++ kernels built on integer operations that use as few bits as possible, and parallelizes the PCA and identity-by-descent computations across cores through symmetric multiprocessing.
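As a rough illustration of two-bit genotype storage, the sketch below packs four genotype codes into each byte. It assumes codes 0, 1, and 2 for allele counts and 3 for a missing call. The function names and the bit order within a byte are assumptions made for this example and may not match the actual GDS on-disk layout.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0-3) into bytes, four codes per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code out of range: {g}")
        # Assumed layout: the first genotype occupies the two lowest bits.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, count):
    """Recover `count` genotype codes from bytes made by pack_genotypes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


genotypes = [0, 1, 2, 3, 2, 2, 0]
packed = pack_genotypes(genotypes)
restored = unpack_genotypes(packed, len(genotypes))
```

Here seven genotypes occupy two bytes instead of the seven needed at one byte per genotype, and the round trip through `unpack_genotypes` returns the original codes.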
Details
- License: GPL-3.0
- Tool Type: command-line tool, library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R
- Added: 1/17/2017
- Last Updated: 1/9/2019
Operations
- Genetic variation analysis
Publications
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28(24):3326-3328. doi:10.1093/bioinformatics/bts606. PMID:23060615. PMCID:PMC3519454.