SNPRelate

SNPRelate is an R package for high-performance analysis of SNP data in genome-wide association studies (GWAS). It provides principal component analysis (PCA) and identity-by-descent (IBD) relatedness analysis for large-scale genomic datasets.


Key Features:

  • Optimized Data Format: Uses the CoreArray Genomic Data Structure (GDS) binary format, which stores each SNP genotype in two bits for compact storage and fast I/O.
  • Accelerated Computations: Implements highly optimized algorithms in C/C++ to perform principal component analysis (PCA) and identity-by-descent (IBD) relatedness analysis efficiently.
  • Multi-Core Processing: Exploits multi-core symmetric multiprocessing. In the published benchmarks, the uniprocessor implementations of PCA and IBD are ~8–50× faster than EIGENSTRAT (v3.0) and PLINK (v1.07), respectively, and the speedup reaches ~30–300× when eight cores are used.
  • Scalability: Scales to tens of thousands of samples and millions of SNPs, demonstrated by PCA on 55,324 subjects from the "Gene-Environment Association Studies" consortium.

Scientific Applications:

  • Genome-wide association studies (GWAS): Enables large-scale SNP data analysis to identify genetic variants associated with diseases and traits.
  • Population stratification correction: Provides PCA to detect and correct population structure in GWAS analyses.
  • Relatedness and cryptic relatedness detection: Uses identity-by-descent measures to assess pairwise relatedness among samples.
  • Large-scale population genomics: Supports analysis of cohorts comprising tens of thousands of individuals with millions of variants.
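
To make the PCA-based stratification idea concrete, here is a minimal pure-Python sketch. It is not SNPRelate's implementation, which runs in optimized, multi-threaded C/C++. The sketch assumes the common EIGENSTRAT-style normalization (center each SNP by twice its allele frequency and scale by sqrt(p(1 - p))), builds a sample-by-sample covariance matrix, and finds the first principal component by power iteration. The genotype matrix is invented toy data, and all function names are hypothetical.

```python
import math

def standardize(genotypes):
    """Center and scale each SNP column; rows are samples, values are allele counts 0/1/2."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in range(n)]
        p = sum(col) / (2 * n)           # allele frequency of this SNP
        if p <= 0.0 or p >= 1.0:
            continue                     # skip monomorphic SNPs (no information)
        scale = math.sqrt(p * (1 - p))   # assumed EIGENSTRAT-style scaling
        columns.append([(g - 2 * p) / scale for g in col])
    return columns

def covariance(columns, n):
    """Average the outer products of the standardized SNP columns (n x n matrix)."""
    cov = [[0.0] * n for _ in range(n)]
    for col in columns:
        for a in range(n):
            for b in range(a, n):
                cov[a][b] += col[a] * col[b]
    for a in range(n):
        for b in range(a, n):
            cov[a][b] /= len(columns)
            cov[b][a] = cov[a][b]        # mirror the upper triangle
    return cov

def top_component(cov, iterations=200):
    """Approximate the leading eigenvector by power iteration."""
    n = len(cov)
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data: samples 0-2 and samples 3-5 come from two invented populations.
genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 0, 0],
    [0, 0, 2, 2],
    [0, 1, 2, 1],
    [1, 0, 2, 2],
]
cov = covariance(standardize(genotypes), len(genotypes))
pc1 = top_component(cov)
```

On this toy input the first component takes one sign for samples 0-2 and the opposite sign for samples 3-5, which is the kind of separation that is then used as a covariate to adjust association tests for population structure.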

Methodology:

Genotypes are stored in the CoreArray GDS binary format using two bits per SNP genotype. The computational kernels are written in optimized C/C++ and work with integer operations on this compact bit representation, and the PCA and identity-by-descent computations are distributed across cores through symmetric multiprocessing.
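
The two-bit encoding can be illustrated with a short Python sketch. It shows only the general packing idea (four genotypes per byte) and is not the actual GDS on-disk layout. The code assignment (0, 1, 2 for allele counts and 3 for a missing genotype), the bit order within each byte, and the function names are assumptions made for this example.

```python
# Assumed coding for this sketch: 0, 1, 2 = allele count, 3 = missing genotype.
MISSING = 3

def pack_genotypes(genotypes):
    """Pack genotype codes (0-3) into bytes, four genotypes per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code out of range: {g}")
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)

def unpack_genotypes(packed, count):
    """Recover the first `count` genotype codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]

genotypes = [0, 1, 2, MISSING, 2, 2, 0]
packed = pack_genotypes(genotypes)          # 7 genotypes fit in 2 bytes
restored = unpack_genotypes(packed, len(genotypes))
```

Storing four genotypes per byte is what keeps files small and lets a whole SNP or sample be read with little I/O. At one byte per genotype the same data would take four times the space.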

Details

License:
GPL-3.0
Tool Type:
command-line tool, library
Operating Systems:
Linux, Windows, Mac
Programming Languages:
R
Added:
1/17/2017
Last Updated:
1/9/2019

Publications

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28(24):3326-3328. doi:10.1093/bioinformatics/bts606. PMID:23060615. PMCID:PMC3519454.