STEGO

STEGO identifies genetic outliers in whole-genome sequencing (WGS) and genome-wide association study (GWAS) datasets. It uses rare variant information to mitigate confounding from population substructure and cryptic relatedness.


Key Features:

  • Genetic heterogeneity assessment: Uses rare variants to assess recent ancestry and identify individuals who are genetically too similar (indicative of cryptic relationships) or too different (suggestive of population substructure).
  • Computational efficiency: Designed to run on large-scale datasets, such as whole-genome sequencing studies, without prohibitive computational cost.
  • Formal testing framework: Implements a statistical testing framework for systematic assessment and identification of relatedness and substructure within study populations.

Scientific Applications:

  • Simulation validation: Simulation studies show that the method detects genetic outliers effectively at moderate sample sizes.
  • Empirical analysis of reference cohorts: Applied to the 1000 Genomes Project dataset to identify likely related subjects that passed standard quality control filters, demonstrating sensitivity to cryptic relatedness in real-world WGS/GWAS data.

Methodology:

STEGO uses rare variant data, which carries information about recent shared ancestry, to detect subtle genetic similarities and differences among individuals.
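
The idea of scoring pairs of individuals by the rare variants they share can be sketched in R as follows. This is a minimal illustration on simulated data and is not the STEGO package API or its published test statistic: the simulated genotypes, the rarity cutoff, the 1/choose(carriers, 2) weights, the robust z-score, and the threshold of 5 are all assumptions chosen for the example.

```r
# Illustrative sketch only: NOT the STEGO API or its published statistic.
set.seed(1)

n_ind <- 40     # individuals (hypothetical)
n_var <- 2000   # variants (hypothetical)

# Simulate 0/1 carrier status with low carrier frequencies per variant.
freq <- runif(n_var, 0.01, 0.05)
geno <- matrix(rbinom(n_ind * n_var, 1, rep(freq, each = n_ind)),
               nrow = n_ind, ncol = n_var)

# Plant a cryptic relationship: individual 2 copies individual 1
# at a random half of the variants.
shared <- sample(n_var, n_var / 2)
geno[2, shared] <- geno[1, shared]

# Keep rare variants carried by at least two individuals.
carriers <- colSums(geno)
rare <- carriers >= 2 & carriers / n_ind <= 0.10
g <- geno[, rare, drop = FALSE]

# Assumed weighting: sharing a rarer variant counts for more.
w <- 1 / choose(carriers[rare], 2)

# Weighted sharing score for every pair: sum_k w_k * g_ik * g_jk.
sim <- tcrossprod(sweep(g, 2, sqrt(w), "*"))

# One score per unordered pair, with a robust z-score against all pairs.
idx <- which(upper.tri(sim), arr.ind = TRUE)
pair_scores <- sim[upper.tri(sim)]
z <- (pair_scores - median(pair_scores)) / mad(pair_scores)

pairs <- data.frame(i = idx[, "row"], j = idx[, "col"],
                    score = pair_scores, z = z)
flagged <- pairs[pairs$z > 5, ]
flagged <- flagged[order(-flagged$z), ]
print(head(flagged))
```

In this simulation the planted pair (individuals 1 and 2) shares far more rare variants than unrelated pairs do, so it should appear at the top of the flagged list. Pairs that are unusually dissimilar, which would suggest substructure, could be screened in the same way from the lower tail of the scores.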

Details

Tool Type: library
Operating Systems: Linux, Windows, Mac
Programming Languages: R
Added: 6/5/2018
Last Updated: 11/25/2024

Publications

Schlauch D, Fier H, Lange C. Identification of genetic outliers due to sub-structure and cryptic relationships. Bioinformatics. 2017;33(13):1972-1979. doi:10.1093/bioinformatics/btx109. PMID:28334167. PMCID:PMC5870703.

Funding: National Institutes of Health, grants 1P01HL105339 and T32HL007427