STEGO
STEGO identifies genetic outliers in whole-genome sequencing (WGS) and genome-wide association study (GWAS) datasets. It uses rare variant information to mitigate confounding from population substructure and cryptic relatedness.
Key Features:
- Genetic heterogeneity assessment: Uses rare variants to assess recent ancestry and identify individuals who are genetically too similar (indicative of cryptic relationships) or too different (suggestive of population substructure).
- Computational efficiency: Designed for large-scale datasets, such as whole-genome sequencing studies, without prohibitive computational cost.
- Formal testing framework: Implements a statistical testing framework for systematic assessment and identification of relatedness and substructure within study populations.
Scientific Applications:
- Simulation validation: Simulation studies show that the method detects genetic outliers at moderate sample sizes.
- Empirical analysis of reference cohorts: Applied to the 1000 Genomes Project dataset to identify likely related subjects that passed standard quality control filters, demonstrating sensitivity to cryptic relatedness in real-world WGS/GWAS data.
Methodology:
STEGO uses rare variant data, which captures recent ancestral information, to detect subtle genetic similarities and differences among individuals.
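To make the idea of rare-variant sharing concrete, here is a minimal Python sketch of a pairwise similarity score. It is not STEGO's statistic or its R interface: the function names, the 0/1 carrier encoding, the inverse-frequency weights, the `max_freq` threshold, and the z-score cutoff are all assumptions chosen for illustration. The z-score flag in particular is an ad hoc heuristic standing in for the formal test described in the publication.

```python
from itertools import combinations
from statistics import mean, stdev


def rare_variant_sharing(genotypes, max_freq=0.05):
    """Score every pair of samples by the rare variants they share.

    genotypes: dict mapping sample ID -> list of 0/1 carrier indicators,
    one entry per variant (hypothetical encoding for this sketch).
    Rarer variants get larger weights (1 / carrier frequency).
    """
    samples = list(genotypes)
    n_samples = len(samples)
    n_variants = len(genotypes[samples[0]])

    # Carrier frequency of each variant across all samples.
    freqs = [
        sum(genotypes[s][k] for s in samples) / n_samples
        for k in range(n_variants)
    ]

    # Keep only variants that are observed and rare.
    rare = [k for k, f in enumerate(freqs) if 0 < f <= max_freq]
    weights = {k: 1.0 / freqs[k] for k in rare}
    total = sum(weights.values())

    scores = {}
    for a, b in combinations(samples, 2):
        shared = sum(
            weights[k] for k in rare if genotypes[a][k] and genotypes[b][k]
        )
        # Normalize so scores are comparable across datasets.
        scores[(a, b)] = shared / total if total else 0.0
    return scores


def flag_outlier_pairs(scores, z_cutoff=3.0):
    """Return pairs whose score is far above the mean (heuristic z-score)."""
    values = list(scores.values())
    if len(values) < 2:
        return []
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []
    return [pair for pair, s in scores.items() if (s - mu) / sd > z_cutoff]
```

Pairs flagged this way share far more rare alleles than the rest of the cohort, which is the pattern expected for cryptic relatives. Detecting individuals who are "too different" would need a lower-tail check on a per-sample summary, which this sketch omits.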
Details
- Tool Type: library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R
- Added: 6/5/2018
- Last Updated: 11/25/2024
Publications
Schlauch D, Fier H, Lange C. Identification of genetic outliers due to sub-structure and cryptic relationships. Bioinformatics. 2017;33(13):1972-1979. doi:10.1093/bioinformatics/btx109. PMID:28334167. PMCID:PMC5870703.