BGT

BGT encodes and queries whole-genome genotypes and haplotypes for large cohorts, enabling compact storage and high-throughput retrieval for genomic analyses.


Key Features:

  • Compact file format: Encodes haplotypes and genotypes in a compact binary representation; the reported example stores the haplotypes of 32,488 samples across 39.2 million SNPs in 7.4 GB.
  • High-throughput decoding: Achieves decoding rates up to 420 million genotypes per CPU second.
  • Real-time querying: Answers complex genotype queries across large datasets in real time.
  • Scalability for large cohorts: Designed to store and query whole-genome genotypes for tens to hundreds of thousands of samples.
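The storage and decoding figures above can be put in perspective with some quick arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes 1 GB = 10^9 bytes, two haplotypes per sample, that the whole 7.4 GB is attributed to haplotype data, and that the peak decoding rate holds across the entire database. The function name and parameters are hypothetical and are not part of BGT.

```python
# Figures quoted in the feature list; the constants marked "assumption"
# are not stated in the source.
N_SAMPLES = 32_488
N_SNPS = 39_200_000
DB_BYTES = 7.4e9        # assumption: 1 GB = 10**9 bytes
DECODE_RATE = 420e6     # genotypes per CPU second (reported peak rate)
PLOIDY = 2              # assumption: two haplotypes per sample


def storage_and_decode_estimates(n_samples=N_SAMPLES, n_snps=N_SNPS,
                                 db_bytes=DB_BYTES, decode_rate=DECODE_RATE,
                                 ploidy=PLOIDY):
    """Return rough per-allele storage cost and full-decode CPU time."""
    genotypes = n_samples * n_snps          # one genotype per sample per SNP
    alleles = genotypes * ploidy            # one allele per haplotype per SNP
    bits_per_allele = db_bytes * 8 / alleles
    full_decode_seconds = genotypes / decode_rate
    return {
        "genotypes": genotypes,
        "bits_per_allele": bits_per_allele,
        "full_decode_seconds": full_decode_seconds,
    }


if __name__ == "__main__":
    est = storage_and_decode_estimates()
    print(f"genotypes:           {est['genotypes']:.3e}")
    print(f"bits per allele:     {est['bits_per_allele']:.4f}")
    print(f"full decode (CPU s): {est['full_decode_seconds']:.0f}")
```

Under these assumptions the database holds about 1.27 trillion genotypes at roughly 0.023 bits per haplotype allele, and decoding every genotype once at the peak rate would take on the order of 50 CPU minutes.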

Scientific Applications:

  • Population genetics: Enables storage and querying of large-scale genotype and haplotype data for population structure and diversity analyses.
  • Genome-wide association studies (GWAS): Supports retrieval of genotype data at scale for association testing across many samples and variants.
  • Genotype frequency analyses: Facilitates computation and inspection of genotype and allele frequency distributions across large cohorts.
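As an illustration of the frequency analyses listed above, the following sketch computes alternate-allele frequencies and genotype counts from a small in-memory haplotype matrix. The matrix layout (one row per haplotype, consecutive row pairs forming one diploid sample, 0 = reference and 1 = alternate) and both function names are assumptions made for this example; they do not describe BGT's file format or its query output.

```python
from collections import Counter


def allele_frequencies(haplotypes):
    """Alternate-allele frequency per site.

    `haplotypes` is a list of rows; each row is one haplotype with
    0 (reference) or 1 (alternate) at every site.
    """
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0])
    return [sum(row[j] for row in haplotypes) / n_haps
            for j in range(n_sites)]


def genotype_counts(haplotypes):
    """Count alternate-allele dosages (0, 1 or 2) per site.

    Assumes rows 2k and 2k+1 are the two haplotypes of sample k.
    """
    n_sites = len(haplotypes[0])
    counts = [Counter() for _ in range(n_sites)]
    for k in range(0, len(haplotypes), 2):
        first, second = haplotypes[k], haplotypes[k + 1]
        for j in range(n_sites):
            counts[j][first[j] + second[j]] += 1
    return counts


if __name__ == "__main__":
    # Two diploid samples, two sites (toy data).
    haps = [
        [0, 1],  # sample 0, haplotype 1
        [1, 1],  # sample 0, haplotype 2
        [0, 0],  # sample 1, haplotype 1
        [0, 1],  # sample 1, haplotype 2
    ]
    print(allele_frequencies(haps))
    print([dict(c) for c in genotype_counts(haps)])
```

For real cohorts the matrix would come from decoded query results rather than a literal list, and a columnar array library would likely be a better fit than nested Python lists.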

Methodology:

BGT encodes genomic data into a compact binary format built for fast retrieval and decoding, with a reported decoding rate of up to 420 million genotypes per CPU second.
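The BGT project describes its genotype storage as based on the positional Burrows-Wheeler transform (PBWT). The toy sketch below shows the core PBWT idea only: at each site, haplotypes are reordered by their preceding alleles, which tends to group identical alleles into long runs that compress well. It is not BGT's implementation or on-disk layout, and the function names are hypothetical.

```python
def pbwt_columns(haplotypes):
    """Return each site's alleles in positional-prefix order.

    `haplotypes` is a list of equal-length 0/1 lists (one per haplotype).
    At site k, haplotypes are visited in an order determined by their
    alleles at sites 0..k-1; the order is then updated by a stable
    partition on the allele at site k (zeros first, then ones).
    """
    order = list(range(len(haplotypes)))
    columns = []
    for k in range(len(haplotypes[0])):
        columns.append([haplotypes[i][k] for i in order])
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
    return columns


def count_runs(column):
    """Number of maximal runs of equal values in a column."""
    if not column:
        return 0
    return 1 + sum(1 for a, b in zip(column, column[1:]) if a != b)


if __name__ == "__main__":
    haps = [
        [0, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [1, 1, 0, 0],
    ]
    original_last = [h[3] for h in haps]
    permuted_last = pbwt_columns(haps)[3]
    print(count_runs(original_last), count_runs(permuted_last))
```

In this toy input the last site has four runs in the original haplotype order and two after reordering, which is the effect a run-length encoder would exploit.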

Details

Tool Type: command-line tool
Operating Systems: Linux
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Li H. BGT: efficient and flexible genotype query across many samples. Bioinformatics. 2016;32(4):590-592. doi:10.1093/bioinformatics/btv613. PMID:26500154. PMCID:PMC5963361.

Funding: NHGRI U54HG003037; NIH GM100233
