BGT
BGT encodes and queries whole-genome genotypes and haplotypes for large cohorts, enabling compact storage and high-throughput retrieval for genomic analyses.
Key Features:
- Compact file format: Encodes haplotypes and genotypes into a highly optimized binary representation that can store the haplotypes of 32,488 samples across 39.2 million SNPs in 7.4 GB.
- High-throughput decoding: Achieves decoding rates up to 420 million genotypes per CPU second.
- Real-time querying: Answers complex genotype queries across large datasets in real time.
- Scalability for large cohorts: Designed to store and query whole-genome genotypes for tens to hundreds of thousands of samples.
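To put the storage and decoding figures above in perspective, the following is a back-of-the-envelope sketch in Python. It uses only the numbers quoted in the feature list. The assumptions are that 1 GB means 10**9 bytes, that each sample contributes two haplotypes (diploid), and that decoding runs at the peak reported rate throughout, which real workloads may not reach.

```python
# Rough arithmetic from the figures quoted in the feature list.
# Assumptions (not stated in the source): 1 GB = 10**9 bytes,
# two haplotypes per diploid sample, decoding sustained at the peak rate.

SAMPLES = 32_488
SNPS = 39_200_000
FILE_BYTES = 7.4e9
DECODE_RATE = 420e6  # genotypes per CPU second (reported upper bound)

genotypes = SAMPLES * SNPS              # one genotype per sample per SNP
haplotype_alleles = 2 * genotypes       # two alleles per diploid genotype

bits_per_allele = FILE_BYTES * 8 / haplotype_alleles
genotypes_per_byte = genotypes / FILE_BYTES
full_decode_cpu_seconds = genotypes / DECODE_RATE

print(f"genotypes stored:          {genotypes:.3e}")
print(f"bits per haplotype allele: {bits_per_allele:.4f}")
print(f"genotypes per byte:        {genotypes_per_byte:.1f}")
print(f"full decode at peak rate:  {full_decode_cpu_seconds / 60:.1f} CPU minutes")
```

Under these assumptions the file spends roughly 0.023 bits per haplotype allele (about 172 genotypes per byte), and decoding every genotype once at the peak rate would take on the order of 50 CPU minutes.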
Scientific Applications:
- Population genetics: Enables storage and querying of large-scale genotype and haplotype data for population structure and diversity analyses.
- Genome-wide association studies (GWAS): Supports retrieval of genotype data at scale for association testing across many samples and variants.
- Genotype frequency analyses: Facilitates computation and inspection of genotype and allele frequency distributions across large cohorts.
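As an illustration of the kind of frequency analysis listed above, here is a minimal Python sketch that computes genotype and alternate-allele frequencies at a single biallelic site. It does not use BGT or its file format: the function names and the toy data are hypothetical, and genotypes are assumed to be already decoded into pairs of alleles coded 0 (reference) and 1 (alternate).

```python
from collections import Counter

def genotype_frequencies(genotypes):
    """Return the fraction of samples carrying 0, 1, or 2 alternate alleles.

    genotypes: list of (allele1, allele2) tuples with alleles coded 0 or 1.
    """
    counts = Counter(a1 + a2 for a1, a2 in genotypes)
    n = len(genotypes)
    return {k: counts.get(k, 0) / n for k in (0, 1, 2)}

def alt_allele_frequency(genotypes):
    """Return the alternate-allele frequency across all haplotypes."""
    alt = sum(a1 + a2 for a1, a2 in genotypes)
    return alt / (2 * len(genotypes))

# Hypothetical toy site with four diploid samples.
site = [(0, 0), (0, 1), (1, 1), (0, 0)]

print(genotype_frequencies(site))
print(alt_allele_frequency(site))
```

In a real analysis the per-site genotypes would come from a query against the stored cohort rather than from a hard-coded list.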
Methodology:
BGT encodes genotypes and haplotypes in a compact binary format designed for fast retrieval and decoding; the reported decoding rate reaches up to 420 million genotypes per CPU second.
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Li H. BGT: efficient and flexible genotype query across many samples. Bioinformatics. 2016;32(4):590-592. doi:10.1093/bioinformatics/btv613. PMID:26500154. PMCID:PMC5963361.
Documentation
General
https://github.com/lh3/bgt
Links
Software catalogue
http://www.mybiosoftware.com/bgt-genotype-query-across-many-samples.html