BEAGLE

BEAGLE performs genotype calling, haplotype phasing, genotype imputation, and identity-by-descent segment detection to enable analysis of large-scale genomic datasets.


Key Features:

  • Haplotype inference: Scales to thousands of individuals and hundreds of thousands of markers and achieves high accuracy (e.g., ~99% correct imputation of masked alleles).
  • Genotype imputation: Implements the Li and Stephens model and is optimized for large reference panels, enabling imputation with millions of reference samples.
  • Haplotype phasing: Uses marker windowing, composite reference haplotypes, a progressive phasing algorithm, and a two-stage phasing approach for datasets with numerous low-frequency variants.
  • Identity-by-descent detection: Detects identity-by-descent (IBD) segments from genotype data.
  • Scalability and efficiency: Versions 4.1 and 5.0 improve throughput for very large reference panels (cited examples reach 10 million samples) and reduce computation time in cloud environments such as Amazon Elastic Compute Cloud.
  • Parallelization and memory efficiency: Incorporates parallelization and memory-efficiency optimizations to handle large-scale analyses.
  • Comparative performance: Reported to be faster than methods such as Impute2, Minimac3, and SHAPEIT 4.2.1 for large sequence datasets while maintaining comparable accuracy.
  • Genotype calling: Provides genotype calling capabilities suitable for large-scale SNP array and sequence data.
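
The marker windowing mentioned in the features above can be illustrated with a short sketch. It splits a chromosome into overlapping windows so that each window can be processed with bounded memory. This is a simplified illustration, not BEAGLE's implementation: the function name and the `window_size` and `overlap` parameters are hypothetical, and window length is measured in marker counts here purely for simplicity.

```python
def overlapping_windows(n_markers, window_size, overlap):
    """Return half-open (start, end) marker index ranges covering all markers.

    Consecutive windows share `overlap` markers. All sizes are marker
    counts, which is an illustrative assumption.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")

    windows = []
    step = window_size - overlap  # how far each new window advances
    start = 0
    while start < n_markers:
        end = min(start + window_size, n_markers)
        windows.append((start, end))
        if end == n_markers:
            break  # the last marker is covered; stop
        start += step
    return windows


# Hypothetical example: 10 markers, windows of 4 markers sharing 1 marker.
windows = overlapping_windows(n_markers=10, window_size=4, overlap=1)
print(windows)
```

The overlap gives adjacent windows shared markers, so results from neighboring windows can be joined consistently at the boundaries.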

Scientific Applications:

  • Genome-wide association studies (GWAS): Supports genotype imputation and phasing for association analyses.
  • Disease association studies: Enables imputation and phasing needed to detect disease-associated variants.
  • Population genetics: Facilitates haplotype-based analyses and IBD detection for population structure and relatedness studies.
  • Evolutionary biology: Supports analyses of genetic variation and haplotype structure relevant to evolutionary inference.
  • Personalized medicine: Provides imputation and phasing capabilities that can inform genotype-based medical research.

Methodology:

Implements the Li and Stephens model with parallelization and memory-efficiency optimizations. It uses marker windowing, composite reference haplotypes, a progressive phasing algorithm, and a two-stage phasing approach. It also detects identity-by-descent segments.
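
To make the Li and Stephens model concrete, the sketch below imputes missing alleles in one target haplotype by treating it as an imperfect mosaic of reference haplotypes. It is a textbook forward-backward pass written for illustration only. It does not reproduce BEAGLE's code, windowing, or composite reference haplotypes, and the function names and the `switch_prob` and `error_prob` parameters are assumptions chosen for this example.

```python
def li_stephens_posteriors(ref_haps, target, switch_prob=0.05, error_prob=0.01):
    """Posterior probability of copying each reference haplotype at each marker.

    ref_haps: list of reference haplotypes, each a list of 0/1 alleles.
    target:   list of 0/1 alleles, with None where the allele is missing.
    """
    n_haps = len(ref_haps)
    n_markers = len(target)

    def emission(h, m):
        if target[m] is None:
            return 1.0  # a missing allele carries no information
        return 1.0 - error_prob if ref_haps[h][m] == target[m] else error_prob

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass; each column is rescaled to sum to 1 to avoid underflow.
    fwd = [normalize([emission(h, 0) / n_haps for h in range(n_haps)])]
    for m in range(1, n_markers):
        prev = fwd[-1]
        fwd.append(normalize([
            emission(h, m) * ((1 - switch_prob) * prev[h] + switch_prob / n_haps)
            for h in range(n_haps)
        ]))

    # Backward pass with the same transition structure.
    bwd = [[1.0] * n_haps for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        weighted = [emission(h, m + 1) * bwd[m + 1][h] for h in range(n_haps)]
        jump = switch_prob * sum(weighted) / n_haps
        bwd[m] = normalize([(1 - switch_prob) * w + jump for w in weighted])

    return [normalize([f * b for f, b in zip(fwd[m], bwd[m])])
            for m in range(n_markers)]


def impute_dosages(ref_haps, target, **kwargs):
    """Expected count of allele 1 at each marker under the posteriors."""
    posteriors = li_stephens_posteriors(ref_haps, target, **kwargs)
    return [
        sum(p for p, hap in zip(posteriors[m], ref_haps) if hap[m] == 1)
        for m in range(len(target))
    ]


# Hypothetical toy panel: the target matches the second haplotype,
# so the missing allele at marker index 2 should be imputed as 1.
ref_haps = [[0, 0, 0, 0], [1, 1, 1, 1]]
target = [1, 1, None, 1]
dosages = impute_dosages(ref_haps, target)
print(round(dosages[2], 3))
```

The per-marker rescaling keeps the probabilities numerically stable on long haplotypes. Both default parameters must stay strictly between 0 and 1 for the normalization to be well defined.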

Details

License: GPL-3.0
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: Java
Added: 8/20/2017
Last Updated: 11/24/2024

Publications

Browning SR, Browning BL. Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies By Use of Localized Haplotype Clustering. The American Journal of Human Genetics. 2007;81(5):1084-1097. doi:10.1086/521987. PMID:17924348. PMCID:PMC2265661.

Browning BL, Browning SR. Genotype Imputation with Millions of Reference Samples. The American Journal of Human Genetics. 2016;98(1):116-126. doi:10.1016/j.ajhg.2015.11.020. PMID:26748515. PMCID:PMC4716681.

Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. The American Journal of Human Genetics. 2021;108(10):1880-1890. doi:10.1016/j.ajhg.2021.08.005. PMID:34478634. PMCID:PMC8551421.

Funding: National Institutes of Health: 19934, 75N92019D00031, HG008359, HHSN268201600034I, HL104608 S1, NO1-HC-25195, R01HL087699, U54HG003067

Browning BL, Zhou Y, Browning SR. A One-Penny Imputed Genome from Next-Generation Reference Panels. The American Journal of Human Genetics. 2018;103(3):338-348. doi:10.1016/j.ajhg.2018.07.015. PMID:30100085. PMCID:PMC6128308.

Funding: National Institutes of Health: R01HG008359
