fastbaps

fastbaps implements a rapid approximation to a Dirichlet Process Mixture (DPM) model to cluster multilocus genotype data for population-genomic and phylogenetic analyses.


Key Features:

  • Dirichlet Process Mixture approximation: Provides a rapid approximation to a DPM model specifically tailored for clustering multilocus genotype data.
  • Scalability and speed: Scales to very large datasets, with reported run times roughly 10–100 times faster than prior model-based methods.
  • Hierarchical partitioning: Partitions an existing hierarchy, such as a phylogenetic tree, into clades and subclades by choosing the split that maximizes the DPM marginal likelihood under a population-genomic model.
  • Empirical demonstration: Applied to large datasets, including one of over 110,000 HIV-1 pol gene sequences.
  • Performance validation: Tested on simulated data and on real bacterial and viral datasets, producing solutions comparable to or better than those of previous methods.

Scientific Applications:

  • Population genomics: Clusters multilocus genotype data to identify subpopulations and population structure within species.
  • Phylogenetics: Splits phylogenetic trees into clades and subclades to aid analysis of evolutionary relationships.
  • Pathogen diversity analysis: Characterizes genetic diversity and substructure in complex viral and bacterial populations.

Methodology:

fastbaps computes a rapid approximation to a Dirichlet Process Mixture (DPM) model. It also partitions an existing hierarchy, such as a phylogenetic tree, into clades and subclades by selecting the partition that maximizes the DPM marginal likelihood under a population-genomic model.
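
The hierarchical partitioning idea can be illustrated with a minimal Python sketch. This is not the fastbaps API or its implementation: the function names (cluster_log_ml, leaves, best_partition) are hypothetical, the symmetric Dirichlet prior with alpha = 1 is an assumption, and the prior over partitions that a full DPM model includes is omitted for brevity. Each cluster is scored with a per-site Dirichlet-multinomial marginal likelihood, and at every internal node the sketch keeps whichever scores higher: one merged cluster, or the best partitions of the children.

```python
from math import lgamma

ALPHABET = "ACGT"

def cluster_log_ml(seqs, alpha=1.0):
    """Log marginal likelihood of one cluster: an independent
    Dirichlet-multinomial model per site with a symmetric prior (assumed)."""
    k = len(ALPHABET)
    total = 0.0
    for site in zip(*seqs):  # iterate over alignment columns
        counts = [site.count(a) for a in ALPHABET]
        n = sum(counts)
        total += lgamma(k * alpha) - lgamma(k * alpha + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in counts)
    return total

def leaves(node):
    """Collect the sequences under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [s for child in node for s in leaves(child)]

def best_partition(node, alpha=1.0):
    """Return (log_ml, clusters) for the best cut of the subtree at node."""
    members = leaves(node)
    merged = cluster_log_ml(members, alpha)
    if isinstance(node, str):
        return merged, [members]
    split_score, split_clusters = 0.0, []
    for child in node:
        score, clusters = best_partition(child, alpha)
        split_score += score
        split_clusters += clusters
    # Keep the subtree as one cluster unless splitting scores higher.
    if merged >= split_score:
        return merged, [members]
    return split_score, split_clusters

# Toy hierarchy: two clades of two aligned sequences each.
tree = (("AAAAAA", "AAAAAC"), ("TTTTTT", "TTTTTG"))
score, clusters = best_partition(tree)
print(clusters)
```

On this toy tree the sketch cuts at the root and keeps each two-sequence clade as its own cluster, because merging all four sequences or splitting down to single sequences both give a lower total score.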

Details

License: MIT
Maturity: Mature
Cost: Free of charge
Tool Type: library
Operating Systems: Linux, Windows, Mac
Programming Languages: R, C++, C
Added: 8/9/2019
Last Updated: 6/16/2020

Publications

Tonkin-Hill G, Lees JA, Bentley SD, Frost SDW, Corander J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Research. 2019;47(11):5539-5549. doi:10.1093/nar/gkz361. PMID:31076776. PMCID:PMC6582336.

Funding:

  • Wellcome Trust: 204016, 206194
  • ERC: 742158
  • Alan Turing Institute: EP/510129/1
  • National Institutes of Health: R01AI135970
