fastbaps
fastbaps implements a rapid approximation to a Dirichlet Process Mixture (DPM) model to cluster multilocus genotype data for population-genomic and phylogenetic analyses.
Key Features:
- Dirichlet Process Mixture approximation: Provides a rapid approximation to a DPM model specifically tailored for clustering multilocus genotype data.
- Scalability and speed: Scales to very large datasets and is reported to run roughly 10–100 times faster than prior model-based methods.
- Hierarchical partitioning: Partitions an existing hierarchy, such as a phylogenetic tree, into clades and subclades by maximizing the DPM marginal likelihood under a population-genomic model.
- Empirical demonstration: Applied to datasets including over 110,000 sequences of HIV-1 pol genes to demonstrate handling of large-scale sequence data.
- Performance validation: Tested on simulated data and real-world bacterial and viral datasets, producing solutions comparable or superior to previous methods.
Scientific Applications:
- Population genomics: Clusters multilocus genotype data to identify subpopulations and population structure within species.
- Phylogenetics: Splits phylogenetic trees into clades and subclades to aid analysis of evolutionary relationships.
- Pathogen diversity analysis: Characterizes genetic diversity and substructure in complex viral and bacterial populations.
Methodology:
fastbaps computes a rapid approximation to a Dirichlet Process Mixture (DPM) model, then applies hierarchical partitioning that maximizes the DPM marginal likelihood. The same partitioning step can split an existing phylogenetic tree into clades and subclades under a population-genomic model.
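To make the idea of partitioning a hierarchy by marginal likelihood concrete, the following is a minimal Python sketch, not the fastbaps implementation or its API. It assumes a symmetric Dirichlet-multinomial model with independent loci, a binary tree written as nested tuples of sample indices, and it omits the Dirichlet Process prior over partitions. The names `cluster_log_ml` and `best_partition`, and the toy genotypes, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import lgamma


def cluster_log_ml(rows, n_alleles=4, alpha=1.0):
    """Log marginal likelihood of one cluster (assumed model, loci independent).

    Each locus contributes a Dirichlet-multinomial term with a symmetric
    Dirichlet(alpha) prior over n_alleles possible alleles.
    """
    n = len(rows)
    total = 0.0
    for locus in zip(*rows):  # iterate over columns (loci)
        counts = Counter(locus)
        total += lgamma(n_alleles * alpha) - lgamma(n_alleles * alpha + n)
        # Alleles with zero count contribute exactly 0, so they are skipped.
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in counts.values())
    return total


def leaves(node):
    """Sample indices below a node; a leaf is a bare int."""
    if isinstance(node, int):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def best_partition(node, data, n_alleles=4, alpha=1.0):
    """Return (log score, clusters) for the best tree-consistent partition.

    At each internal node, keep the node as one cluster or split it into
    the best partitions of its two children, whichever scores higher.
    """
    members = leaves(node)
    merged = cluster_log_ml([data[i] for i in members], n_alleles, alpha)
    if isinstance(node, int):
        return merged, [members]
    left, right = node
    left_score, left_parts = best_partition(left, data, n_alleles, alpha)
    right_score, right_parts = best_partition(right, data, n_alleles, alpha)
    if left_score + right_score > merged:
        return left_score + right_score, left_parts + right_parts
    return merged, [members]


# Toy data: six samples, eight loci, two visibly distinct groups.
genotypes = [
    "AAAAAAAA", "AAAAAAAA", "AAAAAAAT",
    "CCCCCCCC", "CCCCCCCC", "CCCCCCCG",
]
tree = (((0, 1), 2), ((3, 4), 5))  # hypothetical pre-computed hierarchy

score, partition = best_partition(tree, genotypes)
print(partition)  # [[0, 1, 2], [3, 4, 5]]
```

On this toy input the recursion keeps each three-sample subtree as a single cluster and splits only at the root, because merging all six samples lowers the marginal likelihood. A real analysis would use the fastbaps R package itself, which works on sparse SNP matrices and includes prior terms not modeled here.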
Details
- License: MIT
- Maturity: Mature
- Cost: Free of charge
- Tool Type: library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R, C++, C
- Added: 8/9/2019
- Last Updated: 6/16/2020
Publications
Tonkin-Hill G, Lees JA, Bentley SD, Frost SDW, Corander J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Research. 2019;47(11):5539-5549. doi:10.1093/nar/gkz361. PMID:31076776. PMCID:PMC6582336.
Funding:
- Wellcome Trust: 204016, 206194
- ERC: 742158
- Alan Turing Institute: EP/510129/1
- National Institutes of Health: R01AI135970
Links
Issue tracker
https://github.com/gtonkinhill/fastbaps/issues