ALDEx2
ALDEx2 performs differential abundance analysis of high-throughput sequencing count data by modeling compositionality with a Dirichlet-multinomial framework.
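The sketch below shows how the Dirichlet-multinomial idea can work in practice. Each sample's counts are turned into many Monte Carlo instances of relative abundances, and each instance is put on a log-ratio scale. It assumes ALDEx2's approach of drawing from Dirichlet(counts + prior) followed by a centered log-ratio (clr) transform. The function names, the 128-instance count, and the 0.5 pseudocount are illustrative assumptions. This is a hedged sketch in Python, not ALDEx2's R interface.

```python
import math
import random


def dirichlet_instances(counts, n_samples=128, prior=0.5, seed=None):
    """Draw Monte Carlo instances of relative abundances for one sample.

    Each instance is one draw from Dirichlet(counts + prior), generated by
    normalizing independent Gamma(alpha_i, 1) variates. `prior` is an
    assumed pseudocount so that zero counts still get a nonzero share.
    """
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]
    instances = []
    for _ in range(n_samples):
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(gammas)
        instances.append([g / total for g in gammas])
    return instances


def clr(proportions):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical read counts for four features in a single sample.
counts = [10, 0, 250, 40]
instances = dirichlet_instances(counts, n_samples=128, seed=42)
clr_instances = [clr(p) for p in instances]
print(clr_instances[0])
```

The spread of a feature's clr values across instances reflects how uncertain its relative abundance is at the observed sequencing depth. Low-count features vary more than high-count ones.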
Key Features:
- Dirichlet-multinomial model: Transforms raw counts into relative abundances, handling the compositional nature of the data and accounting for technical (sampling) variation.
- Statistical tests: Implements the Wilcoxon rank-sum test, Welch's t-test, generalized linear models (GLM), and the Kruskal-Wallis test for inferring differential abundance.
- False Discovery Rate (FDR) control: Reports expected P-values and Benjamini-Hochberg-adjusted values, which estimate the expected false discovery rate while accounting for both biological and sampling variation.
- Bayesian inference: Uses Bayesian methods to distinguish technical noise from true biological signal.
- Replicate optimization: Model optimized for experiments with three or more replicates.
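The "expected" P-values and FDR values can be sketched as follows. The sketch assumes that a P-value is computed for every feature in every Monte Carlo instance, that Benjamini-Hochberg adjustment is applied within each instance, and that both raw and adjusted values are then averaged over instances. This is our reading of the approach, not a transcription of ALDEx2's code. The function names and the input layout `pvalues_per_instance[k][j]` (instance k, feature j) are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def expected_values(pvalues_per_instance):
    """Average raw and BH-adjusted p-values over Monte Carlo instances."""
    n = len(pvalues_per_instance)
    m = len(pvalues_per_instance[0])
    mean_p = [0.0] * m
    mean_bh = [0.0] * m
    for pvals in pvalues_per_instance:
        bh = benjamini_hochberg(pvals)  # adjust within this instance
        for j in range(m):
            mean_p[j] += pvals[j] / n
            mean_bh[j] += bh[j] / n
    return mean_p, mean_bh


# Hypothetical p-values for four features across two instances.
mean_p, mean_bh = expected_values([[0.01, 0.04, 0.03, 0.20],
                                   [0.03, 0.02, 0.50, 0.90]])
print(mean_p, mean_bh)
```

Averaging over instances means a feature is called significant only if it stays significant across the plausible technical realizations of the data.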
Scientific Applications:
- RNA Sequencing (RNA-seq): Identifies differentially expressed genes from count-based RNA-seq data.
- 16S rRNA Gene Sequencing: Analyzes the abundances of taxa in microbial communities, for example identifying taxa that differ between tongue dorsum and buccal mucosa samples in human microbiome studies.
- Chromatin Immunoprecipitation Sequencing (ChIP-seq): Applicable to epigenetic studies of DNA–protein interactions using count data.
- Metagenomic Analysis: Facilitates differential abundance analysis of microbial communities from environmental or clinical metagenomic samples.
- Selective Growth Experiments: Assesses differential growth patterns in vitro using count-based measurements.
- Human Microbiome Project 16S data: Has been applied to Human Microbiome Project 16S rRNA gene abundance datasets.
Methodology:
Applies a Dirichlet-multinomial framework to transform raw counts into relative abundances, and uses Bayesian inference to separate technical noise from biological signal. It then performs statistical tests (Wilcoxon rank-sum test, Welch's t-test, GLM, Kruskal-Wallis test) and computes P-values and Benjamini-Hochberg-adjusted FDR values. These estimate the expected false discovery rate while accounting for biological and sampling variation. The method is optimized for experiments with three or more replicates.
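As an illustration of one per-instance test, the following hedged sketch implements a two-sided Wilcoxon rank-sum test on one feature's clr values from two groups of samples. It would be applied once per feature in each Monte Carlo instance. It uses the large-sample normal approximation, with midranks for ties but no tie or continuity correction to the variance. This is a simplification, and an exact test (as R's `wilcox.test` may use for small groups) can give different P-values. The function names and input values are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided rank-sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of group x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical clr values of one feature in one instance, three replicates per group.
print(rank_sum_pvalue([-1.2, -0.8, -1.0], [0.9, 1.3, 1.1]))
```

The resulting per-instance P-values are what would be adjusted and averaged to give the expected values described above.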
Details
- Tool Type: command-line tool, library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R
- Added: 1/17/2017
- Last Updated: 11/24/2024
Operations
- Statistical inference
Publications
Fernandes AD, Reid JN, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. 2014;2(1). doi:10.1186/2049-2618-2-15. PMID:24910773. PMCID:PMC4030730.