binomialRF
binomialRF is a feature selection technique for Random Forest (RF) classifiers. It targets biomarker detection in high-dimensional biological datasets where the number of features (e.g., transcripts) exceeds the number of samples (e.g., patient or animal samples). Traditional techniques face statistical limitations in such settings, and biomarker detection calls for more interpretable machine learning models. binomialRF addresses both needs by using a correlated binomial distribution for feature selection, which enables efficient analysis of multiway feature interactions.
Key Features and Functionalities:
- Efficient Handling of High-Dimensional Data: binomialRF is specifically designed to overcome the "P >> N" challenge, where the number of features significantly exceeds the number of samples, a common scenario in computational biology.
- Identification of Main Effects and Interactions: The tool identifies biomarkers' main effects and efficiently detects second- and third-order interactions among features, addressing the combinatorial challenge of high-dimensional data.
- Correlated Binomial Distribution for Feature Selection: By employing a correlated binomial distribution, binomialRF provides an alternative interpretation for features, enabling more insightful post-hoc analyses and biomarker detection.
- Computational Efficiency: binomialRF has demonstrated significant computational gains (5 to 300 times faster) in both simulations and validation studies, making it an efficient tool for biomarker discovery.
- Competitive Precision and Recall: In clinical studies and datasets from the TCGA and UCI repositories, binomialRF has shown competitive precision and recall for variable selection when identifying biomarkers and their interactions, underscoring its efficacy in classification tasks.
- Pathological Molecular Mechanism Prioritization: The algorithm effectively prioritizes relevant pathological molecular mechanisms, highlighting its potential for uncovering meaningful biological insights through high classification precision and recall.
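The binomial selection idea in the list above can be illustrated with a simplified sketch. This is not the binomialRF package API (the package itself is written in R). It assumes the selection signal is how often a feature is chosen as the root split of a tree, which this entry does not spell out. It also swaps in a plain (uncorrelated) binomial test with a Bonferroni correction in place of the correlated binomial distribution, and every function name and count below is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def select_features(root_counts, n_trees, alpha=0.05):
    """Flag features chosen as a root split more often than chance.

    root_counts is a hypothetical dict mapping a feature name to the number
    of trees (out of n_trees) in which that feature was the root split.
    Assumption: trees are treated as independent trials, so the correlation
    between trees that binomialRF models is ignored here.
    """
    n_features = len(root_counts)
    p0 = 1.0 / n_features           # null: each feature equally likely at the root
    threshold = alpha / n_features  # Bonferroni correction across features
    selected = {}
    for name, count in root_counts.items():
        p_value = binomial_upper_tail(count, n_trees, p0)
        if p_value < threshold:
            selected[name] = p_value
    return selected


# Made-up counts for a forest of 100 trees over 10 features.
counts = {
    "gene_0": 40, "gene_1": 8, "gene_2": 7, "gene_3": 6, "gene_4": 7,
    "gene_5": 6, "gene_6": 7, "gene_7": 6, "gene_8": 7, "gene_9": 6,
}
selected = select_features(counts, n_trees=100)
print(sorted(selected))
```

With these toy counts only gene_0 is flagged, because it is the root split far more often than the 1-in-10 rate expected under the null. A faithful implementation would replace the independent-trials assumption with the correlated binomial model described in the publication.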
Topic
Biomarkers;Gene expression;Pathology;Statistics and probability;Molecular interactions, pathways and networks
Detail
Operation: Regression analysis;Feature selection;Standardisation and normalisation
Software interface: Command-line interface
Language: R
License: GNU General Public License, version 2
Cost: Free with restrictions
Version name: 0.0.2
Credit: The University of Arizona Health Sciences Center for Biomedical Informatics and Biostatistics, the BIO5 Institute, the NIH.
Input: -
Output: -
Contact: Hao Helen Zhang hzhang@math.arizona.edu, Yves A. Lussier Lussier.Y@gmail.com
Collection: -
Maturity: -
Publications
- binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions.
- Rachid Zaim S, et al. binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions. BMC Bioinformatics. 2020; 21:374. doi: 10.1186/s12859-020-03718-9
- https://doi.org/10.1186/S12859-020-03718-9
- PMID: 32859146
- PMC: PMC7456085
Download and documentation
Documentation: https://github.com/SamirRachidZaim/binomialRF/blob/master/README.md
Links: https://github.com/SamirRachidZaim/binomialRF/tree/master/vignettes