binomialRF

binomialRF is a feature selection technique for Random Forest (RF) classifiers, aimed at biomarker detection in high-dimensional biological datasets where the number of features (e.g., transcripts) exceeds the number of samples (e.g., patient or animal samples). Traditional techniques have statistical limitations in such high-dimensional settings, and biomarker detection calls for more interpretable machine learning models. binomialRF addresses both needs by using a correlated binomial distribution for feature selection, which also enables efficient analysis of multiway feature interactions.

Key Features and Functionalities:

- Efficient Handling of High-Dimensional Data: binomialRF is specifically designed to overcome the "P >> N" challenge, where the number of features significantly exceeds the number of samples, a common scenario in computational biology.

- Identification of Main Effects and Interactions: The tool identifies the main effects of biomarkers and efficiently detects second- and third-order interactions among features, addressing the combinatorial challenge of high-dimensional data.

- Correlated Binomial Distribution for Feature Selection: By employing a correlated binomial distribution, binomialRF provides an alternative interpretation for features, enabling more insightful post-hoc analyses and biomarker detection.

- Computational Efficiency: binomialRF has demonstrated significant computational gains (5 to 300 times faster) in both simulations and validation studies, making it an efficient tool for biomarker discovery.

- Competitive Precision and Recall: In clinical studies and datasets from the TCGA and UCI repositories, binomialRF has shown competitive variable-selection precision and recall in identifying biomarkers and their interactions, underscoring its efficacy in classification tasks.

- Pathological Molecular Mechanism Prioritization: The algorithm effectively prioritizes relevant pathological molecular mechanisms, highlighting its potential for uncovering meaningful biological insights through high classification precision and recall.
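
The selection idea behind the features listed above can be illustrated with a minimal sketch. It is not the binomialRF R package API and not the authors' implementation. It assumes that selection is based on how often each feature is chosen as the root split across the trees of a forest, that a non-informative feature is chosen with null probability 1/L (L being the number of features), and that p-values are adjusted with the Benjamini-Yekutieli procedure. It also swaps in a plain independent binomial tail where binomialRF uses a correlated binomial distribution to account for dependence between trees. The function names and the root-split counts are hypothetical and purely illustrative.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for i in range(k, n + 1):
        # Work in log-space so large binomial coefficients do not overflow.
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values, in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_features(root_counts, n_trees, fdr=0.05):
    """Keep features whose root-split count is unexpectedly high.

    ASSUMPTION: independent binomial null with success probability 1/L.
    binomialRF itself uses a correlated binomial distribution instead.
    """
    names = list(root_counts)
    p_null = 1.0 / len(names)
    pvals = [binomial_upper_tail(root_counts[f], n_trees, p_null) for f in names]
    adjusted = benjamini_yekutieli(pvals)
    return {f: q for f, q in zip(names, adjusted) if q <= fdr}


# Hypothetical root-split counts from a 100-tree forest over 10 features.
root_counts = {
    "gene_A": 40, "gene_B": 25, "gene_C": 5, "gene_D": 5, "gene_E": 5,
    "gene_F": 4, "gene_G": 4, "gene_H": 4, "gene_I": 4, "gene_J": 4,
}
selected = select_features(root_counts, n_trees=100)
print(sorted(selected))
```

Because the test only needs one count per feature, its cost grows with the number of features rather than with repeated permutations of the data, which is one plausible reading of where the reported computational gains come from.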

Topic

Biomarkers;Gene expression;Pathology;Statistics and probability;Molecular interactions, pathways and networks

Detail

  • Operation: Regression analysis;Feature selection;Standardisation and normalisation

  • Software interface: Command-line interface

  • Language: R

  • License: GNU General Public License, version 2

  • Cost: Free with restrictions

  • Version name: 0.0.2

  • Credit: The University of Arizona Health Sciences Center for Biomedical Informatics and Biostatistics, the BIO5 Institute, the NIH.

  • Input: -

  • Output: -

  • Contact: Hao Helen Zhang hzhang@math.arizona.edu, Yves A. Lussier Lussier.Y@gmail.com

  • Collection: -

  • Maturity: -

Publications

  • binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions.
  • Rachid Zaim S, et al. binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions. BMC Bioinformatics. 2020; 21:374. doi: 10.1186/s12859-020-03718-9
  • https://doi.org/10.1186/S12859-020-03718-9
  • PMID: 32859146
  • PMC: PMC7456085
