cnCV

cnCV (consensus nested cross-validation) is an R library for feature selection in machine-learning models that aims to improve accuracy while avoiding overfitting. It combines an idea from differential privacy, namely feature stability, with nested cross-validation (nCV). Rather than relying on inner-fold classification accuracy, as standard nCV does, cnCV uses the consistency with which a feature is selected across folds as the measure of its utility.

Key Features and Functionalities:

- Consensus Feature Selection: cnCV applies feature selection within each inner fold of the nCV process and treats the consensus of top features across all folds as the indicator of feature stability or reliability. Only features that are selected consistently across folds are carried forward to model training and validation.

- Efficiency and Parsimony: Unlike traditional nCV, cnCV does not construct classifiers in the inner folds, which significantly reduces run times. cnCV also tends to select a more parsimonious set of features, which reduces the number of false positives and yields a simpler model.

- Stability Without Privacy Thresholds: cnCV selects features that are stable across folds without requiring a privacy threshold to be specified, an advantage over some differential privacy approaches.
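
The consensus idea in the list above can be illustrated with a minimal Python sketch. This is not the cncv package's R API: every function name here (score_features, top_k, make_folds, consensus_features) is hypothetical, the scoring rule (absolute difference in class means) is a simple stand-in for whatever feature-ranking method is actually used, and the choices to score on the training portion of each inner split and to define "consensus" as a strict set intersection are assumptions made for illustration. No classifier is built in the inner folds.

```python
import random
from statistics import mean


def score_features(X, y):
    """Stand-in filter score (assumption): |mean(class 1) - mean(class 0)| per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        # Score 0 if a class is missing from the subset, so mean() never sees an empty list.
        scores.append(abs(mean(pos) - mean(neg)) if pos and neg else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def make_folds(indices, n_folds, rng):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def consensus_features(X, y, n_outer=5, n_inner=4, k=2, seed=0):
    """Keep only features that reach the top k in every inner split of every outer fold."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    outer_sets = []
    for outer_fold in make_folds(all_idx, n_outer, rng):
        held_out = set(outer_fold)
        outer_train = [i for i in all_idx if i not in held_out]
        inner_sets = []
        for inner_fold in make_folds(outer_train, n_inner, rng):
            inner_val = set(inner_fold)
            inner_train = [i for i in outer_train if i not in inner_val]
            Xi = [X[i] for i in inner_train]
            yi = [y[i] for i in inner_train]
            # Feature ranking only; no classifier is trained here.
            inner_sets.append(top_k(score_features(Xi, yi), k))
        # Consensus across inner splits for this outer fold.
        outer_sets.append(set.intersection(*inner_sets))
    # Consensus across outer folds.
    return set.intersection(*outer_sets)


# Synthetic data: features 0 and 1 separate the classes, features 2-4 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [
    [10 * label + data_rng.random(), 8 * label + data_rng.random()]
    + [data_rng.random() for _ in range(3)]
    for label in y
]

selected = consensus_features(X, y)
print(sorted(selected))  # prints [0, 1]
```

In this sketch the held-out outer folds are never touched; in a complete workflow they would presumably be used to evaluate a classifier trained on the consensus features.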

Topic

Machine learning;RNA-Seq;Data security

Detail

Operation: Feature selection;Standardisation and normalisation
Software interface: Library
Language: R
License: GNU General Public License, version
Cost: Free with restrictions
Version name: 0.0.0.9000
Credit: National Institutes of Health and William K. Warren Jr. Foundation.
Input: -
Output: -
Contact: Brett A McKinney brett.mckinney@utulsa.edu
Collection: -
Maturity: -

Publications

Consensus features nested cross-validation.
Parvandeh S, et al. Consensus features nested cross-validation. Bioinformatics. 2020; 36:3093-3098. doi: 10.1093/bioinformatics/btaa046
https://doi.org/10.1093/BIOINFORMATICS/BTAA046
PMID: 31985777
PMC: PMC7776094

Download and documentation

Source: https://github.com/insilico/cncv
Documentation: https://github.com/insilico/cncv/blob/master/README.md
Home page: https://github.com/insilico/cncv
