cnCV
cnCV (consensus nested cross-validation) is a feature selection method for machine-learning models that aims to improve accuracy while avoiding overfitting. It combines an idea from differential privacy, namely feature stability, with nested cross-validation (nCV). Rather than relying solely on classification accuracy, as standard nCV does, cnCV uses the stability of a feature across folds as the measure of its utility.
Key Features and Functionalities:
- Consensus Feature Selection: cnCV applies feature selection within each inner fold of the nCV process and uses the consensus of top features across all folds to indicate feature stability or reliability. Only features that are consistently ranked as relevant are carried forward to model training and validation.
- Efficiency and Parsimony: Unlike traditional nCV, cnCV does not require the construction of classifiers in the inner folds, significantly reducing run times. Moreover, cnCV tends to select a more parsimonious set of features, minimizing the inclusion of false positives and streamlining the model.
- Stability Without Privacy Thresholds: cnCV successfully selects stable features between folds without specifying a privacy threshold, an advantage over some differential privacy approaches.
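The consensus step described in the list above can be sketched as follows. This is a minimal illustration in plain Python, not the cnCV R package's API: the function names, the univariate scoring statistic, the fold counts, the top-k cutoff, and the toy data are all assumptions made for the example. "Consensus" is taken here to mean that a feature appears among the top-ranked features in every inner fold, and then in every outer fold.

```python
import random
import statistics


def score_features(rows, labels):
    """Score each feature by a simple two-class separation statistic.

    Assumption: this univariate score is a stand-in for whatever
    feature-importance method is used in practice.
    """
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return scores


def top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features as a set."""
    scores = score_features(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def make_folds(indices, n_folds, rng):
    """Shuffle the sample indices and split them into n_folds groups."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def consensus_features(rows, labels, n_outer=4, n_inner=3, k=3, seed=0):
    """Keep features that are in the top k of every inner and outer fold.

    No classifier is trained inside the inner folds; only feature
    rankings are computed and intersected.
    """
    rng = random.Random(seed)
    outer_sets = []
    for outer_test in make_folds(range(len(rows)), n_outer, rng):
        held_out = set(outer_test)
        outer_train = [i for i in range(len(rows)) if i not in held_out]
        inner_sets = []
        for inner_test in make_folds(outer_train, n_inner, rng):
            inner_held_out = set(inner_test)
            inner_train = [i for i in outer_train if i not in inner_held_out]
            inner_sets.append(top_k([rows[i] for i in inner_train],
                                    [labels[i] for i in inner_train], k))
        # Consensus across the inner folds of this outer fold.
        outer_sets.append(set.intersection(*inner_sets))
    # Consensus across the outer folds gives the final feature set.
    return sorted(set.intersection(*outer_sets))


def make_toy_data(n_samples=80, n_features=10, seed=1):
    """Toy data (hypothetical): features 0 and 1 differ between classes."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_samples):
        y = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * y
        row[1] += 3.0 * y
        rows.append(row)
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data()
selected = consensus_features(rows, labels)
print(selected)
```

Because a noise feature would have to rank in the top k of every fold to survive the intersection, the resulting set tends to be small, which is the parsimony property the list mentions. A final classifier would then be trained once on the selected features; that step is omitted here.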
Topic
Machine learning;RNA-Seq;Data security
Detail
Operation: Feature selection;Standardisation and normalisation
Software interface: Library
Language: R
License: GNU General Public License, version
Cost: Free with restrictions
Version name: 0.0.0.9000
Credit: National Institutes of Health and William K. Warren Jr. Foundation.
Input: -
Output: -
Contact: Brett A McKinney brett.mckinney@utulsa.edu
Collection: -
Maturity: -
Publications
- Consensus features nested cross-validation.
- Parvandeh S, et al. Consensus features nested cross-validation. Bioinformatics. 2020; 36:3093-3098. doi: 10.1093/bioinformatics/btaa046
- https://doi.org/10.1093/BIOINFORMATICS/BTAA046
- PMID: 31985777
- PMC: PMC7776094
Download and documentation
Source: https://github.com/insilico/cncv
Documentation: https://github.com/insilico/cncv/blob/master/README.md
Home page: https://github.com/insilico/cncv