cnCV

cnCV (consensus nested cross-validation) is a computational tool that enhances feature selection for machine-learning models, improving accuracy while avoiding overfitting. cnCV combines an idea from differential privacy, specifically feature stability, with nested cross-validation (nCV) to select features more effectively. Rather than relying solely on classification accuracy, as standard nCV does, the method treats the stability of features across folds as the measure of their utility.

Key Features and Functionalities:

- Consensus Feature Selection: cnCV applies feature selection within each inner fold of the nCV process and uses the consensus of top features across all folds to indicate feature stability or reliability. This approach ensures that only the most consistently relevant features are chosen for model training and validation.

- Efficiency and Parsimony: Unlike traditional nCV, cnCV does not require the construction of classifiers in the inner folds, significantly reducing run times. Moreover, cnCV tends to select a more parsimonious set of features, minimizing the inclusion of false positives and streamlining the model.

- Stability Without Privacy Thresholds: cnCV successfully selects stable features between folds without specifying a privacy threshold, an advantage over some differential privacy approaches.
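To make the consensus idea concrete, the sketch below mimics the core loop in plain Python: score features within each inner fold, keep each fold's top k, and retain only the features selected in every fold. This is an illustration of the stability criterion, not the cnCV package itself (which is an R library); the univariate correlation score stands in for the relief-based importance scores typically used with cnCV, and all function names here are hypothetical.

```python
import numpy as np

def top_k_features(X, y, k):
    """Rank features by absolute correlation with the label (a simple
    stand-in for relief-based scores) and return the top-k indices."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(scores)[::-1][:k])

def consensus_features(X, y, n_folds=5, k=10, seed=0):
    """Keep only features chosen in *every* fold (the consensus),
    echoing cnCV's use of cross-fold stability instead of inner-fold
    classifier accuracy. No classifier is trained here, which is why
    this style of selection runs faster than traditional nCV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    consensus = None
    for i in range(n_folds):
        # Train on all folds except the i-th held-out fold.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        selected = top_k_features(X[train], y[train], k)
        consensus = selected if consensus is None else consensus & selected
    return sorted(consensus)
```

Because only the intersection across folds survives, unstable (likely false-positive) features are dropped without any privacy threshold having to be specified.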

Topic

Machine learning; RNA-Seq; Data security

Detail

  • Operation: Feature selection; Standardisation and normalisation

  • Software interface: Library

  • Language: R

  • License: GNU General Public License, version

  • Cost: Free with restrictions

  • Version name: 0.0.0.9000

  • Credit: National Institutes of Health and the William K. Warren Jr. Foundation.

  • Input: -

  • Output: -

  • Contact: Brett A McKinney brett.mckinney@utulsa.edu

  • Collection: -

  • Maturity: -

Publications

Download and documentation
