scikit-activeml
scikit-activeml implements active learning algorithms to select informative samples for labeling and improve model performance in data-scarce bioinformatics tasks such as genomics and proteomics.
Key Features:
- Foundation: Built on the SciPy and scikit-learn frameworks.
- scikit-learn Integration: Works with scikit-learn classifiers, such as Logistic Regression, so that active learning can be applied to existing models.
- Handling Unlabeled Data: Represents unlabeled instances by a designated MISSING_LABEL value in the label vector y_true.
- Query Strategies: Implements a range of query strategies to identify the most informative data points for labeling.
- Customizable Active Learning Cycles: Supports iterative active learning cycles configurable with different classifiers.
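The MISSING_LABEL convention listed above can be illustrated with a minimal NumPy sketch. It assumes NaN as a stand-in for the library's MISSING_LABEL value; the helper `is_unlabeled` and the example label vector are hypothetical and are not part of scikit-activeml.

```python
import numpy as np

# Assumption: NaN stands in for the library's MISSING_LABEL marker.
MISSING_LABEL = np.nan


def is_unlabeled(y):
    """Return a boolean mask that is True where a label is missing."""
    return np.isnan(np.asarray(y, dtype=float))


# Hypothetical label vector: three of the five samples are still unlabeled.
y = np.array([0, MISSING_LABEL, 1, MISSING_LABEL, MISSING_LABEL], dtype=float)

unlabeled_mask = is_unlabeled(y)
labeled_idx = np.flatnonzero(~unlabeled_mask)    # samples a classifier can train on
unlabeled_idx = np.flatnonzero(unlabeled_mask)   # candidates a query strategy can pick from
```

Keeping labeled and unlabeled samples in one vector of fixed length means that the feature matrix never has to be split or reordered between cycles; only the entries of the label vector change.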
Scientific Applications:
- Genomics: Reduce labeling effort for genomic datasets where experimental validation is costly by prioritizing informative samples.
- Proteomics: Prioritize labels in proteomic datasets to improve model accuracy with fewer labeled instances.
- Large-scale biological data analysis: Improve model performance under limited labeling resources in other large-scale bioinformatics analyses.
Methodology:
Uses uncertainty sampling to iteratively select and query labels for the samples about which the model is least certain. Each active learning cycle trains a scikit-learn classifier (e.g., Logistic Regression) on the labeled samples, with unlabeled instances marked by MISSING_LABEL in y_true.
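The cycle described above can be sketched with plain NumPy and scikit-learn. This is a hand-rolled illustration under stated assumptions, not scikit-activeml's own API: the functions `least_confident_index` and `run_active_learning` are hypothetical, NaN stands in for MISSING_LABEL, the synthetic data replaces a real dataset, and the least-confidence score is only one of several possible uncertainty measures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def least_confident_index(clf, X, candidate_idx):
    """Return the candidate whose highest class probability is lowest."""
    proba = clf.predict_proba(X[candidate_idx])
    uncertainty = 1.0 - proba.max(axis=1)
    return candidate_idx[int(np.argmax(uncertainty))]


def run_active_learning(X, y_true, n_cycles=10, seed=0):
    """Hypothetical loop: y_true plays the role of the labeling oracle."""
    rng = np.random.default_rng(seed)
    y = np.full(len(X), np.nan)  # assumption: NaN marks unlabeled samples

    # Start with one labeled sample per class so the classifier can be fitted.
    for label in np.unique(y_true):
        idx = rng.choice(np.flatnonzero(y_true == label))
        y[idx] = y_true[idx]

    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_cycles):
        labeled = ~np.isnan(y)
        clf.fit(X[labeled], y[labeled].astype(int))
        candidates = np.flatnonzero(~labeled)
        if candidates.size == 0:
            break
        query = least_confident_index(clf, X, candidates)
        y[query] = y_true[query]  # ask the oracle for the selected label

    # Refit once more so the returned model has seen the last queried label.
    labeled = ~np.isnan(y)
    clf.fit(X[labeled], y[labeled].astype(int))
    return clf, y


X, y_true = make_classification(n_samples=200, n_features=5, random_state=0)
clf, y = run_active_learning(X, y_true, n_cycles=10)
n_labeled = int((~np.isnan(y)).sum())
```

With two classes and ten cycles, the loop ends with twelve labeled samples: two initial ones plus one query per cycle. In practice the oracle would be an experiment or a human annotator rather than a stored label vector.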
Details
- License: BSD-3-Clause
- Tool Type: library
- Programming Languages: Python
- Added: 11/29/2021
- Last Updated: 11/29/2021
Publications
Kottke D, Herde M, Pham Minh T, Benz A, Mergard P, Roghman A, Sandrock C, Sick B. scikit-activeml: A Library and Toolbox for Active Learning Algorithms. Preprints. 2021. doi:10.20944/preprints202103.0194.v1.
Links
- Issue tracker: https://github.com/scikit-activeml/scikit-activeml