Ideafix
Ideafix refines variant calls in formalin-fixed and paraffin-embedded (FFPE) DNA sequencing data to reduce FFPE-associated sequencing artefacts and improve accuracy of variant calling for cancer genomics.
Key Features:
- Target data: Specifically addresses variant calling in formalin-fixed and paraffin-embedded (FFPE) DNA sequencing datasets.
- Training dataset: Trained on more than 1.6 million variants derived from 27 paired FFPE and fresh-frozen breast cancer samples.
- Machine learning evaluation: Assessed five machine learning algorithms and identified XGBoost (extreme gradient boosting) and random forest as top performers.
- Cross-validation performance: Achieved area under the ROC curve (AUC) values greater than 0.86 in leave-one-sample-out cross-validation.
- Independent validation: Validated on two independent datasets with AUC values up to 0.96, exceeding previously published tools (maximum AUC 0.92).
- Discriminating features: Variant features include read pair orientation bias, genomic context, and variant allele frequency.
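The leave-one-sample-out cross-validation and AUC metric listed above can be illustrated with a minimal Python sketch. This is not Ideafix code (the tool itself is an R library). The function names `leave_one_sample_out` and `roc_auc` and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def leave_one_sample_out(sample_ids):
    """Yield (held_out_sample, train_indices, test_indices) once per distinct sample.

    All variants from one sample are held out together, so no sample
    contributes variants to both the training and the test fold.
    """
    by_sample = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        by_sample[sid].append(idx)
    for held_out, test_idx in by_sample.items():
        train_idx = [i for i, sid in enumerate(sample_ids) if sid != held_out]
        yield held_out, train_idx, test_idx


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): one entry per variant, tagged with its source sample.
sample_ids = ["S1", "S1", "S2", "S2", "S3", "S3"]
folds = list(leave_one_sample_out(sample_ids))
```

Grouping the folds by sample rather than by variant matters because variants from the same FFPE block share fixation conditions. A random variant-level split would let the model see the held-out sample's artefact profile during training.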
Scientific Applications:
- Clinical variant interpretation: Refinement of variants in FFPE-derived sequencing data to support more accurate genomic interpretation for cancer treatment decisions.
- Molecular testing with FFPE biopsies: Improvement of variant call reliability in routine molecular testing workflows that use FFPE tissue.
- Precision oncology research: Enabling more reliable use of FFPE samples in research applications requiring accurate somatic variant calls.
Methodology:
The authors constructed a comprehensive set of variant features from paired FFPE and fresh-frozen samples, including read pair orientation bias, genomic context, and variant allele frequency. They evaluated five machine learning algorithms, among them XGBoost and random forest, using leave-one-sample-out cross-validation, and then validated performance on two independent datasets.
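To make the kinds of features named above concrete, here is a minimal Python sketch that derives three simple descriptors from per-variant read counts. It is an assumption-laden illustration, not Ideafix's actual feature definitions: the function `variant_features`, its parameter names, and the way orientation bias is scored are all hypothetical. The C>T/G>A flag reflects the cytosine deamination pattern that is widely reported for formalin-induced artefacts.

```python
def variant_features(ref, alt, ref_depth, alt_depth, alt_f1r2, alt_f2r1):
    """Compute illustrative per-variant features (hypothetical definitions).

    ref, alt             -- reference and alternate bases, e.g. "C" and "T"
    ref_depth, alt_depth -- reads supporting the reference / alternate allele
    alt_f1r2, alt_f2r1   -- alternate-allele reads in each read pair orientation
    """
    total = ref_depth + alt_depth
    # Variant allele frequency: share of reads carrying the alternate allele.
    vaf = alt_depth / total if total else 0.0

    oriented = alt_f1r2 + alt_f2r1
    # Fraction of alt reads in F1R2 orientation; 0.5 means perfectly balanced.
    f1r2_fraction = alt_f1r2 / oriented if oriented else 0.5
    # Rescale to 0 (balanced) .. 1 (all alt reads in a single orientation).
    orientation_bias = abs(f1r2_fraction - 0.5) * 2

    return {
        "vaf": vaf,
        "orientation_bias": orientation_bias,
        # Simplest possible genomic-context flag: deamination-type substitution.
        "is_ct_or_ga": (ref, alt) in {("C", "T"), ("G", "A")},
    }


# A low-frequency C>T call whose supporting reads all share one orientation.
suspect = variant_features("C", "T", ref_depth=95, alt_depth=5, alt_f1r2=5, alt_f2r1=0)
# A balanced, high-frequency A>G call.
balanced = variant_features("A", "G", ref_depth=50, alt_depth=50, alt_f1r2=25, alt_f2r1=25)
```

In a classifier such as a random forest or XGBoost, feature vectors like these would be the inputs and the paired fresh-frozen sample would supply the artefact/true-variant label.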
Details
- License: GPL-2.0
- Cost: Free of charge
- Tool Type: library
- Operating Systems: Mac, Linux, Windows
- Programming Languages: R
- Added: 4/30/2022
- Last Updated: 4/30/2022
Publications
Tellaetxe-Abete M, Calvo B, Lawrie C. Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data. NAR Genomics and Bioinformatics. 2021;3(4). doi:10.1093/nargab/lqab092. PMID:34729472. PMCID:PMC8557387.