Ideafix

Ideafix refines variant calls in formalin-fixed and paraffin-embedded (FFPE) DNA sequencing data, reducing FFPE-associated sequencing artefacts and improving the accuracy of variant calling for cancer genomics.


Key Features:

  • Target data: Specifically addresses variant calling in formalin-fixed and paraffin-embedded (FFPE) DNA sequencing datasets.
  • Training dataset: Over 1.6 million variants derived from 27 paired FFPE and fresh-frozen breast cancer samples.
  • Machine learning evaluation: Compared five machine learning algorithms and identified XGBoost (extreme gradient boosting) and random forest as the top performers.
  • Cross-validation performance: Achieved area under the receiver operating characteristic curve (AUC) values greater than 0.86 in leave-one-sample-out cross-validation.
  • Independent validation: Validated on two independent datasets, reaching AUC values of up to 0.96 compared with a maximum of 0.92 for previously published tools.
  • Discriminating features: Read pair orientation bias, genomic context, and variant allele frequency are among its most discriminating variant features.

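The AUC figures above summarise how well a classifier's scores rank one class above the other: 0.5 corresponds to chance and 1.0 to perfect separation. As a minimal Python sketch, AUC can be computed as the probability that a randomly chosen positive (here, an artefact) scores above a randomly chosen negative, with ties counting half, which equals the normalised Mann-Whitney U statistic. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not part of Ideafix (an R library) or its data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for the positive class (e.g. an FFPE artefact), 0 otherwise.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the Ideafix study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
# 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9.
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic and meant only for illustration; for datasets on the scale of 1.6 million variants, a rank-based formula computed after sorting the scores would be the practical choice.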
Scientific Applications:

  • Clinical variant interpretation: Refinement of variants in FFPE-derived sequencing data to support more accurate genomic interpretation for cancer treatment decisions.
  • Molecular testing with FFPE biopsies: Improvement of variant call reliability in routine molecular testing workflows that use FFPE tissue.
  • Precision oncology research: Enabling more reliable use of FFPE samples in research applications requiring accurate somatic variant calls.

Methodology:

The authors constructed a comprehensive set of variant features from paired FFPE and fresh-frozen samples and used leave-one-sample-out cross-validation to evaluate five machine learning algorithms, including XGBoost and random forest. Performance was then validated on two independent datasets. Read pair orientation bias, genomic context, and variant allele frequency are among the features the models rely on.
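Leave-one-sample-out cross-validation holds out all variants from one sample at a time, trains on the remaining samples, and evaluates on the held-out one, so that variants from the same sample never appear in both the training and test sets. The Python sketch below illustrates only this splitting logic. It assumes a hypothetical `Variant` record with a single numeric feature and a binary artefact label, and it replaces the XGBoost and random forest models with a toy midpoint-threshold classifier that assumes artefacts have higher feature values; none of these names or choices come from Ideafix.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    sample: str        # sample identifier (hypothetical)
    feature: float     # one numeric feature (hypothetical score)
    is_artefact: int   # 1 = FFPE artefact, 0 = genuine variant


def fit_threshold(train: List[Variant]) -> float:
    """Toy stand-in model: midpoint between the two class means of the feature.

    Not Ideafix's model; assumes both classes occur in the training set.
    """
    art = [v.feature for v in train if v.is_artefact == 1]
    real = [v.feature for v in train if v.is_artefact == 0]
    return (mean(art) + mean(real)) / 2


def predict(threshold: float, v: Variant) -> int:
    # Assumes artefacts tend to have higher feature values.
    return 1 if v.feature > threshold else 0


def leave_one_sample_out(variants: List[Variant]) -> Dict[str, float]:
    """Return per-sample accuracy when each sample is held out in turn."""
    by_sample: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:
        by_sample[v.sample].append(v)

    results: Dict[str, float] = {}
    for held_out, test in by_sample.items():
        # Train only on the other samples, so nothing from the held-out
        # sample leaks into the fitted model.
        train = [v for s, vs in by_sample.items() if s != held_out for v in vs]
        threshold = fit_threshold(train)
        correct = sum(predict(threshold, v) == v.is_artefact for v in test)
        results[held_out] = correct / len(test)
    return results


# Hypothetical toy data: two variants per sample.
data = [
    Variant("S1", 0.9, 1), Variant("S1", 0.1, 0),
    Variant("S2", 0.8, 1), Variant("S2", 0.2, 0),
    Variant("S3", 0.7, 1), Variant("S3", 0.6, 0),
]
print(leave_one_sample_out(data))
```

Grouping by sample rather than splitting variants at random matters because variants from one sample share tissue handling and sequencing conditions; a random split could let the model learn sample-specific quirks and overstate its accuracy on new samples.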

Details

License: GPL-2.0
Cost: Free of charge
Tool Type: library
Operating Systems: Mac, Linux, Windows
Programming Languages: R
Added: 4/30/2022
Last Updated: 4/30/2022

Publications

Tellaetxe-Abete M, Calvo B, Lawrie C. Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data. NAR Genomics and Bioinformatics. 2021;3(4). doi:10.1093/nargab/lqab092. PMID:34729472. PMCID:PMC8557387.

Funding:

  • Basque Government: PRE_2019_2_0211
  • Ministerio de Economía, Industria y Competitividad: PID2019-104966GB-I00
  • FEDER: DTS14/00109, PI12/00663, PI15/00275, PI18/01710, PIE13/00048