GkmExplain

GkmExplain is a computationally efficient feature attribution method for interpreting predictive sequence patterns learned by gapped k-mer Support Vector Machine (gkm-SVM) models of regulatory DNA. It addresses limitations of existing interpretation methods such as deltaSVM, in-silico mutagenesis (ISM), and SHAP, which either scale poorly or make assumptions that can lead to misleading results when gkm kernels are combined with nonlinear kernels. GkmExplain has theoretical connections to the Integrated Gradients method.

Using simulated regulatory DNA sequences, GkmExplain identifies predictive patterns with high accuracy while being more computationally efficient than SHAP and avoiding issues of deltaSVM and ISM. It recovers consolidated, non-redundant TF motifs when applied with the motif discovery method TF-MoDISco to gkm-SVM models trained on in vivo transcription factor binding data. Mutation impact scores from GkmExplain outperform those from deltaSVM and ISM at identifying regulatory genetic variants in gkm-SVM models of chromatin accessibility.
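For readers unfamiliar with in-silico mutagenesis (ISM), one of the baselines named above, the following is a minimal sketch of the general idea and is not GkmExplain's implementation. The function names `ism_scores` and `toy_score` are hypothetical, and `toy_score` is an assumed stand-in for a trained model's decision function (it simply counts occurrences of the string "GATA").

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def ism_scores(seq: str, score_fn: Callable[[str], float]) -> List[Dict[str, float]]:
    """For each position, return the score change caused by each base substitution."""
    ref_score = score_fn(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        per_base = {}
        for alt in BASES:
            if alt == ref_base:
                # The reference base leaves the sequence unchanged.
                per_base[alt] = 0.0
            else:
                mutant = seq[:i] + alt + seq[i + 1:]
                per_base[alt] = score_fn(mutant) - ref_score
        effects.append(per_base)
    return effects


def toy_score(seq: str) -> float:
    """Hypothetical scoring function: number of (possibly overlapping) 'GATA' matches."""
    return float(sum(seq[i:i + 4] == "GATA" for i in range(len(seq) - 3)))


effects = ism_scores("CCGATACC", toy_score)
for pos, per_base in enumerate(effects):
    print(pos, per_base)
```

In this sketch the scoring function is called once per substitution, so a sequence of length L needs 3L evaluations in addition to the reference score. That per-mutation cost is one reason ISM becomes expensive on long sequences or large datasets.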

Topic

Sequencing;Transcription factors and regulatory sites;Machine learning

Detail

  • Operation: Sequence motif discovery;Enrichment analysis;k-mer counting

  • Software interface: Command-line interface

  • Language: Python

  • License: Not stated

  • Cost: Free of charge

  • Version name: -

  • Credit: HHMI International Student Research, Bio-X Bowes, National Institutes of Health.

  • Input: -

  • Output: -

  • Contact: Avanti Shrikumar avanti@stanford.edu, Anshul Kundaje akundaje@stanford.edu

  • Collection: -

  • Maturity: -

Publications

  • GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs.
  • Shrikumar A, et al. GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs. Bioinformatics. 2019; 35:i173-i182. doi: 10.1093/bioinformatics/btz322
  • https://doi.org/10.1093/BIOINFORMATICS/BTZ322
  • PMID: 31510661
  • PMC: PMC6612808

Download and documentation

