SMFM

SMFM is a stacked multivariate fusion framework for the genome-wide identification and characterization of DNA enhancers. It fuses multi-source biological information with dynamic semantic information, learns sequence features with deep learning networks, and classifies regulatory enhancers with an ensemble classifier.


Key Features:

  • Stacked multivariate fusion: Integrates heterogeneous biological information and dynamic semantic data into comprehensive sequence representations.
  • Multi-source biological integration: Combines diverse biological inputs to enrich enhancer sequence feature sets.
  • Dynamic semantic information: Incorporates semantic representations of gene-related information to capture implicit relationships.
  • Deep learning-based sequence networks: Learns intricate feature representations from enhancer sequences using deep neural network models.
  • Ensemble machine learning classifier: Uses refined features and implicit relations derived from deep models to classify enhancers.
  • Motif analysis via contribution scores: Evaluates per-base contribution scores within enhancer sequences to support motif and functional analysis.
  • Interpretability with EnhancerBERT: Applies EnhancerBERT, a fine-tuned BERT model, to elucidate gene semantic information associated with enhancers.
  • Benchmarking and independent validation: Outperforms existing methods in benchmark comparisons, and its generalization is validated on an independent test set.
  • Placenta application: Applied to 4,562 active distal gene regulatory enhancers from human placenta to reveal tissue-specific developmental and differential mechanisms.
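
The fusion idea can be illustrated by concatenating several views of one sequence into a single feature vector. This is a minimal sketch that assumes simple k-mer composition and GC content as the "sources"; these encodings and the names kmer_frequencies, gc_content and fuse_features are hypothetical and are not SMFM's actual feature set, which also includes dynamic semantic information and features learned by deep networks.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized counts of all 4**k DNA k-mers, in lexicographic ACGT order (hypothetical encoder)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other non-ACGT symbols
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (hypothetical encoder)."""
    valid = [b for b in seq if b in "ACGT"]
    return sum(b in "GC" for b in valid) / len(valid) if valid else 0.0


def fuse_features(seq: str, ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Stack several feature blocks into one vector (assumed fusion-by-concatenation)."""
    seq = seq.upper()
    features: list[float] = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))  # 4 + 16 + 64 values for k = 1, 2, 3
    features.append(gc_content(seq))  # one extra scalar block
    return features
```

With the default k values the fused vector has 4 + 16 + 64 + 1 = 85 entries. A fuller pipeline would append further blocks (for example, learned embeddings) in the same way before passing the vector to a classifier.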

Scientific Applications:

  • Genome-wide enhancer identification: Detection and characterization of DNA enhancers across the genome in human cell lines.
  • Functional motif discovery: Identification of functionally relevant bases within enhancers through contribution score analysis.
  • Gene semantic interpretation: Linking enhancers to gene-related semantic information using a fine-tuned BERT model.
  • Tissue-specific regulatory analysis: Investigation of tissue-specific developmental processes and differential mechanisms, exemplified in human placenta data.
  • Method comparison and validation: Comparative benchmarking and validation of enhancer prediction performance against state-of-the-art methods.
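
Once per-base contribution scores are available for an enhancer sequence, candidate motif regions can be located by scanning for high-scoring windows. The sketch below assumes the scores are already computed; the function top_windows, the fixed window width, and the greedy non-overlapping selection rule are illustrative assumptions, not SMFM's motif-analysis procedure.

```python
def top_windows(scores: list[float], width: int, n: int = 3) -> list[tuple[int, float]]:
    """Return up to n non-overlapping windows as (start, summed score), highest first.

    Hypothetical helper: windows are chosen greedily by total contribution score.
    """
    if width <= 0 or width > len(scores):
        return []
    # Sliding-window sums in a single pass.
    window = sum(scores[:width])
    sums = [window]
    for i in range(width, len(scores)):
        window += scores[i] - scores[i - width]
        sums.append(window)
    # Greedy pick: best window first, then skip any window overlapping a chosen one.
    order = sorted(range(len(sums)), key=lambda s: sums[s], reverse=True)
    chosen: list[tuple[int, float]] = []
    for start in order:
        if all(abs(start - c) >= width for c, _ in chosen):
            chosen.append((start, sums[start]))
            if len(chosen) == n:
                break
    return chosen
```

Each returned start maps back to a candidate motif as seq[start:start + width], which could then be compared against known transcription factor motifs.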

Methodology:

SMFM uses a stacked multivariate fusion approach to integrate multi-source biological and dynamic semantic data, deep learning-based sequence networks to learn feature representations, and an ensemble machine learning classifier built on the refined features and implicit relations. Motifs are analyzed through per-base contribution scores, and interpretability analyses use EnhancerBERT, a fine-tuned BERT model. Performance was benchmarked against existing methods and validated on an independent test set.
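
The final classification and evaluation steps can be sketched with a soft-voting ensemble and standard binary metrics. This is a minimal illustration under stated assumptions: soft voting is a simple stand-in for SMFM's ensemble classifier, the base scorers are toy functions rather than trained models, and Sn, Sp, Acc and MCC are metrics commonly reported in enhancer-prediction benchmarks rather than a confirmed reproduction of the paper's evaluation.

```python
import math
from typing import Callable, Optional, Sequence

# A base scorer maps a feature vector to an estimated probability of "enhancer".
Scorer = Callable[[Sequence[float]], float]


def soft_vote(scorers: Sequence[Scorer], x: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of the base scorers' probabilities (soft voting)."""
    if weights is None:
        weights = [1.0] * len(scorers)
    return sum(w * s(x) for w, s in zip(weights, scorers)) / sum(weights)


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Sensitivity, specificity, accuracy and Matthews correlation (1 = enhancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "Acc": (tp + tn) / len(pairs) if pairs else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy base scorers standing in for trained models (purely illustrative).
    scorers = [lambda x: x[0], lambda x: x[1]]
    test_X = [[0.9, 0.7], [0.4, 0.8], [0.2, 0.1], [0.6, 0.2]]
    test_y = [1, 1, 0, 0]
    preds = [int(soft_vote(scorers, x) >= 0.5) for x in test_X]
    print(evaluate(test_y, preds))
```

Soft voting is shown because it needs no extra training; a stacked meta-learner trained on out-of-fold base predictions is a common, more flexible alternative.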

Details

License:
Not licensed
Cost:
Free of charge
Tool Type:
command-line tool
Operating Systems:
Mac, Linux, Windows
Programming Languages:
Python
Added:
2/13/2023
Last Updated:
11/24/2024

Publications

Wang Y, Hou Z, Yang Y, Wong K, Li X. Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework. PLOS Computational Biology. 2022;18(12):e1010779. doi:10.1371/journal.pcbi.1010779. PMID:36520922. PMCID:PMC9836277.

Funding: National Natural Science Foundation of China (grant no. 62076109)