SMFM
SMFM is a stacked multivariate fusion framework for identifying and characterizing DNA enhancers genome-wide. It fuses multi-source biological and dynamic semantic data into sequence representations, learns features with deep learning sequence networks, and classifies candidate regulatory enhancers with an ensemble classifier.
Key Features:
- Stacked multivariate fusion: Integrates heterogeneous biological information and dynamic semantic data into comprehensive sequence representations.
- Multi-source biological integration: Combines diverse biological inputs to enrich enhancer sequence feature sets.
- Dynamic semantic information: Incorporates semantic representations of gene-related information to capture implicit relationships.
- Deep learning-based sequence networks: Learns intricate feature representations from enhancer sequences using deep neural network models.
- Ensemble machine learning classifier: Uses refined features and implicit relations derived from deep models to classify enhancers.
- Motif analysis via contribution scores: Evaluates per-base contribution scores within enhancer sequences to support motif and functional analysis.
- Interpretability with EnhancerBERT: Applies EnhancerBERT, a fine-tuned BERT model, to elucidate gene semantic information associated with enhancers.
- Benchmarking and independent validation: Reported to outperform existing methods in benchmarks, with generalization validated on an independent test set.
- Placenta application: Applied to a set of 4,562 active distal gene regulatory enhancers from human placenta to investigate tissue-specific developmental and differential mechanisms.
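As an illustration of the feature-fusion idea listed above, the following minimal sketch concatenates two sequence-derived views (dinucleotide frequencies and GC content) with an optional external feature into one vector. The function names, the choice of features, and the `external_features` argument are assumptions made for this example; this entry does not specify which features SMFM actually fuses.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def fuse_features(seq, external_features=()):
    """Concatenate sequence-derived views with optional external features.

    Hypothetical helper: a stand-in for fusing heterogeneous inputs.
    """
    return kmer_frequencies(seq, k=2) + [gc_content(seq)] + list(external_features)


vector = fuse_features("ACGTACGTGGCC", external_features=[0.8])
print(len(vector))  # 18 = 16 dinucleotide frequencies + GC content + 1 external feature
```

Keeping each view in a fixed position of the fused vector lets a downstream model learn which source is informative for a given sequence.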
Scientific Applications:
- Genome-wide enhancer identification: Detection and characterization of DNA enhancers across human cell lines and the genome.
- Functional motif discovery: Identification of functionally relevant bases within enhancers through contribution score analysis.
- Gene semantic interpretation: Linking enhancers to gene-related semantic information using a fine-tuned BERT model.
- Tissue-specific regulatory analysis: Investigation of tissue-specific developmental processes and differential mechanisms, exemplified in human placenta data.
- Method comparison and validation: Comparative benchmarking and validation of enhancer prediction performance against state-of-the-art methods.
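The per-base contribution scores mentioned above can be illustrated with in silico mutagenesis, a generic attribution technique: each base is replaced by the three alternatives, and the average drop in the model score is recorded. This is a sketch under stated assumptions, not SMFM's own attribution procedure, which this entry does not describe. `toy_score` and its motif are hypothetical stand-ins for a trained enhancer model.

```python
def toy_score(seq, motif="TGACTCA"):
    """Hypothetical stand-in for a trained model: counts motif occurrences."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def contribution_scores(seq, score_fn):
    """Average score drop when each base is swapped for every other base."""
    base_score = score_fn(seq)
    scores = []
    for i, ref in enumerate(seq):
        mutant_scores = [
            score_fn(seq[:i] + alt + seq[i + 1:])
            for alt in "ACGT"
            if alt != ref
        ]
        scores.append(base_score - sum(mutant_scores) / len(mutant_scores))
    return scores


seq = "AAAATGACTCAAAAA"
scores = contribution_scores(seq, toy_score)
for base, score in zip(seq, scores):
    print(base, score)
```

In this toy case only the seven motif positions receive a non-zero score, which is the kind of signal a motif analysis would look for in contribution profiles.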
Methodology:
The method uses a stacked multivariate fusion approach to integrate multi-source biological and dynamic semantic data. Deep learning-based sequence networks learn feature representations, and an ensemble machine learning classifier predicts enhancers from the refined features and implicit relations. Motif analysis relies on per-base contribution scores, and interpretability analyses use EnhancerBERT, a fine-tuned BERT model. The method was benchmarked against existing methods and validated on an independent test set.
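To make the stacking idea concrete, the sketch below shows hold-out stacking (sometimes called blending) in plain Python: two base learners are fitted on separate feature views, and a logistic-regression meta-learner is fitted on their predictions for held-out samples. The base learners, the toy data, and all names are assumptions for illustration; SMFM's actual networks and ensemble classifier are not specified in this entry.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_centroid_scorer(rows, labels):
    """Base learner: pseudo-probability from distances to the class centroids."""
    def centroid(cls):
        pts = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(col) / len(pts) for col in zip(*pts)]
    pos, neg = centroid(1), centroid(0)
    return lambda row: sigmoid(math.dist(row, neg) - math.dist(row, pos))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data (hypothetical): view A mimics sequence features, view B an external signal.
train_a = [[0.9, 1.0], [1.0, 0.8], [0.1, 0.0], [0.0, 0.2]]
train_b = [[0.9], [0.8], [0.1], [0.2]]
train_y = [1, 1, 0, 0]
hold_a = [[0.8, 0.9], [0.2, 0.1], [1.0, 1.0], [0.0, 0.0]]
hold_b = [[1.0], [0.0], [0.7], [0.3]]
hold_y = [1, 0, 1, 0]

# Level 0: one base learner per feature view, fitted on the training split.
base_a = fit_centroid_scorer(train_a, train_y)
base_b = fit_centroid_scorer(train_b, train_y)

# Level 1: the meta-learner sees only base predictions on held-out samples.
meta_inputs = [[base_a(a), base_b(b)] for a, b in zip(hold_a, hold_b)]
meta = fit_logistic(meta_inputs, hold_y)


def predict(view_a, view_b):
    """Stacked prediction: base scores are fused by the meta-learner."""
    return meta([base_a(view_a), base_b(view_b)])


p_pos = predict([0.9, 0.9], [0.9])
p_neg = predict([0.1, 0.1], [0.1])
print(p_pos > p_neg)
```

Training the meta-learner on held-out predictions rather than on the base learners' own training data reduces the risk that it simply trusts an overfitted base model.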
Details
- License: Not licensed
- Cost: Free of charge
- Tool Type: command-line tool
- Operating Systems: Mac, Linux, Windows
- Programming Languages: Python
- Added: 2/13/2023
- Last Updated: 11/24/2024
Publications
Wang Y, Hou Z, Yang Y, Wong K, Li X. Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework. PLOS Computational Biology. 2022;18(12):e1010779. doi:10.1371/journal.pcbi.1010779. PMID:36520922. PMCID:PMC9836277.