Motto
Motto converts position weight matrices (PWMs) into compact, wildcard-style consensus sequences while minimizing information loss, so that motifs are easier to represent and interpret.
Key Features:
- Mathematical framework: Uses mutual information theory and Jensen-Shannon divergence to formalize the conversion from PWMs to consensus sequences.
- Sequence Motto representation: Represents each motif as a wildcard-style consensus sequence, termed a "sequence Motto".
- Information-preserving conversion: Minimizes information loss when transforming PWMs into compact consensus sequences for interpretation and motif searching.
- Alphabet support: Handles motifs from nucleotide alphabets, amino acid alphabets, and user-defined custom characters.
- Binding site identification: Demonstrated effectiveness in identifying transcription factor binding sites across the human genome.
- Benchmarking: Evaluated by comparing Motto sequence matches against PWM scanning results from FIMO, using area under the precision-recall curve (AUPRC) and statistical testing.
- Comparative performance: Achieved a mean AUPRC of 0.81 and significantly outperformed maximal positional weight, Cavener's method, and minimal mean square error (p < 0.01).
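To make the AUPRC metric in the benchmarking entries above concrete, here is a minimal sketch of one common estimator, step-wise average precision. It is an illustration only: the function name `average_precision` and the toy labels and scores are hypothetical, and this entry does not state which estimator the published benchmark used. The sketch assumes binary labels (1 = a match also reported by the reference scanner) and distinct scores.

```python
def average_precision(labels, scores):
    """Step-wise average precision, one common estimate of AUPRC.

    labels: iterable of 0/1 ground-truth flags (hypothetical toy input).
    scores: iterable of prediction scores, higher = more confident.
    """
    labels = list(labels)
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where recall increases.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy example: three true sites among five ranked predictions.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
print(round(average_precision(toy_labels, toy_scores), 3))
```

A perfect ranking, with every true site scored above every false one, yields 1.0 under this estimator.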
Scientific Applications:
- Transcription factor binding site identification: Identifying TF binding sites in genomic sequences, demonstrated for 1,156 human transcription factors.
- Motif interpretation and searching: Generating concise consensus sequences for interpreting motif information and for searching motif matches.
- Method benchmarking and comparison: Evaluating and comparing motif representation methods using AUPRC and statistical significance tests against FIMO and alternative methods.
- Sequence analysis in genomics: Producing compact motif summaries for downstream sequence analysis tasks in genomics and bioinformatics.
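Because a wildcard-style consensus uses bracketed character sets, it can be searched with ordinary regular expressions. The sketch below shows the idea with Python's standard `re` module. The consensus string, the DNA sequence, and the function name `find_matches` are hypothetical examples, not output from Motto, and the sketch scans the forward strand only.

```python
import re


def find_matches(consensus, sequence):
    """Return (start, matched_text) for every match, including overlapping ones.

    consensus: wildcard-style consensus such as "[AC]GAT[AT]" (hypothetical).
    sequence:  the sequence to scan, forward strand only.
    """
    # A lookahead with a capture group reports overlapping matches too.
    pattern = re.compile(f"(?=({consensus}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


hits = find_matches("[AC]GAT[AT]", "TTAGATAGGCGATTCC")
for start, text in hits:
    print(start, text)
```

Scanning the reverse strand would need the reverse complement of either the sequence or the consensus, which this sketch leaves out.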
Methodology:
PWMs are converted to consensus sequences using a framework based on mutual information theory and Jensen-Shannon divergence. The method was benchmarked on 1,156 human TFs by AUPRC against PWM scanning by FIMO, and compared with maximal positional weight, Cavener's method, and minimal mean square error.
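The role of Jensen-Shannon divergence in this kind of conversion can be sketched as follows. This is a simplified illustration, not the Motto implementation. It assumes each PWM column is a list of probabilities over the alphabet, that a candidate character set is modeled as a uniform distribution over its members, and that only the top-k most probable characters need to be considered for each k. All function names are hypothetical, and the tool's own options are not modeled.

```python
import math


def kl_divergence(p, m):
    """KL(p || m) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def column_to_symbol(column, alphabet="ACGT"):
    """Pick the character set whose uniform distribution is closest to the column."""
    ranked = sorted(zip(alphabet, column), key=lambda pair: -pair[1])
    best_chars, best_score = None, float("inf")
    for k in range(1, len(ranked) + 1):
        chosen = {char for char, _ in ranked[:k]}
        # Assumption: a set of k characters is modeled as uniform over them.
        q = [1 / k if char in chosen else 0.0 for char in alphabet]
        score = js_divergence(column, q)
        if score < best_score - 1e-12:  # prefer the smaller set on ties
            best_chars, best_score = chosen, score
    chars = "".join(char for char in alphabet if char in best_chars)
    return chars if len(chars) == 1 else f"[{chars}]"


def pwm_to_consensus(pwm, alphabet="ACGT"):
    """Convert a PWM (list of probability columns) to a wildcard-style consensus."""
    return "".join(column_to_symbol(col, alphabet) for col in pwm)


# Hypothetical three-position PWM over A, C, G, T.
example_pwm = [
    [0.50, 0.50, 0.00, 0.00],
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
]
print(pwm_to_consensus(example_pwm))
```

Passing a different `alphabet` string, with columns of matching length, applies the same sketch to amino acids or custom characters.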
Details
- Tool Type: command-line tool
- Added: 1/18/2021
- Last Updated: 3/1/2021
Publications
Wang M, Wang D, Zhang K, Ngo V, Fan S, Wang W. Motto: Representing Motifs in Consensus Sequences with Minimum Information Loss. Genetics. 2020;216(2):353-358. doi:10.1534/genetics.120.303597. PMID:32816922. PMCID:PMC7536857.
Links
- Repository: https://github.com/MichaelMW/motto