Motto

Motto converts position weight matrices (PWMs) into compact wildcard-style consensus sequences that minimize information loss for motif representation and interpretation.


Key Features:

  • Mathematical framework: Uses mutual information and Jensen-Shannon divergence to formalize the conversion of PWMs to consensus sequences.
  • Sequence Motto representation: Produces a wildcard-style consensus sequence, termed a "sequence Motto", for each motif.
  • Information-preserving conversion: Minimizes information loss when transforming PWMs into compact consensus sequences for interpretation and motif searching.
  • Alphabet support: Handles motifs over nucleotide alphabets, amino acid alphabets, and user-defined custom characters.
  • Binding site identification: Demonstrated effectiveness in identifying transcription factor binding sites across the human genome.
  • Benchmarking: Evaluated against PWM scanning by FIMO using area under the precision-recall curve (AUPRC) and statistical testing.
  • Comparative performance: Achieved a mean AUPRC of 0.81 and significantly outperformed maximal positional weight, Cavener's method, and minimal mean square error (p < 0.01).
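
To make the AUPRC metric in the benchmarking bullets concrete, here is a minimal sketch that computes average precision, a common step-wise estimate of the area under the precision-recall curve. It is an illustration only: the function name `average_precision`, the input layout, and the choice of average precision as the estimator are assumptions, not the evaluation code used in the Motto paper.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC estimate (average precision); hypothetical helper.

    labels: 1 for a true binding site, 0 otherwise.
    scores: prediction scores, higher meaning more confident.
    Ties between scores are broken by input order, which is a simplification.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at each recovered positive
    return ap / total_pos


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A ranking that places every true site ahead of every false one scores 1.0; interleaving false positives lowers the value.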

Scientific Applications:

  • Transcription factor binding site identification: Identifying TF binding sites in genomic sequences, demonstrated for 1,156 human transcription factors.
  • Motif interpretation and searching: Generating concise consensus sequences for interpreting motif information and for searching motif matches.
  • Method benchmarking and comparison: Evaluating and comparing motif representation methods using AUPRC and statistical significance tests against FIMO and alternative methods.
  • Sequence analysis in genomics: Producing compact motif summaries for downstream sequence analysis tasks in genomics and bioinformatics.
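
As an illustration of searching motif matches with a consensus sequence, the sketch below turns a wildcard consensus into a regular expression and scans a DNA string. It assumes the consensus is written in IUPAC nucleotide ambiguity codes and searches the forward strand only; the helper names `consensus_to_regex` and `find_matches` are hypothetical and are not part of Motto's interface.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex (hypothetical helper)."""
    body = "".join(IUPAC_CLASSES[ch] for ch in consensus.upper())
    # The lookahead lets overlapping matches be reported.
    return re.compile(f"(?=({body}))")


def find_matches(consensus, sequence):
    """Return (start, matched_text) pairs on the forward strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_matches("TGAYR", "ATGACGTTGATA"))
```

In this example the consensus `TGAYR` matches `TGACG` at offset 1 and `TGATA` at offset 7. Scanning the reverse strand would require searching the reverse complement as well, which is omitted here.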

Methodology:

Motto converts PWMs to consensus sequences using a framework based on mutual information theory and Jensen-Shannon divergence. It was benchmarked on 1,156 human TFs using AUPRC against PWM scanning by FIMO, and compared with maximal positional weight, Cavener's method, and minimal mean square error.
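
The idea of choosing a wildcard by Jensen-Shannon divergence can be sketched as follows. This is a simplified illustration under stated assumptions, not Motto's actual algorithm: each PWM column is treated as a probability distribution over A, C, G, T, every candidate wildcard is modeled as a uniform distribution over the bases it covers, and the candidate with the smallest divergence from the column is selected. The function names and the use of IUPAC codes for output are hypothetical.

```python
import math
from itertools import combinations

ALPHABET = "ACGT"

# IUPAC nucleotide codes keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def best_code(column):
    """Pick the wildcard whose uniform distribution is closest to the column."""
    best = None
    for k in range(1, len(ALPHABET) + 1):
        for subset in combinations(ALPHABET, k):
            q = [1 / k if base in subset else 0.0 for base in ALPHABET]
            score = jsd(column, q)
            if best is None or score < best[0]:
                best = (score, IUPAC[frozenset(subset)])
    return best[1]


def consensus(pwm):
    """pwm: list of columns, each a list of probabilities in A, C, G, T order."""
    return "".join(best_code(col) for col in pwm)


if __name__ == "__main__":
    pwm = [
        [0.97, 0.01, 0.01, 0.01],  # strongly A
        [0.48, 0.02, 0.48, 0.02],  # A or G
        [0.25, 0.25, 0.25, 0.25],  # uninformative
    ]
    print(consensus(pwm))
```

For the three example columns this yields `ARN`. The exhaustive search over subsets is practical for a four-letter alphabet; a larger alphabet such as amino acids would need a different search strategy.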

Details

Tool Type: command-line tool
Added: 1/18/2021
Last Updated: 3/1/2021

Publications

Wang M, Wang D, Zhang K, Ngo V, Fan S, Wang W. Motto: Representing Motifs in Consensus Sequences with Minimum Information Loss. Genetics. 2020;216(2):353-358. doi:10.1534/genetics.120.303597. PMID:32816922. PMCID:PMC7536857.
