CONTRAfold

CONTRAfold predicts RNA secondary structure with conditional log-linear models. Instead of relying on thermodynamic parameters estimated from experimental measurements, it scores structures with a rich set of features whose weights are learned by discriminative training.
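For reference, the generic form of a conditional log-linear model and a regularized conditional-likelihood training objective are written out below. F(x, y) is a vector of feature counts for structure y of sequence x, w is the learned weight vector, Omega(x) is the set of valid structures of x, and C is a regularization constant. The L2 penalty is one common choice and is an assumption here, not a statement of CONTRAfold's exact regularizer.

```latex
% Conditional log-linear model over structures y of sequence x
P(y \mid x; \mathbf{w}) = \frac{\exp\left(\mathbf{w}^{\top}\mathbf{F}(x, y)\right)}{Z(x)},
\qquad
Z(x) = \sum_{y' \in \Omega(x)} \exp\left(\mathbf{w}^{\top}\mathbf{F}(x, y')\right)

% Discriminative training: regularized conditional log-likelihood over m training pairs
\ell(\mathbf{w}) = \sum_{i=1}^{m} \log P\big(y^{(i)} \mid x^{(i)}; \mathbf{w}\big) - C\,\lVert \mathbf{w} \rVert_2^{2}

% Its gradient: observed feature counts minus expected feature counts
\nabla \ell(\mathbf{w}) = \sum_{i=1}^{m} \Big( \mathbf{F}(x^{(i)}, y^{(i)})
  - \mathbb{E}_{y' \sim P(\cdot \mid x^{(i)}; \mathbf{w})}\big[\mathbf{F}(x^{(i)}, y')\big] \Big) - 2C\,\mathbf{w}
```

Informally, an SCFG is the special case in which the features count grammar productions and the weights are log-probabilities constrained to normalize; dropping that constraint is what lets a log-linear model use richer, overlapping features.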


Key Features:

  • Conditional log-linear models: Defines the conditional probability of a secondary structure given the sequence, which serves as the core probabilistic framework for prediction.
  • Extension of SCFGs: Generalizes stochastic context-free grammars (SCFGs) to allow richer features and discriminative rather than purely generative modeling.
  • Discriminative training: Estimates parameters by directly optimizing the conditional likelihood of the correct structure given the sequence, rather than by generative or thermodynamic approaches alone.
  • Feature-rich scoring: Scores structures with many features that capture diverse sequence and structural signals.
  • Statistical learning for parameters: Learns parameters from known structures with statistical learning procedures instead of taking them from empirical thermodynamic measurements.
  • Alternative to thermodynamic models: Offers a probabilistic alternative to traditional thermodynamic parameter estimation for RNA folding.
  • Integration with RAF: Used by RAF (RNA Alignment and Folding), an algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences.
  • Compatibility with CONTRAlign: Works alongside CONTRAlign; together they identify the likely pairing and alignment candidates for each nucleotide that RAF uses for sparse inference.
  • Sparsity exploitation and quadratic runtime: By restricting each nucleotide to a small set of likely pairing and alignment candidates, RAF achieves an effectively quadratic running time for simultaneous pairwise alignment and folding.
  • Fast sparse dynamic programming: RAF's sparse dynamic programming serves as its core inference engine.
  • Discriminative parameter estimation: The same sparse dynamic programming is embedded in a discriminative (max-margin) machine learning algorithm that estimates RAF's parameters.
  • Benchmark performance and efficiency: In cross-validated benchmarks, RAF matched or exceeded the accuracy of leading methods for RNA multiple-sequence secondary structure prediction while requiring nearly an order of magnitude less time than other simultaneous folding and alignment methods.
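The conditional log-linear model and its discriminative training can be illustrated with a toy sketch. It enumerates every structure of a short sequence by brute force, which is exponential and only practical for a handful of nucleotides; a real implementation computes the same sums with dynamic programming. The feature set (base-pair type counts plus unpaired bases), the allowed pairs, MIN_LOOP = 3, and all function names are hypothetical choices for illustration, not CONTRAfold's actual features or API.

```python
import math

# Hypothetical model choices for illustration only (not CONTRAfold's real features).
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin


def structures(seq, i=0, j=None):
    """Enumerate every non-crossing structure on seq[i..j] as a list of (i, k) pairs."""
    if j is None:
        j = len(seq) - 1
    if i > j:
        return [[]]
    result = list(structures(seq, i + 1, j))  # position i left unpaired
    for k in range(i + MIN_LOOP + 1, j + 1):  # position i paired with k
        if (seq[i], seq[k]) in ALLOWED:
            for inside in structures(seq, i + 1, k - 1):
                for outside in structures(seq, k + 1, j):
                    result.append([(i, k)] + inside + outside)
    return result


def features(seq, struct):
    """Hypothetical feature map F(x, y): counts of each base-pair type plus unpaired bases."""
    f = {}
    paired = set()
    for i, k in struct:
        name = "pair_" + seq[i] + seq[k]
        f[name] = f.get(name, 0) + 1
        paired.update((i, k))
    f["unpaired"] = len(seq) - len(paired)
    return f


def score(w, f):
    """Linear score w^T F(x, y); missing weights count as 0."""
    return sum(w.get(name, 0.0) * value for name, value in f.items())


def conditional_probs(seq, w):
    """P(y | x; w) for every structure y, normalized by the partition function Z(x)."""
    ys = structures(seq)
    scores = [score(w, features(seq, y)) for y in ys]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return ys, [v / z for v in weights]


def cll_gradient(seq, true_struct, w):
    """Gradient of log P(y_true | x; w): observed features minus expected features."""
    ys, probs = conditional_probs(seq, w)
    grad = {name: float(v) for name, v in features(seq, true_struct).items()}
    for y, p in zip(ys, probs):
        for name, value in features(seq, y).items():
            grad[name] = grad.get(name, 0.0) - p * value
    return grad


if __name__ == "__main__":
    seq, true_struct = "GGGAAAUCC", [(0, 8), (1, 7), (2, 6)]
    w = {}
    for _ in range(50):  # plain gradient ascent on the conditional log-likelihood
        for name, value in cll_gradient(seq, true_struct, w).items():
            w[name] = w.get(name, 0.0) + 0.1 * value
    ys, probs = conditional_probs(seq, w)
    print("P(true structure) =", probs[ys.index(true_struct)])
```

The gradient has the "observed minus expected" form of the conditional-likelihood objective, so each ascent step shifts weight toward features that appear in the known structure more often than the model currently predicts. No regularization is applied in this sketch.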

Scientific Applications:

  • Simultaneous alignment and consensus folding (RAF): Enables joint alignment and consensus secondary-structure prediction of unaligned RNA sequences within the RAF framework.
  • Microarray probe selection: Supports selection of RNA microarray probes through structure-aware analysis.
  • De novo non-coding RNA gene prediction: Facilitates de novo prediction of non-coding RNA genes by incorporating structural prediction into discovery workflows.
  • RNA multiple-sequence secondary structure prediction: Applied to multiple-sequence secondary structure prediction with competitive accuracy in benchmarks.
  • High-throughput RNA structural studies: RAF's reduced running time makes high-throughput structural analyses feasible.

Methodology:

CONTRAfold uses conditional log-linear models, which generalize SCFGs, with discriminative training and feature-rich scoring. RAF builds on CONTRAfold and CONTRAlign: it uses their predictions to restrict each nucleotide to a sparse set of likely pairing and alignment candidates, and it runs fast sparse dynamic programming over those candidates as the inference engine inside a discriminative machine learning algorithm for parameter estimation.
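The effect of sparsity on running time can be seen in a simplified, single-sequence setting. The sketch below is a Nussinov-style maximum-score folding recursion in which position i may pair only with the partners listed in a sparse candidate set. Because the inner loop visits at most c candidates instead of every position, the fill costs O(n^2 c) rather than O(n^3). This is an assumption-laden illustration, not RAF's algorithm: RAF's recursions run over pairs of sequences with alignment candidates as well, and the candidate lists and pair scores here (for example, values that might come from thresholding base-pair posteriors) are hypothetical.

```python
def sparse_fold(n, candidates, pair_score):
    """Maximum-score non-crossing structure on positions 0..n-1, where position i
    may only pair with the partners listed in candidates[i] (each partner k > i)."""
    best = [[0.0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]  # partner k of i in the optimum, or None

    def get(i, j):
        return best[i][j] if i <= j else 0.0  # an empty interval scores 0

    # Fill by decreasing i so best[i+1][*] and best[k+1][*] exist before best[i][*].
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            value, arg = get(i + 1, j), None  # option 1: i is unpaired
            for k in candidates.get(i, ()):  # option 2: i pairs with a sparse candidate k
                if i < k <= j:
                    s = pair_score[(i, k)] + get(i + 1, k - 1) + get(k + 1, j)
                    if s > value:
                        value, arg = s, k
            best[i][j], choice[i][j] = value, arg

    # Trace back the optimal pairs from the stored choices.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i > j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.append((i, k))
            stack.append((i + 1, k - 1))
            stack.append((k + 1, j))
    return (best[0][n - 1] if n else 0.0), sorted(pairs)


if __name__ == "__main__":
    # Hypothetical sparse candidates and scores for "GGGAAAUCC".
    candidates = {0: [8], 1: [7], 2: [6]}
    pair_score = {(0, 8): 3.0, (1, 7): 3.0, (2, 6): 1.0}
    print(sparse_fold(9, candidates, pair_score))
```

Splitting the interval at the partner k keeps every predicted structure non-crossing; any minimum hairpin length is assumed to be enforced when the candidate lists are built.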

Details

Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C++
Added: 12/18/2017
Last Updated: 11/25/2024

Publications

Do CB, Foo C, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics. 2008;24(13):i68-i76. doi:10.1093/bioinformatics/btn177. PMID:18586747. PMCID:PMC2718655.