CONTRAfold
CONTRAfold predicts RNA secondary structure with conditional log-linear models. It relies on discriminative training and feature-rich probabilistic scoring rather than on thermodynamic parameter estimation.
Key Features:
- Conditional log-linear models: Uses conditional log-linear models as the core probabilistic framework for RNA secondary structure prediction.
- Extension of SCFGs: Extends stochastic context-free grammars (SCFGs) to incorporate richer features and discriminative modeling.
- Discriminative training: Employs discriminative training methods for parameter estimation rather than solely generative or thermodynamic approaches.
- Feature-rich scoring: Implements feature-rich scoring mechanisms to capture diverse sequence and structural signals.
- Statistical learning for parameters: Leverages statistical learning procedures to estimate parameters in place of empirical thermodynamic measurements.
- Alternative to thermodynamic models: Provides a probabilistic alternative to traditional thermodynamic parameter estimation for RNA folding.
- Integration with RAF: Can be used within the RAF (RNA Alignment and Folding) framework for simultaneous alignment and consensus folding of unaligned RNA sequences.
- Compatibility with CONTRAlign: Works alongside CONTRAlign to identify potential pairing and alignment candidates used in sparse inference.
- Sparsity exploitation and quadratic runtime: RAF exploits sparsity in the set of pairing and alignment candidates to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding.
- Fast sparse dynamic programming: RAF uses fast sparse dynamic programming as its core inference engine.
- Discriminative ML parameter estimation: The sparse dynamic programming serves as the inference engine inside a discriminative machine learning algorithm for parameter estimation.
- Benchmark performance and efficiency: In cross-validated benchmarks, RAF achieved accuracies matching or exceeding leading methods while reducing computational time by nearly an order of magnitude for simultaneous folding and alignment.
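The conditional log-linear scoring and discriminative training listed above can be illustrated with a minimal Python sketch. It is not CONTRAfold's implementation: the feature set, the weights, and the function names (`features`, `conditional_probs`, `log_likelihood_gradient`) are hypothetical, and the candidate structures are enumerated by hand, whereas a real model would sum over all structures with dynamic programming.

```python
import math

def features(seq, pairs):
    """Count toy features of one candidate structure (hypothetical feature set)."""
    counts = {}
    for i, j in pairs:
        # One feature per unordered base-pair type, e.g. "pair_CG".
        key = "pair_" + "".join(sorted(seq[i] + seq[j]))
        counts[key] = counts.get(key, 0) + 1
    counts["unpaired"] = len(seq) - 2 * len(pairs)
    return counts

def score(weights, feats):
    """Linear score w . F(x, y); features without a weight contribute 0."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def conditional_probs(seq, candidates, weights):
    """P(y | x) proportional to exp(w . F(x, y)), normalized over the candidates."""
    scores = [score(weights, features(seq, c)) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_gradient(seq, candidates, weights, true_index):
    """Gradient of log P(y_true | x): observed minus expected feature counts."""
    probs = conditional_probs(seq, candidates, weights)
    grad = dict(features(seq, candidates[true_index]))
    for p, cand in zip(probs, candidates):
        for k, v in features(seq, cand).items():
            grad[k] = grad.get(k, 0.0) - p * v
    return grad

# Toy input: three hand-picked candidate structures for a 6-nucleotide sequence.
seq = "GGAACC"
candidates = [[], [(0, 5)], [(0, 5), (1, 4)]]
weights = {"pair_CG": 1.0, "unpaired": -0.1}  # assumed values, not trained

probs = conditional_probs(seq, candidates, weights)
grad = log_likelihood_gradient(seq, candidates, {}, true_index=2)
```

Training would repeatedly move `weights` along such gradients over a set of sequences with known structures, which is what distinguishes learned parameters from experimentally measured thermodynamic ones.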
Scientific Applications:
- Simultaneous alignment and consensus folding (RAF): Enables joint alignment and consensus secondary-structure prediction of unaligned RNA sequences within the RAF framework.
- Microarray probe selection: Supports selection of RNA microarray probes through structure-aware analysis.
- De novo non-coding RNA gene prediction: Facilitates de novo prediction of non-coding RNA genes by incorporating structural prediction into discovery workflows.
- RNA multiple-sequence secondary structure prediction: Applied to multiple-sequence secondary structure prediction with competitive accuracy in benchmarks.
- High-throughput RNA structural studies: Offers an efficient approach suitable for high-throughput analyses due to reduced computational time.
Methodology:
CONTRAfold uses conditional log-linear models (extensions of SCFGs) with discriminative training and feature-rich scoring. Within RAF, the sparse sets of pairing and alignment candidates identified by CONTRAfold and CONTRAlign feed a fast sparse dynamic programming routine, which serves as the inference engine of a discriminative machine learning algorithm for parameter estimation.
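How a sparse candidate set prunes dynamic programming can be sketched on a simplified problem. The Python example below is an assumption-laden illustration, not RAF's algorithm: it folds a single sequence with a Nussinov-style recurrence instead of aligning and folding jointly, and the pairing probabilities, the threshold, and the function names `candidate_pairs` and `best_structure_score` are hypothetical.

```python
def candidate_pairs(pair_probs, threshold):
    """Keep only pairs whose (assumed precomputed) probability reaches the threshold."""
    return {pair: p for pair, p in pair_probs.items() if p >= threshold}

def best_structure_score(n, candidates):
    """Maximum total weight of nested pairs drawn only from `candidates`."""
    # Index candidates by their 3' end so each cell scans only its own short list.
    by_end = {}
    for (i, j), w in candidates.items():
        by_end.setdefault(j, []).append((i, w))

    # best[i][j] is the best score for the half-open interval [i, j).
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Case 1: position j-1 is left unpaired.
            value = best[i][j - 1]
            # Case 2: position j-1 pairs with a candidate partner k >= i.
            for k, w in by_end.get(j - 1, []):
                if k >= i:
                    value = max(value, best[i][k] + w + best[k + 1][j - 1])
            best[i][j] = value
    return best[0][n]

# Toy input: assumed pairing probabilities for a 6-nucleotide sequence.
pair_probs = {(0, 5): 0.9, (1, 4): 0.8, (0, 3): 0.6, (2, 5): 0.05}
sparse = candidate_pairs(pair_probs, threshold=0.1)
total = best_structure_score(6, sparse)
```

With a dense recurrence every cell would try every possible partner. Here the inner loop visits only the few candidates that survive the threshold, which is the kind of saving that sparsity makes possible.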
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: C++
- Added: 12/18/2017
- Last Updated: 11/25/2024
Publications
Do CB, Foo C, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics. 2008;24(13):i68-i76. doi:10.1093/bioinformatics/btn177. PMID:18586747. PMCID:PMC2718655.