WebAUGUSTUS
WebAUGUSTUS predicts protein-coding genes and alternative splice variants in eukaryotic genomic sequences and supports parameter training and constraint-guided gene model refinement for genome annotation.
Key Features:
- Ab initio gene prediction: Uses a generalized hidden Markov model (GHMM) to predict protein-coding genes de novo from eukaryotic genomic sequences.
- Intron and splice-site modeling: Includes an intron length distribution model and a donor splice site model to improve accuracy on longer genomic regions and multi-gene sequences.
- User-defined constraints: Accepts positional constraints on splice sites, translation initiation sites, stop codons, known exons, and exon/intron intervals, so that partial gene-structure evidence such as expressed sequence tag (EST) or protein alignments can be incorporated into predictions.
- Multiple transcript prediction: Predicts multiple splice variants per gene to represent alternative splicing.
- Motif searching: Supports searching user-defined regular expressions against putative proteins encoded by predicted genes.
- Training capabilities: Trains AUGUSTUS parameters from genomic sequences together with expressed sequence tags or protein sequences to produce species-optimized parameter sets.
- GC-content dependent parameter estimation: Estimates parameters as a function of GC content so that predictions adapt to regional compositional variation.
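The motif-searching feature listed above can be illustrated with a minimal Python sketch. It is not the WebAUGUSTUS implementation: the `find_motif` helper, the protein identifiers, and the sequences are hypothetical, and the pattern is the PROSITE-style N-glycosylation motif N-{P}-[ST]-{P} written as a regular expression.

```python
import re

def find_motif(proteins, pattern):
    """Return {protein_id: [(start, end, match), ...]} with 1-based inclusive positions."""
    regex = re.compile(pattern)
    hits = {}
    for protein_id, sequence in proteins.items():
        found = [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]
        if found:
            hits[protein_id] = found
    return hits

# Hypothetical proteins translated from predicted gene models.
proteins = {
    "g1.t1": "MKTAYIAKQRNGSQWE",
    "g2.t1": "MSLLPAVKNPTAG",
}

# N, then any residue except P, then S or T, then any residue except P.
hits = find_motif(proteins, r"N[^P][ST][^P]")
print(hits)
```

Only `g1.t1` is reported here, because the single asparagine in `g2.t1` is followed by a proline.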
Scientific Applications:
- Eukaryotic genome annotation: Annotates newly sequenced and assembled eukaryotic genomes by predicting protein-coding genes.
- Alternative splicing analysis: Characterizes alternative splice variants and transcript diversity within gene loci.
- Evidence-guided gene model refinement: Refines gene models using ESTs, protein alignments, and user-specified positional constraints.
- Species-specific parameter optimization: Produces and applies trained AUGUSTUS parameter sets for improved predictions on specific genomes.
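Positional constraints of the kind used for evidence-guided refinement are typically exchanged as tab-separated GFF lines. The sketch below shows one way to build such lines in Python. The `Constraint` class is hypothetical, and the feature name `exonpart` and the attribute `source=M` are assumptions about the AUGUSTUS hints convention that should be checked against the AUGUSTUS documentation before use.

```python
from dataclasses import dataclass

@dataclass
class Constraint:
    """One positional constraint on a gene structure (hypothetical helper)."""
    seqname: str
    feature: str          # assumed hint type, e.g. "exonpart" or "intron"
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    strand: str = "."

    def to_gff(self, source="manual", attribute="source=M"):
        """Format the constraint as a 9-column, tab-separated GFF line."""
        if self.start < 1 or self.end < self.start:
            raise ValueError("coordinates must satisfy 1 <= start <= end")
        fields = [
            self.seqname, source, self.feature,
            str(self.start), str(self.end),
            ".",            # score: not used in this sketch
            self.strand,
            ".",            # frame: unknown
            attribute,      # assumed attribute marking a manual constraint
        ]
        return "\t".join(fields)

line = Constraint("chr1", "exonpart", 1200, 1350, "+").to_gff()
print(line)
```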
Methodology:
WebAUGUSTUS applies a generalized hidden Markov model (GHMM) that integrates submodels for intron length distribution and donor splice sites. Parameters are estimated in a GC-content dependent manner and can be trained from genomic sequences together with expressed sequence tags or protein sequences.
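As a rough illustration of GC-content dependent parameter selection, the Python sketch below computes the GC fraction of a sequence and maps it to a composition class. The class boundaries (0.40, 0.50, 0.60) and the function names are hypothetical and are not the values or interfaces used by AUGUSTUS; the sketch only shows the general idea of choosing a parameter set by regional composition.

```python
def gc_content(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    unambiguous = sum(sequence.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / unambiguous

def gc_class(gc, boundaries=(0.40, 0.50, 0.60)):
    """Index of the composition class; boundaries are illustrative assumptions."""
    for index, upper_bound in enumerate(boundaries):
        if gc < upper_bound:
            return index
    return len(boundaries)

region = "ATGCGCNN"  # hypothetical genomic region; N bases are ignored
gc = gc_content(region)
print(round(gc, 3), gc_class(gc))
```

In a real pipeline each class index would select a separately trained parameter set for the region being scored.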
Details
- Maturity: Mature
- Tool Type: Web application
- Operating Systems: Linux, Windows, Mac
- Programming Languages: C++, Perl
- Added: 3/25/2017
- Last Updated: 11/25/2024
Operations
- Gene prediction
- Ab-initio gene prediction
- Homology-based gene prediction
Publications
Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(suppl_2):ii215-ii225. doi:10.1093/bioinformatics/btg1080. PMID:14534192.
Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research. 2004;32(Web Server):W309-W312. doi:10.1093/nar/gkh379. PMID:15215400. PMCID:PMC441517.
Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research. 2005;33(Web Server):W465-W467. doi:10.1093/nar/gki458. PMID:15980513. PMCID:PMC1160219.
Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research. 2006;34(Web Server):W435-W439. doi:10.1093/nar/gkl200. PMID:16845043. PMCID:PMC1538822.
Hoff KJ, Stanke M. WebAUGUSTUS--a web service for training AUGUSTUS and predicting genes in eukaryotes. Nucleic Acids Research. 2013;41(W1):W123-W128. doi:10.1093/nar/gkt418. PMID:23700307. PMCID:PMC3692069.