AUGUSTUS
AUGUSTUS predicts protein-coding genes in eukaryotic genomes using ab initio modeling and integration of extrinsic evidence such as RNA-Seq, expressed sequence tags (ESTs), proteomics data, and annotations from related species.
Key Features:
- Ab Initio Gene Prediction: Predicts gene structures in genomic sequences using a generalized Hidden Markov Model (GHMM) framework with submodels for intron length and donor splice sites, and with GC-content-dependent parameters.
- Extrinsic Evidence Integration: Incorporates RNA-Seq data, ESTs, proteomics data, and annotations from related genomes to improve gene prediction accuracy.
- Comparative Gene Prediction: Performs simultaneous gene prediction across multiple aligned genomes by exploiting evolutionary conservation and negative selection.
- Genome-Specific Model Training: Allows parameter training tailored to specific genomes to improve prediction accuracy.
- PPX Extension: Uses protein multiple sequence alignments to identify additional members of protein families within genomes.
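The combination of HMM-based ab initio prediction and evidence integration listed above can be illustrated with a deliberately tiny sketch: a three-state HMM (intergenic `N`, exon `E`, intron `I`) decoded with the Viterbi algorithm, where an optional hint multiplies the exon emission score at supported positions by a `bonus` factor. All probabilities, state names, and the `exon_hints`/`bonus` parameters are hypothetical. AUGUSTUS itself uses a generalized HMM with explicit length distributions, splice-site submodels, and a configurable bonus/malus scheme for hints, none of which is reproduced here.

```python
import math

STATES = ("N", "E", "I")  # toy labels: intergenic, exon, intron

# Hypothetical parameters chosen only for illustration (not AUGUSTUS values).
START = {"N": 0.8, "E": 0.1, "I": 0.1}
TRANS = {
    "N": {"N": 0.9, "E": 0.1, "I": 0.0},
    "E": {"N": 0.05, "E": 0.85, "I": 0.1},
    "I": {"N": 0.0, "E": 0.1, "I": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # exons slightly GC-rich
    "I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq, exon_hints=frozenset(), bonus=1.0):
    """Return the highest-scoring state path for seq.

    Positions listed in exon_hints (hypothetical hint input) have their
    exon emission multiplied by bonus, mimicking hint-based rescoring.
    """
    if not seq:
        return ""

    def emit(state, i):
        p = EMIT[state][seq[i]]
        if state == "E" and i in exon_hints:
            p *= bonus  # reward exon calls supported by external evidence
        return log(p)

    score = [{s: log(START[s]) + emit(s, 0) for s in STATES}]
    back = [{}]
    for i in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state r for entering s at position i.
            prev, best = max(
                ((r, score[i - 1][r] + log(TRANS[r][s])) for r in STATES),
                key=lambda pair: pair[1],
            )
            score[i][s] = best + emit(s, i)
            back[i][s] = prev

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for i in range(len(seq) - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return "".join(reversed(path))
```

Without hints, `viterbi("ATATAT")` falls back to the sequence-only model and labels the AT-rich toy sequence as intergenic (`"NNNNNN"`); passing `exon_hints=set(range(6))` with a large `bonus` such as `100.0` pulls every position into the exon state (`"EEEEEE"`), which is the intuition behind letting extrinsic evidence reshape an ab initio prediction.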
Scientific Applications:
- Eukaryotic Genome Annotation: Identifies protein-coding genes during structural annotation of eukaryotic genomes.
- Comparative Genomics: Enables gene prediction across multiple aligned genomes to study evolutionary conservation of gene structures.
- Transcriptome-Assisted Annotation: Integrates RNA-Seq and EST evidence to refine gene models in genome annotation projects.
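For transcriptome-assisted annotation, RNA-Seq evidence is typically passed to AUGUSTUS as a file of hints. The sketch below turns spliced-read junctions into tab-separated, GFF-style intron hint lines and drops weakly supported junctions. The `Junction` record, the `min_reads` threshold, and the `rnaseq` source label are hypothetical; the `mult=`/`src=` attributes are assumed to follow the AUGUSTUS hints convention and should be checked against the AUGUSTUS documentation (AUGUSTUS also ships its own `bam2hints` tool for this purpose).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical spliced-read junction; intron coordinates are 1-based, inclusive."""
    seqid: str
    start: int
    end: int
    strand: str
    reads: int


def junctions_to_hints(junctions, min_reads=2, source="E", tool="rnaseq"):
    """Return GFF-style intron hint lines (assumed format), sorted by position."""
    lines = []
    for j in sorted(junctions, key=lambda j: (j.seqid, j.start, j.end)):
        if j.reads < min_reads:
            continue  # skip junctions with too few supporting reads
        attrs = f"mult={j.reads};src={source}"  # read count and evidence source
        fields = [j.seqid, tool, "intron", str(j.start), str(j.end),
                  "0", j.strand, ".", attrs]
        lines.append("\t".join(fields))
    return lines
```

Filtering on read support before writing hints is a design choice: it keeps spurious single-read junctions from being rewarded during prediction, at the cost of discarding some genuine low-expression introns.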
Methodology:
AUGUSTUS applies Hidden Markov Models with submodels for splice sites and intron length distributions, uses GC-content-dependent parameters, and integrates extrinsic evidence from transcriptomic and proteomic data. For comparative gene prediction on aligned genomes, it formulates the task as a graph-based binary labeling problem and optimizes it through subgradient-based dual decomposition.
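The dual decomposition step mentioned above can be illustrated on a toy problem. In this sketch, two genomes each label a shared list of aligned candidate exons as 1 (part of a gene) or 0, the joint objective requires both genomes to agree, and Lagrange multipliers on that agreement constraint are updated with a diminishing subgradient step until the two independently solved subproblems return the same labeling. The scores, the hard-agreement constraint, and the function names are assumptions made for illustration; the objective used by AUGUSTUS's comparative mode is more involved and is not reproduced here.

```python
def solve_local(scores, multipliers, sign):
    """Per-genome subproblem: label a candidate 1 if its adjusted score is positive.

    sign is +1 for the first genome and -1 for the second, so the
    multiplier term lambda * (x - y) is split between the two subproblems.
    """
    return [1 if s + sign * lam > 0 else 0 for s, lam in zip(scores, multipliers)]


def dual_decomposition(scores_a, scores_b, steps=100, step_size=1.0):
    """Seek a labeling both genomes agree on via subgradient descent on the dual.

    Returns (labels, iterations) on agreement, or (None, steps) if no
    agreement is reached within the step budget.
    """
    lam = [0.0] * len(scores_a)
    for t in range(1, steps + 1):
        x = solve_local(scores_a, lam, +1)
        y = solve_local(scores_b, lam, -1)
        if x == y:
            # Agreement means the dual bound is attained, so x is optimal
            # for the joint (agreement-constrained) problem.
            return x, t
        alpha = step_size / t  # diminishing step size
        # The subgradient of the dual with respect to lam is (x - y).
        lam = [l - alpha * (xi - yi) for l, xi, yi in zip(lam, x, y)]
    return None, steps
```

For example, `dual_decomposition([2.0, -1.0, 0.5], [1.0, 3.0, -2.0])` returns `([1, 1, 0], 3)`: the genomes initially disagree on the second and third candidates, and the multiplier updates shift each local decision toward the labeling that maximizes the combined score.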
Details
- License: Artistic-1.0
- Maturity: Mature
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: C++
- Added: 2/10/2017
- Last Updated: 11/25/2024
Operations
- Ab-initio gene prediction
- Homology-based gene prediction
Publications
Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. Whole-Genome Annotation with BRAKER. Methods in Molecular Biology. 2019. doi:10.1007/978-1-4939-9173-0_5. PMID:31020555. PMCID:PMC6635606.
Nachtweide S, Stanke M. Multi-Genome Annotation with AUGUSTUS. Methods in Molecular Biology. 2019. doi:10.1007/978-1-4939-9173-0_8. PMID:31020558.
Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(suppl_2):ii215-ii225. doi:10.1093/bioinformatics/btg1080. PMID:14534192.
Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62. doi:10.1186/1471-2105-7-62. PMID:16469098. PMCID:PMC1409804.
Stanke M, Tzvetkova A, Morgenstern B. AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biology. 2006;7(Suppl 1):S11. doi:10.1186/gb-2006-7-s1-s11. PMID:16925833. PMCID:PMC1810548.
Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24(5):637-644. doi:10.1093/bioinformatics/btn013. PMID:18218656.
Keller O, Kollmar M, Stanke M, Waack S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics. 2011;27(6):757-763. doi:10.1093/bioinformatics/btr010. PMID:21216780.
König S, Romoth LW, Gerischer L, Stanke M. Simultaneous gene finding in multiple genomes. Bioinformatics. 2016;32(22):3388-3395. doi:10.1093/bioinformatics/btw494. PMID:27466621. PMCID:PMC5860283.
Hoff KJ, Stanke M. Predicting Genes in Single Genomes with AUGUSTUS. Current Protocols in Bioinformatics. 2018;65(1). doi:10.1002/cpbi.57. PMID:30466165.