egpred
egpred integrates BLASTX and BLASTN similarity searches with ab initio predictions and NNSPLICE-based splice-site reassignment to improve exon-level gene prediction accuracy.
Key Features:
- Hybrid approach: Combines ab initio predictions with similarity searches to balance specificity and sensitivity for exon detection.
- Initial BLASTX search: Searches the genomic sequence against the RefSeq database and keeps protein hits with E-value < 1.
- Relaxed BLASTX search: Searches the genomic sequence again, this time against the hits from the initial run, with relaxed parameters (E-value < 10) to capture additional probable coding exon regions.
- BLASTN intron detection: Searches the genomic sequence against an intron database to identify probable intron regions.
- Exon–intron comparison and filtering: Compares predicted exon and intron regions to filter out incorrect exon predictions.
- Splice-site reassignment: Uses NNSPLICE to reassess and reassign splicing signal positions within probable coding exons.
- Integration with ab initio predictions: Integrates refined exon data with ab initio outputs considering start/stop and splice signal strengths.
- Benchmark performance: Reports a 4%–10% improvement in the exon-level performance of five ab initio programs on the HMR195 dataset, with similar improvement on the Burset/Guigo dataset.
- Large-scale demonstration: Applied to approximately 95 Mbp of human chromosome 13 for large-fragment gene prediction.
- Computational intensity: Each analysis requires several BLAST runs, which increases computational demand.
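The two E-value cutoffs listed above can be illustrated with a small sketch. It assumes the BLAST results are available as 12-column tabular text (the layout produced by BLAST+ `-outfmt 6`); the source does not say which output format EGPred parses, and the `Hit` type, the `parse_hits` function, and the example rows are hypothetical. In EGPred the relaxed search runs against a different database (the hits from the first run), so applying both cutoffs to one table here only demonstrates the thresholding.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str   # subject (protein) identifier
    q_start: int   # start on the genomic query, always <= q_end
    q_end: int     # end on the genomic query
    evalue: float


def parse_hits(tabular: str, max_evalue: float) -> List[Hit]:
    """Parse 12-column BLAST tabular text and keep hits with E-value < max_evalue."""
    hits = []
    for line in tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a standard tabular row
        # Reverse-strand BLASTX hits report qstart > qend, so normalise the order.
        q_start, q_end = sorted((int(cols[6]), int(cols[7])))
        evalue = float(cols[10])
        if evalue < max_evalue:
            hits.append(Hit(cols[1], q_start, q_end, evalue))
    return hits


# Made-up rows for illustration only (not real EGPred or RefSeq output).
EXAMPLE = "\n".join([
    "contig1\tprotA\t92.0\t50\t4\t0\t101\t250\t1\t50\t1e-20\t95.0",
    "contig1\tprotB\t40.0\t30\t18\t0\t900\t811\t5\t34\t0.5\t30.1",
    "contig1\tprotC\t35.0\t25\t16\t0\t1500\t1574\t2\t26\t4.0\t25.0",
])

strict = parse_hits(EXAMPLE, max_evalue=1.0)    # cutoff of the initial search
relaxed = parse_hits(EXAMPLE, max_evalue=10.0)  # cutoff of the relaxed search
print(len(strict), len(relaxed))
```

The relaxed cutoff admits weaker hits, which is how the second search can recover exon candidates that the first search misses.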
Scientific Applications:
- Exon-level gene prediction: Identification and refinement of exon regions within genomic sequences.
- Intron detection and boundary refinement: Detection of probable introns and refinement of exon–intron boundaries using BLASTN and NNSPLICE.
- Benchmarking ab initio methods: Enhancing and evaluating ab initio program performance on benchmark datasets such as HMR195 and Burset/Guigo.
- Large genomic region analysis: Gene prediction on large genomic fragments, exemplified by a ~95 Mbp region of human chromosome 13.
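Exon-level benchmarking of the kind mentioned above is commonly scored with exon-level sensitivity and specificity. The sketch below assumes the conventional definition, in which a predicted exon counts as correct only when both boundaries match the annotation exactly; the source does not state which measures EGPred reports, and the `exon_level_accuracy` function and the coordinates are hypothetical.

```python
from typing import Iterable, Set, Tuple

Exon = Tuple[int, int]  # (start, end) on the genomic sequence


def exon_level_accuracy(
    predicted: Iterable[Exon], annotated: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    sensitivity = exactly matching exons / annotated exons
    specificity = exactly matching exons / predicted exons
    """
    pred: Set[Exon] = set(predicted)
    true: Set[Exon] = set(annotated)
    correct = len(pred & true)  # exons with both boundaries identical
    sensitivity = correct / len(true) if true else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates: the second prediction misses the annotated end by 5 bp.
annotated = [(100, 250), (400, 520), (800, 950), (1200, 1300)]
predicted = [(100, 250), (400, 515), (800, 950)]

sn, sp = exon_level_accuracy(predicted, annotated)
print(f"exon-level sensitivity={sn:.2f}, specificity={sp:.2f}")
```

Because a single misplaced boundary makes an exon count as wrong under this definition, splice-site reassignment can change the exon-level score even when the predicted region is nearly right.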
Methodology:
Sequential computational steps: initial BLASTX of the genomic sequence against RefSeq (E-value < 1); second BLASTX of the genomic sequence against the hits from the initial run with relaxed parameters (E-value < 10); BLASTN of the genomic sequence against an intron database; comparison of the probable exon and intron regions to filter out incorrect exons; reassignment of splicing signals in the remaining exons with NNSPLICE; and integration of the refined exons with ab initio predictions based on the relative strength of start/stop and splice-site signals.
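The exon–intron comparison step can be sketched as an interval-overlap filter. This is a minimal illustration and not EGPred's actual rule: the source does not give the filtering criterion, so the `max_intron_fraction` threshold, the function names, and the coordinates below are all assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def filter_exons(
    exons: List[Interval],
    introns: List[Interval],
    max_intron_fraction: float = 0.5,  # hypothetical threshold, not from the source
) -> List[Interval]:
    """Keep exons whose intron-covered fraction is at most max_intron_fraction."""
    intron_regions = merge(introns)
    kept = []
    for exon in exons:
        exon_len = exon[1] - exon[0] + 1
        covered = sum(overlap_length(exon, region) for region in intron_regions)
        if covered / exon_len <= max_intron_fraction:
            kept.append(exon)
    return kept


# Made-up candidate exons (from BLASTX) and probable introns (from BLASTN).
exons = [(100, 250), (400, 520), (800, 950)]
introns = [(251, 399), (380, 500), (521, 799)]

kept_exons = filter_exons(exons, introns)
print(kept_exons)
```

In this example the middle candidate is dropped because most of it lies inside a probable intron, while the two flanking candidates do not overlap any intron region and are kept.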
Details
- Tool Type: web application
- Operating Systems: Mac, Linux, Windows
- Added: 10/3/2022
- Last Updated: 10/3/2022
Publications
Issac B, Raghava GPS. EGPred: Prediction of Eukaryotic Genes Using Ab Initio Methods After Combining With Sequence Similarity Approaches. Genome Research. 2004;14(9):1756-1766. doi:10.1101/gr.2524704. PMID:15342559. PMCID:PMC515322.