EGPred

EGPred integrates BLASTX and BLASTN similarity searches with ab initio predictions and NNSPLICE-based splice-site reassignment to improve exon-level gene prediction accuracy.


Key Features:

  • Hybrid approach: Combines ab initio predictions with similarity searches to balance specificity and sensitivity for exon detection.
  • Initial BLASTX search: Searches the genomic sequence against the RefSeq database with E-value < 1 to identify coding region hits.
  • Relaxed BLASTX search: Searches the genomic sequence against the hits from the initial search with relaxed parameters (E-value < 10) to capture additional exon candidates.
  • BLASTN intron detection: Searches the genomic sequence against an intron database to identify probable intron regions.
  • Exon–intron comparison and filtering: Compares predicted exon and intron regions to filter out incorrect exon predictions.
  • Splice-site reassignment: Uses NNSPLICE to reassess and reassign splicing signal positions within probable coding exons.
  • Integration with ab initio predictions: Combines the refined exons with ab initio outputs according to the relative strength of start/stop and splice signals.
  • Benchmark performance: Reports a 4%–10% improvement in the exon-level performance of ab initio programs on the HMR195 and Burset/Guigo datasets.
  • Large-scale demonstration: Applied to approximately 95 Mbp of human chromosome 13 for large-fragment gene prediction.
  • Computational intensity: Each analysis requires multiple BLAST runs, which increases computational demand.
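
The exon–intron comparison step listed above can be illustrated with a small interval filter. This is a minimal sketch, not the EGPred implementation: the rule used here (drop a candidate exon when more than a given fraction of its length is covered by intron hits), the 0.5 threshold, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end) on the genomic sequence


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def filter_exons(
    exons: List[Interval],
    introns: List[Interval],
    max_intron_fraction: float = 0.5,  # hypothetical cutoff, not from the source
) -> List[Interval]:
    """Keep exon candidates whose intron-covered fraction is within the cutoff."""
    merged_introns = merge_intervals(introns)  # avoid double-counting overlaps
    kept = []
    for exon in exons:
        length = exon[1] - exon[0] + 1
        covered = sum(overlap_length(exon, intron) for intron in merged_introns)
        if covered / length <= max_intron_fraction:
            kept.append(exon)
    return kept


if __name__ == "__main__":
    exon_candidates = [(100, 199), (300, 399), (500, 599)]
    intron_hits = [(150, 160), (290, 380), (370, 450)]
    print(filter_exons(exon_candidates, intron_hits))
```

In this toy input the middle candidate lies entirely inside merged intron hits and is discarded, while the other two are kept.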

Scientific Applications:

  • Exon-level gene prediction: Identification and refinement of exon regions within genomic sequences.
  • Intron detection and boundary refinement: Detection of probable introns and refinement of exon–intron boundaries using BLASTN and NNSPLICE.
  • Benchmarking ab initio methods: Enhancing and evaluating ab initio program performance on benchmark datasets such as HMR195 and Burset/Guigo.
  • Large genomic region analysis: Gene prediction on large genomic fragments, exemplified by a ~95 Mbp region of human chromosome 13.
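
For the benchmarking use case, exon-level accuracy is commonly summarised as sensitivity and specificity over exactly matching exons. The sketch below follows that common convention (an exon counts as correct only when both boundaries match the annotation); the entry itself does not state which measure was used, so treat the definition and the function name as assumptions.

```python
from typing import Iterable, Tuple

Exon = Tuple[int, int]  # (start, end) coordinates on the genomic sequence


def exon_level_accuracy(
    annotated: Iterable[Exon], predicted: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    sensitivity = correctly predicted exons / annotated exons
    specificity = correctly predicted exons / predicted exons
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    correct = len(annotated_set & predicted_set)  # both boundaries must match
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    annotated = [(100, 199), (300, 399), (500, 599), (700, 799)]
    predicted = [(100, 199), (300, 390), (500, 599)]
    sn, sp = exon_level_accuracy(annotated, predicted)
    print(f"exon-level sensitivity={sn:.2f}, specificity={sp:.2f}")
```

Running the same function on ab initio output alone and on the combined output is one way to compare the two on a benchmark set.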

Methodology:

Sequential computational steps:

  • Initial BLASTX of the genomic sequence against RefSeq (E-value < 1).
  • Relaxed BLASTX of the genomic sequence against the hits from the initial search (E-value < 10).
  • BLASTN of the genomic sequence against an intron database.
  • Comparison and filtering of probable exon and intron regions.
  • Reassignment of splicing signals with NNSPLICE.
  • Integration of refined exon data with ab initio predictions using start/stop and splice-site strength metrics.
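
The two E-value cutoffs in the BLASTX steps can be illustrated by filtering similarity hits at each threshold. The sketch below assumes hits are available in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). That input format, the example rows, and all names in the code are assumptions for illustration and do not describe how EGPred itself handles BLAST output.

```python
from typing import Iterable, List, NamedTuple

INITIAL_MAX_EVALUE = 1.0   # cutoff for the initial BLASTX search (from the entry)
RELAXED_MAX_EVALUE = 10.0  # cutoff for the relaxed BLASTX search (from the entry)


class Hit(NamedTuple):
    subject: str
    start: int   # leftmost query (genomic) coordinate of the hit
    end: int     # rightmost query (genomic) coordinate of the hit
    evalue: float


def filter_hits(lines: Iterable[str], max_evalue: float) -> List[Hit]:
    """Parse tab-separated BLAST hits and keep those with E-value < max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue < max_evalue:
            # Reverse-strand BLASTX hits report qstart > qend, so normalise.
            hits.append(Hit(fields[1], min(qstart, qend), max(qstart, qend), evalue))
    return hits


if __name__ == "__main__":
    # Hypothetical rows; identifiers and values are made up for the example.
    rows = [
        "query1\tprotA\t85.0\t40\t6\t0\t101\t220\t10\t49\t1e-20\t95.0",
        "query1\tprotB\t40.0\t30\t18\t0\t400\t311\t5\t34\t0.5\t30.0",
        "query1\tprotC\t35.0\t25\t16\t0\t700\t774\t1\t25\t4.2\t25.0",
        "query1\tprotA\t80.0\t20\t4\t0\t200\t259\t50\t69\t2e-5\t50.0",
    ]
    print(len(filter_hits(rows, INITIAL_MAX_EVALUE)), "hits at E-value < 1")
    print(len(filter_hits(rows, RELAXED_MAX_EVALUE)), "hits at E-value < 10")
```

The relaxed cutoff admits the weaker third row that the initial cutoff rejects, which is the sense in which the second search captures additional exon candidates.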

Details

Tool Type: web application
Operating Systems: Mac, Linux, Windows
Added: 10/3/2022
Last Updated: 10/3/2022

Publications

Issac B, Raghava GPS. EGPred: Prediction of Eukaryotic Genes Using Ab Initio Methods After Combining With Sequence Similarity Approaches. Genome Research. 2004;14(9):1756-1766. doi:10.1101/gr.2524704. PMID:15342559. PMCID:PMC515322.
