GENSCAN

GENSCAN predicts exon–intron gene structures in genomic DNA sequences. Its probabilistic model integrates transcriptional, translational, and splicing signals with length distributions and compositional attributes to identify complete and partial genes.


Key Features:

  • Probabilistic model: Uses a general probabilistic model of gene structure to predict exon–intron organization in genomic DNA.
  • Signal integration: Incorporates transcriptional, translational, and splicing signals into the prediction model.
  • Length and composition modeling: Models length distributions and compositional attributes of exons, introns, and intergenic regions.
  • C + G adaptation: Derives distinct sets of model parameters for different C + G compositional regions to account for regional variation in gene density and structure.
  • Splice signal models: Implements donor and acceptor splice-site models that capture dependencies between signal positions.
  • Multi-gene prediction: Predicts multiple genes within a single input DNA sequence.
  • Partial and complete genes: Predicts both partial and complete genes.
  • Both-strand prediction: Predicts consistent sets of genes on either or both DNA strands.
  • Exon-level confidence: Provides a confidence indication for each predicted exon.
  • Performance: Identifies 75 to 80% of exons exactly on standardized sets of human and vertebrate genes.
  • Robustness: Maintains high accuracy across sequences with varying C + G content and among different vertebrate groups.
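
The C + G adaptation described above can be illustrated with a minimal sketch: compute the C + G fraction of an input sequence and pick a parameter set by compositional bin. The function names, the bin labels, and the cut-off values below are illustrative assumptions for this sketch; this entry does not state the cut-offs GENSCAN uses, and the code is not GENSCAN's own.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are C or G."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("C") + seq.count("G")) / counted


# ASSUMPTION: hypothetical upper bounds (as fractions) and labels, chosen only
# to show the idea of one parameter set per compositional bin.
ASSUMED_BINS = [
    (0.43, "set_I"),
    (0.51, "set_II"),
    (0.57, "set_III"),
    (1.01, "set_IV"),  # catch-all upper bin; 1.01 so that 1.0 is included
]


def pick_parameter_set(seq: str, bins=ASSUMED_BINS) -> str:
    """Return the label of the first bin whose upper bound exceeds the C + G fraction."""
    frac = gc_fraction(seq)
    for upper, label in bins:
        if frac < upper:
            return label
    return bins[-1][1]


if __name__ == "__main__":
    for example in ("ATATATATAT", "ACGTACGT", "GGCCGGCCAT"):
        print(example, round(gc_fraction(example), 2), pick_parameter_set(example))
```

Ambiguous bases such as N are ignored when computing the fraction, which is one reasonable convention among several.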

Scientific Applications:

  • Gene structure annotation: Identification of complete exon/intron structures in genomic DNA for gene annotation.
  • Regional genomic analysis: Analysis of gene density and structure across different C + G compositional regions of the human genome.
  • Splice-site characterization: Modeling and prediction of donor and acceptor splice signals that include positional dependencies.
  • Multi-gene and strand-aware annotation: Annotation of multiple genes per sequence and consistent gene sets on one or both DNA strands.
  • Vertebrate comparative assessment: Evaluation and prediction of genes across human and other vertebrate gene sets.
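
When assessing predictions against an annotated gene set, exon-level accuracy is commonly summarized by the share of exons predicted exactly. The sketch below assumes exons are represented as (start, end, strand) tuples and treats an exon as exact only when all three match; the function name and the representation are assumptions for illustration, not part of GENSCAN.

```python
def exon_exact_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the exact-exon level.

    sensitivity = exactly predicted exons / annotated exons
    specificity = exactly predicted exons / predicted exons
    """
    pred, true = set(predicted), set(annotated)
    exact = pred & true
    sensitivity = len(exact) / len(true) if true else 0.0
    specificity = len(exact) / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy coordinates, invented for this example.
    annotated = [(100, 200, "+"), (300, 400, "+"), (500, 600, "+"), (700, 800, "+")]
    predicted = [(100, 200, "+"), (300, 400, "+"), (500, 610, "+")]
    sn, sp = exon_exact_accuracy(predicted, annotated)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

In the toy data the third prediction misses the annotated end coordinate by 10 bases, so it does not count as exact.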

Methodology:

GENSCAN employs a general probabilistic model that integrates transcriptional, translational, and splicing signals. It models the length distributions and compositional attributes of exons, introns, and intergenic regions, derives a distinct parameter set for each C + G compositional region, and uses donor and acceptor splice-site models that capture dependencies between signal positions.
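
To illustrate what it means for a splice-site model to capture dependencies between positions, the sketch below trains a simple first-order model in which each base is conditioned on the preceding base. This is a swapped-in, simplified technique chosen for clarity, not necessarily the model GENSCAN implements; the training sites, function names, and pseudocount are assumptions made up for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_first_order(sites, pseudocount=1.0):
    """Estimate P(first base) and P(base at i | base at i-1) from aligned sites.

    Sites must be equal-length strings over uppercase A, C, G, T.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    first = Counter(s[0] for s in sites)
    total = len(sites) + pseudocount * len(ALPHABET)
    start = {b: (first[b] + pseudocount) / total for b in ALPHABET}
    transitions = []
    for i in range(1, length):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        prev_counts = Counter(s[i - 1] for s in sites)
        table = {}
        for prev in ALPHABET:
            denom = prev_counts[prev] + pseudocount * len(ALPHABET)
            for cur in ALPHABET:
                table[(prev, cur)] = (pair_counts[(prev, cur)] + pseudocount) / denom
        transitions.append(table)
    return start, transitions


def log_score(site, model):
    """Log-probability of a candidate site under the first-order model."""
    start, transitions = model
    score = math.log(start[site[0]])
    for i, table in enumerate(transitions, start=1):
        score += math.log(table[(site[i - 1], site[i])])
    return score


if __name__ == "__main__":
    # ASSUMPTION: toy donor-like sites invented for illustration only.
    toy_sites = ["GTAAGT", "GTGAGT", "GTAAGT", "GTAAGA"]
    model = train_first_order(toy_sites)
    for candidate in ("GTAAGT", "GTCCCC"):
        print(candidate, round(log_score(candidate, model), 3))
```

A model with independent positions would multiply one probability per position; conditioning on the previous base is the smallest step toward modeling positional dependencies, and a candidate resembling the training sites scores higher than one that does not.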

Details

Tool Type: web application
Operating Systems: Linux, Windows, Mac
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology. 1997;268(1):78-94. doi:10.1006/jmbi.1997.0951. PMID:9149143.
