SpliceFinder

SpliceFinder uses a convolutional neural network to predict splice sites in genomic sequences, identifying both canonical (GT/AG) and non-canonical splice junctions for gene structure analysis.


Key Features:

  • CNN-based prediction: Employs a convolutional neural network architecture trained on human genomic data for ab initio splice site prediction.
  • Iterative dataset reconstruction: Uses an iterative reconstruction approach to address class imbalance during training.
  • Dinucleotide discrimination: Accounts for frequent GT and AG dinucleotide occurrences in non-splicing regions to reduce false positives.
  • High accuracy: Reports a classification accuracy of 90.25%, approximately 10% higher than the existing algorithms it was compared with.
  • Reduced false positives: Produces about half as many false positives as state-of-the-art tools.
  • High recall: Maintains a recall above 0.8 for splice site detection.
  • Non-canonical site detection: Identifies non-canonical splice sites in addition to canonical GT/AG junctions.
  • Sliding-window localization: Localizes exact splice site positions within long genomic sequences using a sliding window technique.
  • Cross-species robustness: Generalizes without retraining to Drosophila melanogaster, Mus musculus, Rattus, and Danio rerio.
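
The sliding-window localization and dinucleotide discrimination features above can be illustrated with a minimal Python sketch. It is not SpliceFinder's code: the function names, the window layout (a junction placed at the center of each window), and the `dinucleotide_rule` placeholder are assumptions made for illustration. The placeholder flags every canonical GT or AG at the window center, which is the naive rule that a trained CNN would replace in order to reject GT/AG occurrences in non-splicing regions.

```python
from typing import Callable, Iterator, List, Tuple


def sliding_windows(seq: str, flank: int) -> Iterator[Tuple[int, str]]:
    """Yield (center, window) pairs; each window spans `flank` bases on both sides of `center`."""
    for center in range(flank, len(seq) - flank + 1):
        yield center, seq[center - flank:center + flank]


def dinucleotide_rule(window: str) -> str:
    """Hypothetical stand-in for the trained CNN (assumes flank >= 2).

    Labels a window by the canonical dinucleotide next to its center only.
    """
    flank = len(window) // 2
    if window[flank:flank + 2] == "GT":   # intron starts right after the center
        return "donor"
    if window[flank - 2:flank] == "AG":   # intron ends right before the center
        return "acceptor"
    return "none"


def scan(seq: str, classify: Callable[[str], str], flank: int) -> List[Tuple[int, str]]:
    """Slide over `seq` and collect every position the classifier labels as a splice site."""
    hits = []
    for center, window in sliding_windows(seq, flank):
        label = classify(window)
        if label != "none":
            hits.append((center, label))
    return hits


# Toy sequence: exon (6 bases), intron GT....AG (8 bases), exon (6 bases).
SEQ = "CCCCCC" + "GT" + "CCCC" + "AG" + "CCCCCC"
HITS = scan(SEQ, dinucleotide_rule, flank=3)
print(HITS)  # [(6, 'donor'), (14, 'acceptor')]
```

On real genomic sequence the placeholder rule would fire on every GT and AG, so any classifier passed to `scan` has to use the flanking context in each window rather than the dinucleotide alone.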

Scientific Applications:

  • Gene structure analysis: Infers splice junctions to support understanding of gene location and structure.
  • Accurate splice site cataloging: Produces more reliable splice site predictions by reducing false positives while maintaining high recall.
  • Cross-species analysis: Supports comparative sequence analyses by identifying splice sites across multiple species without retraining.

Methodology:

SpliceFinder trains a convolutional neural network on human genomic data, applies iterative dataset reconstruction to mitigate class imbalance, and scans long sequences with a sliding window to localize splice sites.
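
One plausible reading of iterative dataset reconstruction is a loop that retrains the model after adding its own false positives to the negative set. The sketch below illustrates that loop under this assumption; the entry above does not describe the exact procedure. The `train` function is a toy stand-in that memorizes negatives instead of fitting a CNN, and `reconstruct`, `max_new`, and `rounds` are hypothetical names.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], bool]


def train(positives: List[str], negatives: List[str]) -> Model:
    """Toy stand-in for CNN training: reject memorized negatives, accept everything else."""
    known_negatives = set(negatives)
    return lambda window: window not in known_negatives


def reconstruct(
    positives: List[str],
    negatives: List[str],
    background: List[str],
    rounds: int,
    max_new: int,
) -> Tuple[Model, List[str]]:
    """Retrain repeatedly, each round adding up to `max_new` false positives to the negatives.

    `background` holds windows known NOT to be splice sites, so any window the
    model accepts from it is a false positive.
    """
    negatives = list(negatives)
    model = train(positives, negatives)
    for _ in range(rounds):
        false_positives = [w for w in background if model(w)]
        if not false_positives:
            break  # the model no longer errs on the background set
        negatives.extend(false_positives[:max_new])
        model = train(positives, negatives)
    return model, negatives


MODEL, NEGATIVES = reconstruct(
    positives=["AAGTAA"],
    negatives=["CCCCCC"],
    background=["TTGTTT", "GGGTGG", "CCCCCC"],
    rounds=5,
    max_new=1,
)
print(NEGATIVES)  # ['CCCCCC', 'TTGTTT', 'GGGTGG']
```

Capping the additions per round with `max_new` keeps the negative set from growing faster than the positives, which is one way such a loop could interact with class balance.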

Details

Added: 1/14/2020
Last Updated: 12/24/2020

Publications

Wang R, Wang Z, Wang J, Li S. SpliceFinder: ab initio prediction of splice sites using convolutional neural network. BMC Bioinformatics. 2019;20(S23). doi:10.1186/s12859-019-3306-3. PMID:31881982. PMCID:PMC6933889.