DeepBound

DeepBound identifies transcript boundaries from RNA sequencing (RNA-seq) read alignments using deep convolutional neural fields to improve transcript assembly and the discovery of novel genes and transcripts.


Key Features:

  • Deep convolutional neural fields: Learns the hidden distributions and patterns of transcript boundaries from RNA-seq read alignments.
  • AUC-based optimization: Integrates the AUC (area under the curve) score into the optimization objective to enhance boundary prediction accuracy.
  • Label-imbalance handling: Uses the AUC-based objective to address the label imbalance inherent in boundary training data.
  • Simulated training data: Trains on simulated RNA-seq datasets, which supply the large labeled sample sets needed by the deep probabilistic graphical model.
  • Noisy/weak signal targeting: Specifically tackles noisy and weak signals at transcript boundaries that complicate accurate determination from RNA-seq reads.
  • Complementary to spliced-read junction detection: Targets transcript boundaries rather than splice junctions, which can be identified from spliced reads.
  • Empirical performance: Reported to outperform existing methods on simulated datasets from two species and on biological datasets.
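
The AUC and label-imbalance points above can be illustrated with a small, self-contained sketch. It is not DeepBound code: the toy labels and scores are invented for illustration, and the assumption that boundary positions are far rarer than non-boundary positions is ours. The sketch shows why accuracy is a poor target under heavy imbalance while AUC still separates a useful scorer from a useless one.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of positions whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 2 boundary positions among 100 candidate positions.
labels = [0] * 98 + [1] * 2

# A scorer that ignores the data entirely.
constant_scores = [0.0] * 100
# A scorer that ranks both boundaries above every non-boundary,
# but never crosses the 0.5 decision threshold.
informative_scores = [0.1] * 98 + [0.4, 0.3]

acc_const = accuracy(labels, constant_scores)
auc_const = auc_score(labels, constant_scores)
acc_inf = accuracy(labels, informative_scores)
auc_inf = auc_score(labels, informative_scores)

print(f"constant:    accuracy={acc_const:.2f}  AUC={auc_const:.2f}")
print(f"informative: accuracy={acc_inf:.2f}  AUC={auc_inf:.2f}")
```

Both scorers reach the same accuracy because predicting "no boundary" everywhere is right for 98 of 100 positions. Only the AUC distinguishes them, since it depends on how positives are ranked relative to negatives rather than on a fixed threshold.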

Scientific Applications:

  • Transcript assembly: Improves full-length transcript assembly by providing more accurate boundary locations from RNA-seq data.
  • Novel gene and transcript discovery: Enables discovery of novel genes and transcripts through enhanced boundary detection.
  • Gene expression and functional studies: Facilitates more complete transcript models for analysis of gene expression patterns and gene function.
  • Method benchmarking: Serves as a benchmark for evaluating transcript boundary identification methods on simulated and biological datasets.

Methodology:

DeepBound implements deep convolutional neural fields, a deep probabilistic graphical model, trained on simulated RNA-seq datasets. The training objective incorporates the AUC (area under the curve) score, which helps the model learn the hidden distributions of transcript boundaries and addresses label imbalance.
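
Because the exact AUC is a count over ranked pairs, it is not differentiable, so an AUC-based training objective needs a smooth stand-in. The sketch below uses a generic pairwise logistic surrogate as one possible example. This is an assumption for illustration only: the function name and toy scores are hypothetical, and the formulation DeepBound actually uses is described in the publication and may differ.

```python
import math


def pairwise_logistic_auc_loss(labels, scores):
    """Smooth surrogate for 1 - AUC.

    Averages log(1 + exp(-(s_pos - s_neg))) over all positive/negative
    pairs. The loss shrinks as positives are scored above negatives.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    total = 0.0
    for p in pos:
        for n in neg:
            margin = p - n
            # Numerically stable softplus(-margin).
            total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / (len(pos) * len(neg))


# Hypothetical toy example: one boundary position among four candidates.
labels = [0, 0, 0, 1]
well_ranked = [-2.0, -1.0, -1.5, 2.0]   # boundary scored highest
badly_ranked = [2.0, 1.0, 1.5, -2.0]    # boundary scored lowest

loss_good = pairwise_logistic_auc_loss(labels, well_ranked)
loss_bad = pairwise_logistic_auc_loss(labels, badly_ranked)

print(f"well ranked:  loss={loss_good:.4f}")
print(f"badly ranked: loss={loss_bad:.4f}")
```

Each positive/negative pair contributes equally to this loss regardless of how many negatives there are, which is one reason pairwise ranking objectives are often considered for imbalanced labels.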

Details

Tool Type:
command-line tool
Operating Systems:
Linux, Mac
Programming Languages:
Shell
Added:
6/15/2018
Last Updated:
11/25/2024

Publications

Shao M, Ma J, Wang S. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Bioinformatics. 2017;33(14):i267-i273. doi:10.1093/bioinformatics/btx267. PMID:28881999. PMCID:PMC5870651.

Funding:
  • National Institutes of Health: R01HG007104
  • National Science Foundation: DBI-1564955
