RVboost

RVboost prioritizes genetic variants called from Illumina RNA sequencing (RNA-seq) data, distinguishing true variants from artefacts to improve variant calling in transcriptomes.


Key Features:

  • Boosting-based classification: Trains a boosting model, using common variants from the HapMap project as the training set, to distinguish high-quality variants from artefacts.
  • Integrated analysis workflow: Provides a workflow encompassing variant calling, annotation, and filtering tailored to RNA-seq data.
  • RNA-seq-specific attributes: Incorporates features such as the distance of a variant position from exon boundaries and the percentage of variant-supporting reads in which the variant falls within the first six bases of the read, to detect RNA-seq artefacts (e.g., those caused by random hexamer priming).
  • Benchmark performance: Demonstrated superior performance relative to GATK variant quality score recalibration (VQSR) and the SNPiR pipeline in comparative analyses.
  • Supported environments: Runs on Mac and Linux computational environments.
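
The two RNA-seq-specific attributes listed above can be illustrated with a minimal Python sketch. It is not RVboost's own code (the tool is written in R); the function names, the exon coordinate format, and the convention that read offsets are 0-based and counted from the 5' end of the read as sequenced are all assumptions made for illustration.

```python
from bisect import bisect_left


def distance_to_exon_boundary(pos, exons):
    """Smallest distance in bases from pos to any exon start or end.

    exons is a non-empty list of (start, end) tuples (hypothetical format).
    """
    boundaries = sorted({b for exon in exons for b in exon})
    i = bisect_left(boundaries, pos)
    # Only the boundaries immediately left and right of pos can be nearest.
    candidates = boundaries[max(i - 1, 0): i + 1]
    return min(abs(pos - b) for b in candidates)


def pct_support_in_first_bases(read_offsets, n_bases=6):
    """Percentage of variant-supporting reads in which the variant base
    lies within the first n_bases of the read.

    read_offsets holds the 0-based offset of the variant base within
    each supporting read (assumed to be measured from the read's 5' end).
    """
    if not read_offsets:
        return 0.0
    early = sum(1 for off in read_offsets if off < n_bases)
    return 100.0 * early / len(read_offsets)


# Hypothetical example: two exons and eight supporting reads.
exons = [(100, 200), (500, 650)]
dist = distance_to_exon_boundary(203, exons)
pct = pct_support_in_first_bases([0, 2, 5, 6, 40, 71, 3, 10])
```

A variant supported mostly by reads that carry it in their first six bases would get a high percentage, which is the pattern expected from random hexamer priming errors.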

Scientific Applications:

  • Improved variant calling in transcriptomes: Enhances the reliability of single-nucleotide variant and small indel identification from RNA-seq data.
  • Support for expression studies: Supports studies that quantify gene and exon expression levels by reducing variant-calling artefacts that can confound analyses.
  • Transcript discovery: Aids discovery of novel transcripts by providing more accurate variant information within expressed regions.
  • Fusion gene detection contexts: Contributes to analyses detecting fusion genes by improving the accuracy of variant calls in RNA-seq datasets.

Methodology:

Applies a boosting classifier trained on HapMap common variants. Extracts RNA-seq-specific features, including distance from exon boundaries and read-position bias (the percentage of variant-supporting reads in which the variant falls within the first six bases). Integrates variant calling, annotation, and filtering in one workflow. Benchmarked against GATK VQSR and SNPiR on 12 RNA-seq samples, with ground truth from paired exome sequencing.
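
The general idea of boosting over per-variant features can be sketched as follows. This is a plain AdaBoost with one-feature threshold stumps, written in pure Python for illustration; it may differ from the boosting implementation RVboost actually uses. The toy feature vectors and the labeling scheme (+1 for calls overlapping HapMap, -1 for all other calls) are assumptions, not values or rules taken from the tool.

```python
import math


def train_adaboost(X, y, n_rounds=10):
    """Train AdaBoost with one-feature threshold stumps.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Returns a list of (alpha, feature_index, threshold, polarity) tuples.
    """
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[j] >= thr else -pol for row in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, preds)
        err, j, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Weighted vote of all stumps; higher means more likely a true variant."""
    return sum(a * (pol if row[j] >= thr else -pol) for a, j, thr, pol in model)


# Toy features: [distance to exon boundary, % supporting reads in first six bases].
X = [[120, 5], [300, 0], [85, 10], [40, 8], [2, 80], [1, 95], [3, 60], [150, 90]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
model = train_adaboost(X, y, n_rounds=5)
```

Ranking candidate variants by `score(model, features)` gives a prioritized list, and a cutoff on that score would then act as the filter.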

Details

Tool Type:
command-line tool
Operating Systems:
Linux, Windows
Programming Languages:
R
Added:
8/3/2017
Last Updated:
11/25/2024

Publications

Wang C, Davila JI, Baheti S, Bhagwate AV, Wang X, Kocher JA, Slager SL, Feldman AL, Novak AJ, Cerhan JR, Thompson EA, Asmann YW. RVboost: RNA-seq variants prioritization using a boosting method. Bioinformatics. 2014;30(23):3414-3416. doi:10.1093/bioinformatics/btu577. PMID:25170027. PMCID:PMC4296157.