RVboost
RVboost prioritizes genetic variants called from Illumina RNA sequencing (RNA-seq) data, separating true variants from artefacts to improve variant calling in transcriptomes.
Key Features:
- Boosting-based classification: Trains a boosting model on common variants from the HapMap project and uses it to distinguish high-quality variants from artefacts.
- Integrated analysis workflow: Provides a workflow encompassing variant calling, annotation, and filtering tailored to RNA-seq data.
- RNA-seq-specific attributes: Incorporates features such as the distance of a variant from the nearest exon boundary and the percentage of variant-supporting reads in which the variant falls within the first six bases of the read, to detect RNA-seq artefacts (e.g., those caused by random hexamer priming).
- Benchmark performance: Outperformed GATK variant quality score recalibration (VQSR) and the SNPiR pipeline in a comparison on 12 RNA-seq samples with paired exome sequencing.
- Supported environments: Runs on Mac and Linux.
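The two RNA-seq-specific attributes in the list above can be illustrated with a minimal Python sketch. This is not RVboost's own code: the function names, the input formats (a sorted list of exon boundary coordinates, and a list of 0-based offsets of the variant within each supporting read, measured from the start of the read), and the example numbers are all assumptions made for illustration.

```python
from bisect import bisect_left


def distance_to_exon_boundary(pos, boundaries):
    """Distance from a variant position to the nearest exon boundary.

    `boundaries` is a sorted list of exon start/end coordinates on the
    same chromosome as `pos` (hypothetical input format).
    """
    if not boundaries:
        raise ValueError("no exon boundaries supplied")
    i = bisect_left(boundaries, pos)
    # Only the boundary just before and just after `pos` can be nearest.
    candidates = boundaries[max(i - 1, 0):i + 1]
    return min(abs(pos - b) for b in candidates)


def fraction_in_first_bases(read_offsets, window=6):
    """Fraction of variant-supporting reads in which the variant lies
    within the first `window` bases of the read.

    `read_offsets` holds the 0-based offset of the variant inside each
    supporting read (hypothetical input format).
    """
    if not read_offsets:
        return 0.0
    near_start = sum(1 for off in read_offsets if off < window)
    return near_start / len(read_offsets)


# Illustrative values only, not real data.
exon_boundaries = [100, 200, 500, 650]
dist = distance_to_exon_boundary(205, exon_boundaries)
frac = fraction_in_first_bases([0, 3, 5, 6, 40])
```

A variant that sits very close to an exon boundary, or that is supported mostly by bases near the start of reads, would be a candidate artefact under this kind of feature; how RVboost weighs these values is determined by its trained model, not by fixed cutoffs shown here.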
Scientific Applications:
- Improved variant calling in transcriptomes: Enhances the reliability of single-nucleotide variant and small indel identification from RNA-seq data.
- Support for expression studies: Supports studies that quantify gene and exon expression levels by reducing variant-calling artefacts that can confound analyses.
- Transcript discovery: Aids discovery of novel transcripts by providing more accurate variant information within expressed regions.
- Fusion gene detection contexts: Contributes to analyses detecting fusion genes by improving the accuracy of variant calls in RNA-seq datasets.
Methodology:
RVboost applies a boosting classifier trained on HapMap common variants. It extracts RNA-seq-specific features, including distance from exon boundaries and read-position bias (the percentage of variant-supporting reads in which the variant lies in the first six bases), and integrates variant calling, annotation, and filtering. It was benchmarked against GATK VQSR and SNPiR on 12 RNA-seq samples, with paired exome sequencing as ground truth.
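To make the idea of a boosting classifier over such features concrete, here is a small self-contained sketch of discrete AdaBoost with decision stumps in Python. It is a stand-in for illustration, not RVboost's implementation: the specific boosting variant, the two-feature layout (distance to exon boundary, fraction of supporting reads in the first six bases), and every data point are assumptions. In particular, the source only says that HapMap common variants are used for training; the artefact examples below are invented.

```python
import math


def stump_predict(row, j, thr, sign):
    """Predict +1 or -1 from a single feature and threshold."""
    return sign if row[j] <= thr else -sign


def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost; labels are +1 (variant-like) or -1 (artefact-like)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Up-weight misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, sign))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Weighted vote of all stumps; higher means more variant-like."""
    return sum(a * stump_predict(row, j, thr, s) for a, j, thr, s in model)


# Toy training set (invented): [distance_to_exon_boundary, frac_first_six_bases]
X = [[50, 0.05], [120, 0.10], [80, 0.00], [30, 0.15],   # stand-ins for HapMap-matched calls
     [2, 0.80], [1, 0.60], [60, 0.90], [3, 0.70]]       # invented artefact-like calls
y = [1, 1, 1, 1, -1, -1, -1, -1]

model = adaboost(X, y)
candidates = [[100, 0.02], [2, 0.85]]
ranked = sorted(candidates, key=lambda row: score(model, row), reverse=True)
```

Sorting candidates by `score` mirrors the idea of prioritizing variants rather than applying a hard filter; any threshold on the score would be a separate choice.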
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: R
- Added: 8/3/2017
- Last Updated: 11/25/2024
Publications
Wang C, Davila JI, Baheti S, Bhagwate AV, Wang X, Kocher JA, Slager SL, Feldman AL, Novak AJ, Cerhan JR, Thompson EA, Asmann YW. RVboost: RNA-seq variants prioritization using a boosting method. Bioinformatics. 2014;30(23):3414-3416. doi:10.1093/bioinformatics/btu577. PMID:25170027. PMCID:PMC4296157.