VARUS
VARUS samples representative RNA-Seq reads from NCBI's Sequence Read Archive (SRA) to optimize input for genome annotation and transcriptome assembly.
Key Features:
- Automated selection and downloading: Selects SRA runs using a species' binomial name and genome and downloads a representative subset of reads optimized for sensitivity and specificity while minimizing resource usage.
- Randomized sampling algorithm: Employs an online randomized sampling algorithm to cover a broad range of transcripts and operate under limited network bandwidth and computational resources.
- Incremental intron database: Builds and updates an incremental intron database to provide intron-level evidence for refining genome annotation.
- HISAT2 alignment support: Can align sampled reads with HISAT2 as an alternative to its default alignment program.
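The incremental intron database idea can be illustrated with a minimal sketch. It assumes an intron is identified by chromosome, start, end, and strand, and that each aligned batch contributes a list of such introns. The names `Intron`, `IntronDatabase`, `add_batch`, and `supported` are hypothetical and chosen for illustration; they are not taken from the VARUS source code.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Intron(NamedTuple):
    """Hypothetical intron key: genomic coordinates plus strand."""
    chrom: str
    start: int
    end: int
    strand: str


class IntronDatabase:
    """Illustrative store that accumulates intron support across batches."""

    def __init__(self) -> None:
        # Maps each intron to the number of spliced reads supporting it.
        self.support = Counter()

    def add_batch(self, introns: Iterable[Intron]) -> int:
        """Merge one batch and return how many introns were not seen before."""
        newly_seen = 0
        for intron in introns:
            if intron not in self.support:
                newly_seen += 1
            self.support[intron] += 1
        return newly_seen

    def supported(self, min_reads: int = 2) -> List[Intron]:
        """Return introns backed by at least `min_reads` reads, sorted."""
        return sorted(i for i, n in self.support.items() if n >= min_reads)


db = IntronDatabase()
a = Intron("chr1", 100, 200, "+")
b = Intron("chr1", 300, 400, "-")
first = db.add_batch([a, b, a])   # two previously unseen introns
second = db.add_batch([a])        # nothing new, only more support for `a`
```

Because each batch is merged into the existing counts, the evidence grows as more reads are sampled, and the number of newly seen introns per batch gives a rough signal of how much a further download still adds.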
Scientific Applications:
- Genome annotation: Automates selection and provisioning of RNA-Seq data for genome annotation pipelines and has been demonstrated on twelve eukaryotic genomes with BRAKER.
- Transcriptome assembly: Provides representative subsets of RNA-Seq reads to improve sensitivity and specificity in transcriptome assembly.
- Resource-efficient large-scale analyses: Enables annotation and transcript discovery across multiple species using fewer reads while achieving higher sensitivity and specificity compared to manually selected runs.
Methodology:
Selects SRA runs by species name and genome, applies an online randomized sampling algorithm to incrementally select and download reads, optionally aligns reads with HISAT2, and maintains an incremental intron database.
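The sampling loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not VARUS's actual estimator: each run is modeled by a hypothetical `fetch` callable that stands in for downloading and aligning one batch of reads, each read is reduced to the transcript it maps to, and a run is chosen with probability proportional to the smoothed fraction of its past reads that hit transcripts still below a target coverage. The names `make_run`, `sample_runs`, and `target_coverage` are invented for this example.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

Fetch = Callable[[random.Random], List[str]]


def make_run(transcripts: List[str], batch_size: int = 10) -> Fetch:
    """Hypothetical stand-in for fetching one aligned batch from an SRA run."""
    def fetch(rng: random.Random) -> List[str]:
        return [rng.choice(transcripts) for _ in range(batch_size)]
    return fetch


def sample_runs(runs: Dict[str, Fetch], n_batches: int,
                target_coverage: int = 5, seed: int = 0) -> Counter:
    """Online sampling: favor runs whose reads still add new coverage."""
    rng = random.Random(seed)
    coverage = Counter()                 # transcript -> reads seen so far
    useful = {name: 1.0 for name in runs}  # pseudocounts keep weights positive
    drawn = {name: 2.0 for name in runs}
    names = list(runs)
    for _ in range(n_batches):
        # Weight each run by its estimated rate of useful reads.
        weights = [useful[n] / drawn[n] for n in names]
        run = rng.choices(names, weights=weights, k=1)[0]
        for transcript in runs[run](rng):
            if coverage[transcript] < target_coverage:
                useful[run] += 1         # read hit an under-covered transcript
            coverage[transcript] += 1
            drawn[run] += 1
    return coverage


toy_runs = {
    "run_A": make_run(["t1"]),                    # redundant run
    "run_B": make_run(["t1", "t2", "t3", "t4"]),  # more diverse run
}
coverage = sample_runs(toy_runs, n_batches=20)
```

The statistics are updated after every batch, so the choice of the next run always reflects what has already been downloaded. That is the sense in which the procedure is online, and it lets the loop stop at any point with a usable read set.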
Details
- Programming Languages: C++, Perl
- Added: 1/14/2020
- Last Updated: 1/2/2021
Publications
Stanke M, Bruhn W, Becker F, Hoff KJ. VARUS: sampling complementary RNA reads from the sequence read archive. BMC Bioinformatics. 2019;20(1). doi:10.1186/s12859-019-3182-x. PMID:31703556. PMCID:PMC6842140.