CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates

Authors: Joel ZB Low, Tsung Fei Khang, Martti T Tammi
Publication date: 2017/12
Journal: BMC Bioinformatics
Volume: 18
Issue: 16
Pages: 575
Publisher: BioMed Central

Abstract

Background

In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis.
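The library-size normalization mentioned above can be illustrated with a minimal sketch. It assumes a simple counts-per-million scaling, which is only one of several normalizations in common use and is not necessarily the one used by any tool named in this abstract. The function name and the toy gene counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw gene counts so each sample sums to one million.

    `counts` maps gene name -> raw read count for a single sample.
    This is an illustrative normalization, not a specific tool's method.
    """
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


# Hypothetical toy data: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"geneA": 100, "geneB": 300, "geneC": 600}
sample_b = {"geneA": 200, "geneB": 600, "geneC": 1200}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
```

After scaling, the two toy samples have identical normalized values even though their raw counts differ by a factor of two, which is the heterogeneity in library size that the normalization step is meant to remove.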

Results

We developed a fast Bayesian method that uses sequencing coverage information, determined from the concentration of an RNA sample, to estimate the posterior distribution of a true gene count. In simulations and in experiments with real unreplicated data, our method performs comparably to or better than NOISeq and GFOLD. The method incorporates a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data.
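The idea of a posterior distribution over the true gene count can be sketched as follows. This is a toy illustration, not the CORNAS model: it assumes each true transcript is observed independently with probability equal to the sequencing coverage (a binomial sampling model) and places a flat prior on the true count over a finite range. The function name, the prior, and the parameter values are all assumptions made for the example.

```python
from math import comb


def posterior_true_count(observed, coverage, max_true):
    """Posterior P(true count | observed count) under toy assumptions.

    Assumptions (not taken from the paper):
      * observed ~ Binomial(true, coverage)
      * flat prior on true over observed..max_true
    Returns a dict mapping each candidate true count to its probability.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if max_true < observed:
        raise ValueError("max_true must be at least the observed count")

    # Unnormalized posterior = likelihood, because the prior is flat.
    weights = {
        n: comb(n, observed)
        * coverage ** observed
        * (1 - coverage) ** (n - observed)
        for n in range(observed, max_true + 1)
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical example: 5 reads observed at 30% coverage.
posterior = posterior_true_count(observed=5, coverage=0.3, max_true=200)
most_probable = max(posterior, key=posterior.get)
```

Under these assumptions many different true counts remain plausible for a single observed count, and the spread of the posterior widens as coverage falls. That is the reason for carrying coverage into the analysis rather than treating the observed count as the true one.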

Conclusions

Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with a limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS.

Evaluation of methods and marker Systems in Genomic Selection of oil palm (Elaeis guineensis Jacq.)

Inspection of Sequence Quality

De novo Assembly of a Genome

Bioinformatics: A Practical Handbook of Next Generation Sequencing and Its Applications

Genomic selection in commercial perennial crops: applicability and improvement in oil palm (Elaeis Guineensis Jacq.)

Development and validation of a high-density SNP genotyping array for African oil palm

Expression patterns of inflorescence- and sex-specific transcripts in male and female inflorescences of African oil palm (Elaeis guineensis)

Computational identification of Penaeus monodon microRNA genes and their targets.

Transcripts and microRNAs responding to salt stress in Musa acuminata Colla (AAA Group) cv. Berangan roots

Characterization of a novel binding protein for Fortilin/TCTP—component of a defense mechanism against viral infection in Penaeus monodon

Toll-like receptor 4 promoter polymorphisms: common TLR4 variants may protect against severe urinary tract infection

CNV-seq, a new method to detect copy number variation using high-throughput sequencing

What are next generation innovative therapeutic targets? Clues from genetic, structural, physicochemical, and systems profiles of successful targets

AllerHunter: A SVM-Pairwise System for Assessment of Allergenicity and Allergic Cross-Reactivity in Proteins

Allergen Atlas: a comprehensive knowledge center and analysis resource for allergen information

The value of position-specific scoring matrices for assessment of protein allegenicity

Trends in the exploration of anticancer targets and strategies in enhancing the efficacy of drug targeting

Methods and protocols for the assessment of protein allergenicity and cross-reactivity.

Prediction of protein allergenicity using local description of amino acid sequence.

Database of Trypanosoma cruzi repeated genes: 20 000 additional gene variants

GRAT—genome-scale rapid alignment tool

AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins

Establishing bioinformatics research in the Asia Pacific

DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions

MicroTar: predicting microRNA targets from RNA duplexes

Cloning of a human parvovirus by molecular screening of respiratory tract samples

The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease

Biological Databases and Web Services: Metrics for Quality

A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms

Some microsatellites may act as novel polymorphic cis-regulatory elements through transcription factor binding

ReDiT: Repeat Discrepancy Tagger—a WGS assembly finishing aid

Correcting errors in WGS sequences

The principles of WGS sequencing and automated fragment assembly

TRAP: Tandem Repeat Assembly Program produces improved WGS assemblies of repetitive sequences

Separation of nearly identical repeats in WGS assemblies using defined nucleotide positions, DNPs

Software Tools and Algorithms for WGS Sequence Assembly

MtDNA mutations in maternally inherited diabetes: presence of the 3397 ND1 mutation previously associated with Alzheimer's and Parkinson's disease

The Trypanosoma cruzi genome initiative on the web

Gene survey of the pathogenic protozoan Trypanosoma cruzi

Complete sequence of a 93.4-kb contig from chromosome 3 of Trypanosoma cruzi containing a strand-switch region

Errors in sequence assembly and corrections