DataRemix
DataRemix is a software tool that addresses the challenges of normalizing RNA-seq data. RNA-seq technology has revolutionized the assessment of transcript abundance, enabling a wide range of downstream analytical tasks such as gene-correlation network inference and eQTL (expression Quantitative Trait Loci) discovery. Raw gene expression values must be normalized to account for biological variation and technical covariates, and different normalization strategies can lead to vastly different outcomes in subsequent analyses.
Core Features and Functionalities:
- Advanced Normalization Technique: DataRemix generalizes singular value decomposition (SVD)-based reconstruction. The generalization covers several common techniques as special cases, such as whitening, rank-k approximation, and removing the top k principal components, giving a more flexible and comprehensive approach to data normalization.
- Three-Parameter Transformation: The tool employs a simple yet powerful three-parameter transformation that can be tuned to adjust the contribution of hidden factors in the dataset. This capability allows DataRemix to enhance the visibility of biological signals that might be obscured by noise.
- Prioritization of Biological Signals: DataRemix is designed to prioritize biological signals over noise without needing external, dataset-specific knowledge. This is valuable for researchers seeking genuine biological insights from their RNA-seq data.
- Efficient Optimization: The transformation's parameters can be optimized using the Thompson sampling approach, making it feasible to apply DataRemix to computationally intensive objectives, such as eQTL analysis.
- Application and Results: Applied to the Religious Orders Study and Memory and Aging Project (ROSMAP) dataset, DataRemix reported what is claimed to be the first replicable trans-eQTL effect in the human brain, showcasing the tool's potential for generating novel biological insights.
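The three-parameter transformation described in the features above can be illustrated with a short base-R sketch. It assumes the transformation has the form Remix(X; k, p, mu) = U_k S_k^p V_k^T + mu (X - U_k S_k V_k^T), where X = U S V^T is the SVD of the expression matrix, k is the number of leading components to rescale, p is the exponent applied to their singular values, and mu is the weight of the residual. This form is an assumption made for illustration and is not quoted from this entry. The function name `remix_sketch` is hypothetical and is not part of the DataRemix package API.

```r
# Minimal sketch of an SVD-based three-parameter transformation.
# ASSUMPTION: the formula below is an illustrative form, and
# remix_sketch() is a hypothetical helper, not a DataRemix function.
remix_sketch <- function(X, k, p, mu) {
  s <- svd(X)
  idx <- seq_len(k)
  U_k <- s$u[, idx, drop = FALSE]
  V_k <- s$v[, idx, drop = FALSE]
  d_k <- s$d[idx]

  # Rank-k part with the original singular values: U_k diag(d_k) V_k^T
  low_rank <- U_k %*% (d_k * t(V_k))
  # Rank-k part with singular values raised to the power p
  rescaled <- U_k %*% ((d_k^p) * t(V_k))

  # Rescaled leading components plus the down- or up-weighted residual
  rescaled + mu * (X - low_rank)
}

set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

same  <- remix_sketch(X, k = 3, p = 1, mu = 1)  # leaves X unchanged
rank3 <- remix_sketch(X, k = 3, p = 1, mu = 0)  # rank-3 approximation
white <- remix_sketch(X, k = 6, p = 0, mu = 0)  # all singular values set to 1
```

Under this assumed form, p = 1 with mu = 1 returns the input, p = 1 with mu = 0 gives the rank-k approximation, and p = 0 with mu = 0 at full rank gives a whitening-like result. Intermediate values interpolate between these, which is the sense in which the parameters adjust the contribution of hidden factors.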
Topic
Gene expression;Microarray experiment;RNA-Seq;Molecular interactions, pathways and networks;Gene transcripts
Detail
Operation: Standardisation and normalisation;Expression correlation analysis;Gene regulatory network analysis
Software interface: Library
Language: R
License: GNU General Public License >= version 2
Cost: Free with restrictions
Version name: v0.1.2
Credit: National Institutes of Health (NIH).
Input: -
Output: -
Contact: Maria Chikina mchikina@pitt.edu
Collection: -
Maturity: Stable
Publications
- DataRemix: a universal data transformation for optimal inference from gene expression datasets.
- Mao W, et al. DataRemix: a universal data transformation for optimal inference from gene expression datasets. Bioinformatics. 2021; 37:984-991. doi: 10.1093/bioinformatics/btaa745
- https://doi.org/10.1093/BIOINFORMATICS/BTAA745
- PMID: 32821903
- PMC: PMC8128479
Download and documentation
Source: https://github.com/wgmao/DataRemix/releases/tag/v0.1.2
Documentation: https://github.com/wgmao/DataRemix/blob/master/README.md
Home page: https://github.com/wgmao/DataRemix