DataSHIELD
DataSHIELD performs privacy-protected federated analyses of health and biomedical big data across multiple institutions.
Key Features:
- Privacy Protection: Datasets remain on institutional servers to prevent relocation of individual-level data and maintain participant confidentiality.
- Decentralized Analysis: Analysts execute commands across multiple institutional servers without direct access to raw individual-level data, returning aggregated results that avoid disclosure of individual records.
- Integration with Opal: DataSHIELD works with Opal, the data integration system from the OBiBa project, so that large-scale epidemiological datasets can be analyzed in their original formats and locations.
- Enhanced Architecture ("Resources"): The "Resources" architecture lets analyses use large datasets in their native environments and draw on external computing resources to expand analytical capacity.
- Support for Genomic and Geospatial Data: The platform has been applied to genomics and geospatial projects and supports related infrastructures such as GA4GH and EGA.
- Shell Commands and R Packages: Functionality can be extended via shell commands and a suite of selected R packages for analysis.
Scientific Applications:
- Population health and epidemiology: Privacy-preserving pooled analyses enable multi-institutional studies of health and biomedical cohorts without sharing individual-level data.
- Genomic and geospatial research: Supports large-scale genomic and geospatial analyses while interfacing with GA4GH- and EGA-compatible infrastructures.
Methodology:
Commands are executed across institutional servers while the datasets remain in place. Returned outputs are aggregated to prevent individual-level disclosure. The system integrates with Opal (OBiBa), supports shell commands and selected R packages, and uses external computing resources through its "Resources" architecture.
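The aggregate-only pattern described above can be illustrated with a minimal base R sketch. This is a simulation under stated assumptions, not DataSHIELD's actual implementation or API: the three in-memory "servers", the `bmi` variable, the functions `server_summary` and `pooled_mean`, and the minimum count of 5 are all hypothetical names and values chosen for illustration.

```r
# Hypothetical in-memory stand-ins for three institutional servers.
# In a real deployment the rows would never leave each institution.
set.seed(42)
servers <- list(
  site_a = data.frame(bmi = rnorm(120, mean = 26, sd = 4)),
  site_b = data.frame(bmi = rnorm(80,  mean = 27, sd = 5)),
  site_c = data.frame(bmi = rnorm(3,   mean = 25, sd = 3))  # very small site
)

# Assumed disclosure threshold for this sketch only.
MIN_CELL_COUNT <- 5

# Runs "on the server": sees individual rows, returns only aggregates.
# Refuses to answer when too few records contribute to the result.
server_summary <- function(data, variable, min_n = MIN_CELL_COUNT) {
  values <- data[[variable]]
  values <- values[!is.na(values)]
  n <- length(values)
  if (n < min_n) {
    return(list(ok = FALSE, n = NA_real_, sum = NA_real_))
  }
  list(ok = TRUE, n = as.numeric(n), sum = sum(values))
}

# Runs "on the client": combines per-site counts and sums into a pooled mean.
# The client never receives individual values.
pooled_mean <- function(servers, variable) {
  summaries <- lapply(servers, server_summary, variable = variable)
  ok <- vapply(summaries, function(s) s$ok, logical(1))
  used <- summaries[ok]
  total_n <- sum(vapply(used, function(s) s$n, numeric(1)))
  total_sum <- sum(vapply(used, function(s) s$sum, numeric(1)))
  list(
    mean = total_sum / total_n,
    n = total_n,
    sites_used = names(used),
    sites_refused = names(summaries)[!ok]
  )
}

result <- pooled_mean(servers, "bmi")
print(result$sites_used)
print(result$sites_refused)
```

In this sketch `site_c` is excluded because it holds fewer records than the assumed threshold, and the pooled mean is computed only from the counts and sums returned by the other two sites.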
Details
- Programming Languages: R, Shell
- Added: 9/8/2021
- Last Updated: 12/2/2024
Publications
Marcon Y, Bishop T, Avraam D, Escriba-Montagut X, Ryser-Welch P, Wheater S, Burton P, González JR. Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD. PLOS Computational Biology. 2021;17(3):e1008880. doi:10.1371/journal.pcbi.1008880. PMID:33784300. PMCID:PMC8034722.