DataSHIELD

DataSHIELD performs privacy-protected federated analyses of health and biomedical big data across multiple institutions.


Key Features:

  • Privacy Protection: Datasets remain on institutional servers to prevent relocation of individual-level data and maintain participant confidentiality.
  • Decentralized Analysis: Analysts run commands across multiple institutional servers without direct access to raw individual-level data; the servers return only aggregated results that do not disclose individual records.
  • Integration with Opal: DataSHIELD works with Opal, the data integration system from the OBiBa project, to handle large-scale epidemiological datasets in their original formats and locations.
  • Enhanced Architecture ("Resources"): The "Resources" architecture lets analyses use large datasets in their native environments and draw on external computing resources to expand analytical capacity.
  • Support for Genomic and Geospatial Data: The platform has been applied to genomics and geospatial projects and can interface with related infrastructures such as those following GA4GH (Global Alliance for Genomics and Health) standards and the EGA (European Genome-phenome Archive).
  • Shell Commands and R Packages: Functionality can be extended via shell commands and a suite of selected R packages for analysis.

Scientific Applications:

  • Population health and epidemiology: Privacy-preserving pooled analyses enable multi-institutional studies of health and biomedical cohorts without sharing individual-level data.
  • Genomic and geospatial research: Supports large-scale genomic and geospatial analyses while interfacing with GA4GH- and EGA-compatible infrastructures.

Methodology:

Commands are executed on the institutional servers while the datasets remain in place. Each server returns only aggregated outputs, which prevents disclosure of individual-level data. The system integrates with Opal (OBiBa), supports shell commands and selected R packages, and uses external computing resources through its "Resources" architecture.
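The aggregate-only principle described above can be illustrated with a minimal sketch. This is not DataSHIELD's API (which is R-based); it is a conceptual Python example under stated assumptions. The names `Site`, `SiteSummary`, `pooled_mean`, and the threshold `MIN_COUNT` are hypothetical, and the disclosure rule shown (refusing to answer when a site holds too few records) is only one simple stand-in for the kinds of checks a real deployment would apply.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence

# Hypothetical disclosure threshold: a site refuses to report on
# fewer than this many records. The value is an assumption.
MIN_COUNT = 5


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site is willing to release."""
    site: str
    n: int
    total: float


class Site:
    """Stands in for one institutional server holding private records."""

    def __init__(self, name: str, values: Iterable[float]) -> None:
        self.name = name
        self._values = list(values)  # individual-level data never leaves here

    def summarize(self) -> Optional[SiteSummary]:
        """Return count and sum only, or None if the check fails."""
        n = len(self._values)
        if n < MIN_COUNT:
            return None  # too few records: releasing a summary could disclose
        return SiteSummary(self.name, n, sum(self._values))


def pooled_mean(sites: Sequence[Site]) -> float:
    """Combine per-site aggregates into one pooled mean on the analyst side."""
    summaries = [s for s in (site.summarize() for site in sites) if s is not None]
    if not summaries:
        raise ValueError("no site passed the disclosure check")
    total_n = sum(s.n for s in summaries)
    return sum(s.total for s in summaries) / total_n


sites = [
    Site("site_a", [1.0, 2.0, 3.0, 4.0, 5.0]),
    Site("site_b", [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]),
    Site("site_c", [100.0, 200.0]),  # below MIN_COUNT, so it returns nothing
]
result = pooled_mean(sites)
```

In this sketch the analyst-side function only ever sees a count and a sum per site, and `site_c` contributes nothing because it fails the threshold check. The pooled mean is therefore computed from `site_a` and `site_b` alone.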

Details

Programming Languages: R, Shell
Added: 9/8/2021
Last Updated: 12/2/2024

Publications

Marcon Y, Bishop T, Avraam D, Escriba-Montagut X, Ryser-Welch P, Wheater S, Burton P, González JR. Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD. PLOS Computational Biology. 2021;17(3):e1008880. doi:10.1371/journal.pcbi.1008880. PMID:33784300. PMCID:PMC8034722.
