GenomicDataCommons
GenomicDataCommons provides an R/Bioconductor interface to the National Cancer Institute’s Genomic Data Commons (GDC), enabling programmatic querying and retrieval of harmonized genomic, clinical, and biospecimen data for cancer genomics analyses.
Key Features:
- RESTful API access: Exposes the GDC RESTful API so that metadata and molecular profiles can be queried, filtered, and retrieved directly from R.
- Fluent query syntax: Constructs queries using a fluent, pipe-based syntax that mirrors the GDC Data Model and Data Dictionary to explore cases, files, annotations, and analytical results.
- Harmonized sequencing data: Provides access to uniformly processed sequencing data generated by standardized pipelines for mutation calling, copy-number variation, structural variant detection, and other derived molecular features.
- High-volume data transfer: Integrates with the GDC Data Transfer Tool to support large downloads of genomic files including BAM, FASTQ, VCF, and masked copy-number or mutation calls.
- Controlled-access authentication: Facilitates authenticated retrieval of controlled-access datasets requiring dbGaP authorization while remaining compatible with open-access resources.
- Reproducible Bioconductor workflows: Enables reproduction of GDC Data Portal analytical workflows within Bioconductor using downloaded harmonized datasets for downstream statistical analyses.
- Analytical tool parity: Supports analyses comparable to GDC-provided methods such as mutation frequency visualizations, OncoGrid co-occurrence plots, survival analyses, cohort comparison utilities, and protein-domain mutation mapping.
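The fluent query style described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes network access to the GDC API and R 4.1 or later for the native pipe, and the project ID and data type are example values chosen for illustration, not values taken from this entry.

```r
library(GenomicDataCommons)

# Confirm the GDC API is reachable before building queries.
stopifnot(status()$status == "OK")

# Start from the "files" endpoint and narrow it with a filter expression.
# The project ID and data type below are illustrative values only.
q <- files() |>
  filter(cases.project.project_id == "TCGA-OV" &
           data_type == "Gene Expression Quantification" &
           access == "open")

# Count the matching records without retrieving them.
n_files <- count(q)

# Retrieve a handful of records, keeping only the fields of interest.
res <- q |>
  select(c("file_name", "data_type", "access")) |>
  results(size = 5)

print(n_files)
print(res$file_name)
```

Loading the package masks `stats::filter`, and its `filter()`, `select()`, and `count()` share names with dplyr functions, so namespacing the calls (for example `GenomicDataCommons::filter`) may be safer in scripts that also load dplyr.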
Scientific Applications:
- Aggregating cancer genomics data: Integrating genomic, clinical, and biospecimen data across major cancer research programs for cross-study analyses.
- Mutation landscape analysis: Characterizing mutation frequencies, co-occurrence patterns, and protein-domain mutation distributions.
- Copy-number and structural variant studies: Analyzing harmonized copy-number variation and structural variant calls across cohorts.
- Survival and cohort comparisons: Performing survival analyses and cohort comparison studies linking molecular profiles to clinical outcomes.
- Downstream reproducible analysis: Incorporating GDC harmonized datasets into Bioconductor pipelines for statistical and bioinformatic analyses.
- Large-scale reanalysis: Downloading BAM, FASTQ, and VCF files for alignment, variant calling, or custom reanalysis workflows.
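As a small illustration of cross-program aggregation, the sketch below counts cases per project using the package's facet and aggregation helpers. It assumes network access to the GDC API; the choice of `project.project_id` as the facet field is an example, and other fields could be substituted.

```r
library(GenomicDataCommons)

# Ask the "cases" endpoint for counts grouped by project.
agg <- cases() |>
  facet("project.project_id") |>
  aggregations()

# aggregations() returns a list with one data frame per facet field.
per_project <- agg[["project.project_id"]]

# Order projects by the number of cases, largest first.
per_project <- per_project[order(-per_project$doc_count), ]
print(head(per_project))
```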
Methodology:
The package exposes the GDC RESTful API for programmatic querying, filtering, and retrieval. Queries are constructed with a fluent, pipe-based syntax that mirrors the GDC Data Model and Data Dictionary. High-volume downloads are handled through integration with the GDC Data Transfer Tool, and controlled-access datasets requiring dbGaP authorization are retrieved with authentication. The data accessed are uniformly processed sequencing data produced by standardized pipelines for mutation calling, copy-number variation, structural variant detection, and other derived molecular features.
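The download and authentication steps might look like the sketch below. It assumes network access, R 4.1 or later, and, for controlled-access data, a token saved from the GDC Data Portal where `gdc_token()` can find it. `fetch_files` is a hypothetical helper introduced here for illustration and is not part of the package; the filter values are examples.

```r
library(GenomicDataCommons)

# Build a manifest of open-access gene expression files (example filter values).
q <- files() |>
  filter(cases.project.project_id == "TCGA-OV" &
           type == "gene_expression" &
           access == "open")
man <- manifest(q)  # one row per file, including an "id" column

# Hypothetical helper: download files by UUID.
# method = "api" uses the REST API; method = "client" delegates to the
# GDC Data Transfer Tool (gdc-client), which must be installed separately.
fetch_files <- function(file_ids, token = NULL, method = "api") {
  gdcdata(file_ids, token = token, access_method = method, progress = FALSE)
}

if (interactive()) {
  # Open-access files need no token; downloads go to the package cache.
  paths <- fetch_files(head(man$id, 2))
  print(paths)

  # Controlled-access files would additionally pass a token, for example:
  # paths <- fetch_files(controlled_ids, token = gdc_token(), method = "client")
  # where controlled_ids is a placeholder for UUIDs the user is authorized for.
}
```

Guarding the download with `interactive()` keeps the sketch from transferring files when it is sourced non-interactively.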
Details
- License: Artistic-2.0
- Tool Type: library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R
- Added: 7/9/2018
- Last Updated: 12/10/2018