iProX

iProX is a large-scale proteomics data repository and analysis infrastructure for storing, indexing, and programmatically accessing proteomics datasets, supporting data sharing, rapid retrieval, and reanalysis.


Key Features:

  • Hyper-converged architecture: Provides high scalability for handling proteomics dataset submissions.
  • Hadoop cluster: Stores extensive proteomics datasets on a distributed file system.
  • Distributed Elasticsearch engine: Provides indexing and search through a RESTful interface, answering queries over millions of records in under one second.
  • Universal Spectrum Identifier (USI): Implements the USI mechanism proposed by ProteomeXchange, giving each spectrum a standardized identifier that can be referenced across datasets.
  • RESTful Web Service API: Provides programmatic access and interoperability with other systems and tools.
  • High-efficiency reanalysis pipeline: Supports streamlined reanalysis workflows for deposited proteomics data.
  • Big data and storage scale: Supports petabyte-level storage capacity and hundreds of billions of spectrum records, with query responses returned within seconds.
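A USI encodes where a spectrum lives as colon-separated fields: the `mzspec` prefix, a collection (typically a ProteomeXchange dataset accession), the MS run name, an index type, the index, and an optional interpretation (peptidoform and charge). The sketch below parses and rebuilds such strings. It is an illustration, not iProX code: the names `USI` and `parse_usi` are hypothetical, it assumes the collection and MS run fields contain no colons, it accepts only the index types `scan`, `index` and `nativeId`, and it does not validate peptidoform syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Index types accepted by this sketch.
VALID_INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class USI:
    collection: str                       # e.g. a ProteomeXchange accession such as PXD000561
    ms_run: str                           # MS run (file) name without extension
    index_type: str                       # "scan", "index" or "nativeId"
    index: str                            # spectrum index within the run, kept as text
    interpretation: Optional[str] = None  # optional peptidoform/charge, e.g. "VLHPLEGAVVIIFK/2"

    def __str__(self) -> str:
        # Rebuild the colon-separated USI string from its fields.
        parts = ["mzspec", self.collection, self.ms_run, self.index_type, self.index]
        if self.interpretation:
            parts.append(self.interpretation)
        return ":".join(parts)


def parse_usi(text: str) -> USI:
    # maxsplit=5 keeps colons inside the interpretation (e.g. "[UNIMOD:35]") intact.
    parts = text.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {text!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if not (collection and ms_run and index):
        raise ValueError(f"empty required field in USI: {text!r}")
    if index_type not in VALID_INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    interpretation = rest[0] if rest and rest[0] else None
    return USI(collection, ms_run, index_type, index, interpretation)


if __name__ == "__main__":
    usi = parse_usi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2")
    print(usi.collection, usi.index_type, usi.index, usi.interpretation)
    print(str(usi))
```

Splitting with a fixed maximum keeps the parser simple while tolerating modification notations that contain colons; a production parser would follow the full USI specification.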

Scientific Applications:

  • Data management and dissemination: Manages and disseminates large volumes of proteomics experimental data to support sharing and reuse.
  • Standardized spectrum referencing: Enables USI-based referencing of spectra across datasets, compatible with ProteomeXchange.
  • Large-scale reanalysis: Facilitates high-efficiency reanalysis of deposited proteomics datasets.
  • High-throughput retrieval and indexing: Supports rapid querying and retrieval for proteomics research across millions to hundreds of billions of records.
  • Repository scale: As of August 2021, hosts 1,526 datasets totaling 92.42 terabytes of proteomics data.

Methodology:

iProX combines a hyper-converged architecture with a Hadoop cluster for distributed storage and a distributed Elasticsearch engine, accessed through a RESTful interface, for indexing and sub-second queries. It also supports the Universal Spectrum Identifier (USI), exposes a RESTful Web Service API, and runs a high-efficiency reanalysis pipeline.
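As a rough illustration of how spectrum records might be queried in an Elasticsearch-backed index, the sketch below builds a standard Elasticsearch Query DSL body (`bool` with `filter` clauses, `term` and `range` queries) for a precursor m/z window and packages it as a `POST /<index>/_search` request. iProX's actual index names, field names, and API endpoints are not described in this entry: the index `spectra`, the fields `dataset_id`, `precursor_mz` and `charge`, and the helper functions are all hypothetical. The code only builds the request; it does not contact a server.

```python
import json
from typing import Optional, Tuple


def build_precursor_query(dataset_id: str, precursor_mz: float, tol_ppm: float = 10.0,
                          charge: Optional[int] = None, size: int = 100) -> dict:
    """Build a Query DSL body for spectra in a (hypothetical) spectrum index."""
    # Convert the ppm tolerance into an absolute m/z window around the precursor.
    delta = precursor_mz * tol_ppm / 1e6
    filters = [
        {"term": {"dataset_id": dataset_id}},  # hypothetical field name
        {"range": {"precursor_mz": {"gte": precursor_mz - delta, "lte": precursor_mz + delta}}},
    ]
    if charge is not None:
        filters.append({"term": {"charge": charge}})  # hypothetical field name
    # Clauses in bool/filter are matched without relevance scoring, which suits exact lookups.
    return {"size": size, "query": {"bool": {"filter": filters}}}


def search_request(index: str, body: dict) -> Tuple[str, str, str]:
    """Return (HTTP method, path, JSON payload) for Elasticsearch's _search endpoint."""
    return "POST", f"/{index}/_search", json.dumps(body)


if __name__ == "__main__":
    body = build_precursor_query("PXD000561", 500.0, tol_ppm=10.0, charge=2)
    method, path, payload = search_request("spectra", body)
    print(method, path)  # prints: POST /spectra/_search
    print(payload)
```

Using filter clauses rather than scored queries reflects that spectrum lookups are exact-match and range constraints, which Elasticsearch can evaluate without computing relevance scores.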

Details

Cost:
Free of charge
Tool Type:
web application
Operating Systems:
Mac, Linux, Windows
Added:
5/24/2022
Last Updated:
5/24/2022

Publications

Chen T, Ma J, Liu Y, Chen Z, Xiao N, Lu Y, Fu Y, Yang C, Li M, Wu S, Wang X, Li D, He F, Hermjakob H, Zhu Y. iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Research. 2022;50(D1):D1522-D1527. doi:10.1093/nar/gkab1081. PMID:34871441. PMCID:PMC8728291.

Funding:

  • National Key Research Program of China: 2020YFE0202200
  • Innovation special zone: 18-163-15-ZT-001-006-07
  • Program for Guangdong Introducing Innovative and Entrepreneurial Teams: 2016ZT06D211
  • National Natural Science Foundation of China: 16CXZ027, U1811461