UniParc

UniParc provides a comprehensive non-redundant archive of protein sequences by assigning a stable UniParc identifier (UPI) to each distinct sequence and aggregating cross-references to source databases.


Key Features:

  • Non-redundant sequence archive: Stores each unique protein sequence once to eliminate redundancy across aggregated data sources.
  • Stable identifiers (UPI): Assigns a stable UniParc identifier to every distinct protein sequence.
  • Daily updates: Integrates new and revised protein sequences from publicly accessible source databases on a daily basis.
  • Cross-references to source databases: Creates links from UniParc entries back to originating databases for traceability of sequence provenance.
  • Centralized sequence-only repository: Maintains only sequences and their cross-references, with additional annotations retained in the original source databases.
  • Aggregated search: A search against UniParc entries covers all cross-referenced source databases at once, so one query returns matches regardless of which database supplied the sequence.
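
The aggregated search described above can be illustrated with a small in-memory sketch. This is not the UniParc service or its API. The entry keys, database names (DB_A, DB_B), accessions, and both function names are hypothetical placeholders chosen for illustration.

```python
# Hypothetical toy index: entry identifier -> set of (database, accession)
# cross-references. All identifiers and database names are placeholders.
entries = {
    "UPI_EXAMPLE_1": {("DB_A", "A1"), ("DB_B", "B7")},
    "UPI_EXAMPLE_2": {("DB_A", "A2")},
}


def find_by_accession(entries, accession):
    """Return identifiers of entries cross-referenced under `accession`
    in any source database, sorted for stable output."""
    return sorted(
        upi
        for upi, xrefs in entries.items()
        if any(acc == accession for _, acc in xrefs)
    )


def databases_for(entries, upi):
    """Return the sorted names of source databases linked to one entry."""
    return sorted({db for db, _ in entries[upi]})
```

Because each sequence is stored once with all of its cross-references, a lookup by any source accession lands on the same entry, and that entry lists every database that holds the sequence.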

Scientific Applications:

  • Comparative genomics: Provides a consolidated set of unique protein sequences for cross-species sequence comparison.
  • Evolutionary studies: Supplies non-redundant sequences and provenance links useful for tracing sequence conservation and divergence.
  • Functional annotation projects: Serves as a central sequence index with UPIs and cross-references to support annotation efforts using source-database metadata.
  • Protein sequence analysis: Enables comprehensive sequence searches across aggregated source databases via UniParc entries.

Methodology:

UniParc collects new and updated protein sequences from public source databases on a daily schedule, assigns a stable UniParc identifier (UPI) to each distinct sequence, and cross-references each UniParc entry to the databases in which that sequence appears.
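
As a rough sketch of this process, the Python below stores each distinct sequence once, gives it an identifier on first sight, and accumulates cross-references on every later sighting. It is an illustration under stated assumptions, not UniParc's implementation: the class and method names are hypothetical, the counter-based identifier format is chosen only for the example, and the whitespace and case normalization is an assumption about what counts as the same sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    upi: str                                  # stable identifier for the sequence
    sequence: str                             # the sequence, stored once
    xrefs: set = field(default_factory=set)   # (database, accession) pairs


class Archive:
    """Toy non-redundant archive (hypothetical, for illustration only)."""

    def __init__(self):
        self._by_sequence = {}   # normalized sequence -> Entry
        self._next_id = 1        # counter used to mint illustrative identifiers

    def add(self, sequence, database, accession):
        # Assumption: sequences are compared after trimming and upper-casing.
        key = sequence.strip().upper()
        entry = self._by_sequence.get(key)
        if entry is None:
            # First time this sequence is seen: mint a new identifier.
            # The "UPI" + 10 hex digits format here is illustrative.
            upi = f"UPI{self._next_id:010X}"
            self._next_id += 1
            entry = Entry(upi, key)
            self._by_sequence[key] = entry
        # Seen before or not, record where the sequence came from.
        entry.xrefs.add((database, accession))
        return entry
```

For example, adding the same sequence from two hypothetical databases (`archive.add("MKT", "DB_A", "A1")` and then `archive.add("MKT", "DB_B", "B7")`) returns the same entry both times, with two cross-references attached. The identifier never changes once assigned, which is what makes it usable as a stable key.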

Details

Tool Type: web application
Operating Systems: Linux, Windows, Mac
Added: 6/11/2015
Last Updated: 11/24/2024

Publications

Leinonen R, Diez FG, Binns D, Fleischmann W, Lopez R, Apweiler R. UniProt archive. Bioinformatics. 2004;20(17):3236-3237. doi:10.1093/bioinformatics/bth191. PMID:15044231.