UniProt
UniProt provides comprehensive protein sequence records and functional annotations to support protein function analysis, reference proteome construction, sequence-similarity searches, and genomic variant interpretation.
Key Features:
- UniProt Knowledgebase (UniProtKB): Comprises UniProtKB/Swiss-Prot (reviewed, manually curated entries) and UniProtKB/TrEMBL (unreviewed, automatically annotated entries). Together they hold over 190 million protein sequences, of which more than half a million have expert curation; unreviewed entries receive automated annotation from systems such as the Association-Rule-Based Annotator (ARBA).
- UniProt Reference Clusters (UniRef): Clusters sequences at 100% (UniRef100), 90% (UniRef90), and 50% (UniRef50) identity to compress sequence space and accelerate similarity searches.
- UniProt Archive (UniParc): A non-redundant archive of most publicly available protein sequences; each unique sequence is stored once, with its history and cross-references to the source databases.
- UniProt Metagenomic and Environmental Sequences Database (UniMES): A repository of metagenomic and environmental protein sequences for metagenomics research.
- Reference proteomes: A curated set of reference proteomes covering 5,631 species, selected to give broad taxonomic representation.
- Genome browser tracks: Provides protein-centric tracks for major genome browsers to support genomic variant interpretation.
- SPARQL endpoint: Enables complex, programmable queries over UniProt data via a SPARQL interface.
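The SPARQL endpoint can be queried from any HTTP client. The sketch below assumes the endpoint is served at `https://sparql.uniprot.org/sparql`, accepts the standard SPARQL 1.1 Protocol `query` parameter, and uses the UniProt core vocabulary (`up:Protein`, `up:organism`, `up:reviewed`); the function names are illustrative, not part of any UniProt library. It builds a query for reviewed (Swiss-Prot) entries of one taxon, for example human (NCBI taxon 9606).

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlencode

# Assumed endpoint URL; confirm against the current UniProt SPARQL documentation.
SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"


def reviewed_proteins_query(taxon_id: int, limit: int = 10) -> str:
    """Build a SPARQL query listing reviewed (Swiss-Prot) entries for one NCBI taxon."""
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein
WHERE {{
  ?protein a up:Protein ;
           up:organism taxon:{taxon_id} ;
           up:reviewed true .
}}
LIMIT {limit}"""


def build_sparql_request(query: str) -> urllib.request.Request:
    """Wrap a query in an HTTP GET request using the SPARQL 1.1 Protocol `query` parameter."""
    params = urlencode({"query": query})
    url = f"{SPARQL_ENDPOINT}?{params}"
    # Ask for the standard SPARQL 1.1 JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str) -> List[str]:
    """Send the request (network I/O) and return the IRIs bound to ?protein."""
    with urllib.request.urlopen(build_sparql_request(query), timeout=60) as resp:
        data = json.load(resp)
    return [row["protein"]["value"] for row in data["results"]["bindings"]]
```

Calling `run_query(reviewed_proteins_query(9606, limit=5))` would return up to five entry IRIs. Building the request separately from sending it keeps the query logic testable offline.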
Scientific Applications:
- Protein function analysis: Use detailed UniProt annotations and literature-based evidence to interpret protein function and attributes.
- Sequence similarity searches: Use UniRef clusters to perform rapid and scalable sequence-similarity searches.
- Genomic variant interpretation: Integrate protein annotations and genome browser tracks to assess the potential impact of genomic variants.
- Metagenomics: Annotate and analyze protein sequences derived from metagenomic and environmental datasets.
- Complex data queries: Perform federated or advanced queries against UniProt data using the SPARQL endpoint.
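UniRef speeds up similarity searches by letting a query be compared against one representative per cluster instead of against every sequence. The sketch below is a deliberately simplified, hypothetical illustration of greedy identity-threshold clustering, not UniRef's production pipeline, which relies on dedicated clustering software and additional criteria. It approximates identity as the longest common subsequence divided by the length of the longer sequence, processes longer sequences first so they become representatives, and uses made-up toy sequences.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity proxy: matched residues over the length of the longer sequence."""
    return lcs_length(a, b) / max(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Greedy clustering: longest sequences first; each joins the first
    representative it matches at >= threshold, otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for seq_id in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:
            if identity(seqs[seq_id], seqs[rep]) >= threshold:
                clusters[rep].append(seq_id)
                break
        else:
            # No representative was similar enough: this sequence seeds a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GSHMLEDPVA"}
clusters = greedy_cluster(toy, threshold=0.9)
```

With a 0.9 threshold, the two toy sequences that differ at one of ten positions collapse into one cluster, so a search over this set would compare a query against two representatives instead of three sequences. Raising the threshold (as in UniRef100 versus UniRef50, in the opposite direction) trades compression for finer resolution.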
Methodology:
Manual curation by expert curators; automated, rule-based annotation systems including the Association-Rule-Based Annotator (ARBA); and integration, interpretation, and standardization of data from multiple source databases.
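Rule-based systems such as ARBA transfer annotations to unreviewed entries that satisfy a rule's conditions. The toy sketch below assumes a hypothetical rule format (every listed signature present, plus a required taxon in the entry's lineage) and uses placeholder identifiers; it does not reproduce ARBA's actual rules, data model, or rule-generation method. Each rule that fires records its ID as evidence so that automated annotations stay traceable to their source.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: fires when every signature is present and the taxon is in the lineage."""
    rule_id: str
    signatures: FrozenSet[str]
    taxon: str
    annotations: FrozenSet[str]


@dataclass
class Entry:
    """Minimal stand-in for an unreviewed entry (not UniProt's data model)."""
    accession: str
    signatures: Set[str]
    lineage: List[str]
    annotations: Set[str] = field(default_factory=set)
    evidence: List[str] = field(default_factory=list)


def apply_rules(entry: Entry, rules: List[Rule]) -> Entry:
    """Add annotations from every matching rule and record which rules contributed."""
    for rule in rules:
        conditions_met = rule.signatures <= entry.signatures and rule.taxon in entry.lineage
        if not conditions_met:
            continue
        new_annotations = rule.annotations - entry.annotations
        if new_annotations:
            entry.annotations |= new_annotations
            entry.evidence.append(rule.rule_id)  # keep annotations traceable to a rule
    return entry
```

Combining a sequence-signature condition with a taxonomic condition is what keeps such rules from over-propagating an annotation to unrelated organisms that happen to share a domain.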
Details
- License: CC-BY-4.0
- Added: 1/18/2021
- Last Updated: 6/27/2022
Publications
UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Research. 2011;40(D1):D71-D75. doi:10.1093/nar/gkr981. PMID:22102590. PMCID:PMC3245120.
Magrane M, UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database. 2011;2011:bar009. doi:10.1093/database/bar009. PMID:21447597. PMCID:PMC3070428.
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research. 2018;47(D1):D506-D515. doi:10.1093/nar/gky1049. PMID:30395287. PMCID:PMC6323992.
Bateman A, Martin M, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bursteinas B, Bye-A-Jee H, Coetzee R, Cukura A, Da Silva A, Denny P, Dogan T, Ebenezer T, Fan J, Castro LG, Garmiri P, Georghiou G, Gonzales L, Hatton-Ellis E, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Jokinen P, Joshi V, Jyothi D, Lock A, Lopez R, Luciani A, Luo J, Lussi Y, MacDougall A, Madeira F, Mahmoudy M, Menchi M, Mishra A, Moulang K, Nightingale A, Oliveira CS, Pundir S, Qi G, Raj S, Rice D, Lopez MR, Saidi R, Sampson J, Sawford T, Speretta E, Turner E, Tyagi N, Vasudev P, Volynkin V, Warner K, Watkins X, Zaru R, Zellner H, Bridge A, Poux S, Redaschi N, Aimo L, Argoud-Puy G, Auchincloss A, Axelsen K, Bansal P, Baratin D, Blatter M, Bolleman J, Boutet E, Breuza L, Casals-Casas C, de Castro E, Echioukh KC, Coudert E, Cuche B, Doche M, Dornevil D, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gehant S, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hyka-Nouspikel N, Jungo F, Keller G, Kerhornou A, Lara V, Le Mercier P, Lieberherr D, Lombardot T, Martin X, Masson P, Morgat A, Neto TB, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Pozzato M, Pruess M, Rivoire C, Sigrist C, Sonesson K, Stutz A, Sundaram S, Tognolli M, Verbregue L, Wu CH, Arighi CN, Arminski L, Chen C, Chen Y, Garavelli JS, Huang H, Laiho K, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Q, Wang Y, Yeh L, Zhang J, Ruch P, Teodoro D. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research. 2020;49(D1):D480-D489. doi:10.1093/nar/gkaa1100. PMID:33237286. PMCID:PMC7778908.
UniProt Consortium. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Research. 2009;37(Database issue):D169-D174. doi:10.1093/nar/gkn664. PMID:18836194. PMCID:PMC2686606.
UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Research. 2009;38(Database issue):D142-D148. doi:10.1093/nar/gkp846. PMID:19843607. PMCID:PMC2808944.
Bairoch A. The Universal Protein Resource (UniProt). Nucleic Acids Research. 2004;33(Database issue):D154-D159. doi:10.1093/nar/gki070. PMID:15608167. PMCID:PMC540024.
UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Research. 2007;35(Database issue):D193-D197. doi:10.1093/nar/gkl929. PMID:17142230. PMCID:PMC1669721.
Wu CH. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Research. 2006;34(Database issue):D187-D191. doi:10.1093/nar/gkj161. PMID:16381842. PMCID:PMC1347523.
UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Research. 2007;36(Database issue):D190-D195. doi:10.1093/nar/gkm895. PMID:18045787. PMCID:PMC2238893.
Apweiler R. UniProt: the Universal Protein knowledgebase. Nucleic Acids Research. 2004;32(Database issue):D115-D119. doi:10.1093/nar/gkh131. PMID:14681372. PMCID:PMC308865.
UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research. 2016;45(D1):D158-D169. doi:10.1093/nar/gkw1099. PMID:27899622. PMCID:PMC5210571.
UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research. 2018;46(5):2699. doi:10.1093/nar/gky092. PMID:29425356. PMCID:PMC5861450.
UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Research. 2012;41(D1):D43-D47. doi:10.1093/nar/gks1068. PMID:23161681. PMCID:PMC3531094.
Downloads
- Downloads page: http://www.uniprot.org/downloads
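Files from the downloads page include FASTA sequence sets such as `uniprot_sprot.fasta.gz`. UniProtKB FASTA headers follow the pattern `>db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...`, where `db` is `sp` for Swiss-Prot and `tr` for TrEMBL. The sketch below parses records and headers with the standard library only; the function names are illustrative, and the regular expressions assume well-formed headers whose protein names do not themselves contain `OS=`-style tags.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# >db|Accession|EntryName followed by the rest of the description line.
HEADER_RE = re.compile(r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+(?P<rest>.*)$")
# KEY=value pairs; a value runs until the next known key or the end of the line.
TAG_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=(.+?)(?=\s(?:OS|OX|GN|PE|SV)=|$)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_uniprot_header(header: str) -> Dict[str, str]:
    """Split a UniProtKB FASTA header into identifiers, protein name and tagged fields."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    rest = m.group("rest")
    fields = {
        "db": m.group("db"),
        "accession": m.group("accession"),
        "entry_name": m.group("entry_name"),
        "protein_name": rest.split(" OS=", 1)[0],
    }
    fields.update(TAG_RE.findall(rest))
    return fields
```

For a local copy of a compressed download at some path `path`, `read_fasta(gzip.open(path, "rt"))` streams records without decompressing the file to disk.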