InterPro
InterPro classifies protein sequences into families, predicts domains and functional sites, and facilitates protein annotation by integrating predictive signatures from multiple member databases.
Key Features:
- Integration of diverse signatures: InterPro amalgamates predictive models or "signatures" from member databases including Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, TIGRFAMs, SFLD, and CDD into unified entries.
- Unified entry metadata: Each InterPro entry includes unique accession numbers, functional descriptions, literature references, and cross-references back to the relevant member databases.
- Coverage and scale: Release 70.0 is reported to comprise over 16,549 entries and to cover approximately 79.8% of UniProtKB proteins, alongside a reported 18% growth in database size.
- Residue-level annotation and disorder prediction: InterPro provides residue-level annotations and predictions of intrinsic disorder to refine functional inferences.
- Structural and non-signature data: InterPro includes structural data alongside other non-signature information; the non-signature data are available in XML format on its FTP site.
- Domain architecture and functional site types: InterPro defines domain architectures and includes specific entry types such as active site and binding site to describe functional features.
- InterProScan sequence search: InterProScan enables searching of protein and nucleic acid sequences against InterPro's predictive models.
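InterProScan results are often post-processed as tab-separated text. The following is a minimal sketch, assuming the commonly documented TSV layout in which the first 13 columns are protein accession, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, InterPro accession, and InterPro description; check this against the InterProScan version in use. The function name `group_by_entry`, the column labels, and the sample rows (with placeholder accessions) are illustrative assumptions, not real results.

```python
from collections import defaultdict

# Assumed names for the first 13 columns of an InterProScan TSV row.
COLUMNS = [
    "protein", "md5", "length", "analysis", "signature_acc",
    "signature_desc", "start", "stop", "score", "status", "date",
    "interpro_acc", "interpro_desc",
]


def group_by_entry(tsv_text):
    """Map each InterPro accession to the (protein, start, stop) matches behind it."""
    matches = defaultdict(list)
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # row too short to carry the InterPro columns
        rec = dict(zip(COLUMNS, fields))
        if rec["interpro_acc"] in ("", "-"):
            continue  # signature not integrated into an InterPro entry
        matches[rec["interpro_acc"]].append(
            (rec["protein"], int(rec["start"]), int(rec["stop"]))
        )
    return dict(matches)


# Made-up rows with placeholder accessions, for illustration only.
MD5 = "0" * 32
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["protA", MD5, "300", "Pfam", "PF99999", "Example domain",
         "10", "120", "1.2E-20", "T", "01-01-2020", "IPR999999", "Example domain"],
        ["protA", MD5, "300", "SMART", "SM99999", "Example domain",
         "12", "118", "3.0E-15", "T", "01-01-2020", "IPR999999", "Example domain"],
        ["protB", MD5, "150", "PRINTS", "PR99999", "Unintegrated signature",
         "5", "40", "1.0E-5", "T", "01-01-2020", "-", "-"],
    ]
)

if __name__ == "__main__":
    for accession, hits in group_by_entry(SAMPLE).items():
        print(accession, hits)
```

Grouping by InterPro accession rather than by member-database signature mirrors the idea of unified entries: two signatures from different member databases that describe the same domain collapse into one record.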
Scientific Applications:
- Functional annotation: Predicting domains and functional sites to infer molecular function and guide annotation of UniProtKB and novel sequences.
- Protein classification: Grouping proteins into families and describing domain architectures to support evolutionary and comparative analyses.
- Sequence analysis and hypothesis generation: Identifying distant relationships in novel sequences, inferring protein functions, and informing experimental design.
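One simple way to compare proteins by domain architecture is to list each protein's domains in sequence order and group proteins that share the same ordered list. The sketch below illustrates that idea only; it is not necessarily how InterPro itself computes architectures. The names `architecture` and `group_by_architecture`, the placeholder domain labels, and the coordinates are all hypothetical.

```python
from collections import defaultdict


def architecture(domains):
    """Return domain labels ordered by start position, joined with '-'."""
    ordered = sorted(domains, key=lambda d: d[1])
    return "-".join(label for label, _start, _stop in ordered)


def group_by_architecture(proteins):
    """Group protein names by their architecture string."""
    groups = defaultdict(list)
    for name, domains in proteins.items():
        groups[architecture(domains)].append(name)
    return dict(groups)


# Hypothetical proteins: (domain label, start, stop) tuples in arbitrary order.
PROTEINS = {
    "protA": [("DOM_B", 150, 260), ("DOM_A", 10, 120)],
    "protB": [("DOM_A", 5, 110), ("DOM_B", 130, 250)],
    "protC": [("DOM_A", 20, 130)],
}

if __name__ == "__main__":
    for arch, names in group_by_architecture(PROTEINS).items():
        print(arch, names)
```

Here `protA` and `protB` fall into the same group because both carry `DOM_A` followed by `DOM_B`, regardless of exact coordinates, while `protC` forms its own group.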
Methodology:
InterPro integrates signatures from its member databases (Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, TIGRFAMs, SFLD, CDD) into entries through manual curation. It adds residue-level annotation and intrinsic disorder prediction. InterProScan supports searches of protein and nucleic acid sequences against the resulting predictive models.
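The integration step can be pictured as a mapping from member-database signatures to InterPro entries. The sketch below is a hypothetical in-memory model, not InterPro's actual schema: the `Entry` class, its field names, and the accessions are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical model of one integrated entry."""
    accession: str
    name: str
    entry_type: str
    # member database name -> list of signature accessions
    signatures: dict = field(default_factory=dict)


def build_signature_index(entries):
    """Build a reverse lookup: (member database, signature) -> entry accession."""
    index = {}
    for entry in entries:
        for database, accessions in entry.signatures.items():
            for accession in accessions:
                index[(database, accession)] = entry.accession
    return index


# Placeholder entry; the accessions are not real identifiers.
ENTRIES = [
    Entry(
        accession="IPR999999",
        name="Example domain",
        entry_type="Domain",
        signatures={"Pfam": ["PF99999"], "SMART": ["SM99999"]},
    ),
]

if __name__ == "__main__":
    index = build_signature_index(ENTRIES)
    print(index[("Pfam", "PF99999")])
```

A reverse index of this kind lets a signature match from any member database be resolved to its unified entry in a single lookup; signatures absent from the index correspond to unintegrated signatures.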
Details
- Tool Type: web application
- Operating Systems: Linux, Windows, Mac
- Added: 6/11/2015
- Last Updated: 6/27/2022
Publications
Mitchell A, Chang H, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A, Scheremetjew M, Rato C, Yong S, Bateman A, Punta M, Attwood TK, Sigrist CJ, Redaschi N, Rivoire C, Xenarios I, Kahn D, Guyot D, Bork P, Letunic I, Gough J, Oates M, Haft D, Huang H, Natale DA, Wu CH, Orengo C, Sillitoe I, Mi H, Thomas PD, Finn RD. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research. 2014;43(D1):D213-D221. doi:10.1093/nar/gku1243. PMID:25428371. PMCID:PMC4383996.
McDowall J, Hunter S. InterPro Protein Classification. Methods in Molecular Biology. 2010. doi:10.1007/978-1-60761-977-2_3. PMID:21082426.
Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, Chang H, Dosztányi Z, El-Gebali S, Fraser M, Gough J, Haft D, Holliday GL, Huang H, Huang X, Letunic I, Lopez R, Lu S, Marchler-Bauer A, Mi H, Mistry J, Natale DA, Necci M, Nuka G, Orengo CA, Park Y, Pesseat S, Piovesan D, Potter SC, Rawlings ND, Redaschi N, Richardson L, Rivoire C, Sangrador-Vegas A, Sigrist C, Sillitoe I, Smithers B, Squizzato S, Sutton G, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Xenarios I, Yeh L, Young S, Mitchell AL. InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Research. 2016;45(D1):D190-D199. doi:10.1093/nar/gkw1107. PMID:27899635. PMCID:PMC5210578.
Blum M, Chang H, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. The InterPro protein families and domains database: 20 years on. Nucleic Acids Research. 2020;49(D1):D344-D354. doi:10.1093/nar/gkaa977. PMID:33156333. PMCID:PMC7778928.
Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, de Castro E, Coggill P, Corbett M, Das U, Daugherty L, Duquenne L, Finn RD, Fraser M, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, McMenamin C, Mi H, Mutowo-Muellenet P, Mulder N, Natale D, Orengo C, Pesseat S, Punta M, Quinn AF, Rivoire C, Sangrador-Vegas A, Selengut JD, Sigrist CJA, Scheremetjew M, Tate J, Thimmajanarthanan M, Thomas PD, Wu CH, Yeats C, Yong S. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Research. 2011;40(D1):D306-D312. doi:10.1093/nar/gkr948. PMID:22096229. PMCID:PMC3245097.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJA, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. InterPro: the integrative protein signature database. Nucleic Acids Research. 2009;37(Database):D211-D215. doi:10.1093/nar/gkn785. PMID:18940856. PMCID:PMC2686546.
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJA, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. New developments in the InterPro database. Nucleic Acids Research. 2007;35(Database):D224-D228. doi:10.1093/nar/gkl841. PMID:17202162. PMCID:PMC1899100.
Mulder NJ. InterPro, progress and status in 2005. Nucleic Acids Research. 2004;33(Database):D201-D205. doi:10.1093/nar/gki106. PMID:15608177. PMCID:PMC540060.
Mulder NJ. The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Research. 2003;31(1):315-318. doi:10.1093/nar/gkg046. PMID:12520011. PMCID:PMC165493.
Unknown Authors. InterPro: An integrated documentation resource for protein families, domains and functional sites. Briefings in Bioinformatics. 2002;3(3):225-235. doi:10.1093/bib/3.3.225. PMID:12230031.
Apweiler R. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Research. 2001;29(1):37-40. doi:10.1093/nar/29.1.37. PMID:11125043. PMCID:PMC29841.
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang H, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJA, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SCE, Yong S, Finn RD. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Research. 2018;47(D1):D351-D360. doi:10.1093/nar/gky1100. PMID:30398656. PMCID:PMC6323941.