Pfam

Pfam classifies protein sequences into families and domains, providing curated and automatically generated alignments and Hidden Markov Models for annotation and analysis in structural biology, genomics, and proteomics.

Key Features:

Pfam-A: Well-characterized protein domain families, each with a manually checked seed alignment and a Hidden Markov Model (HMM). Families carry permanent accession numbers and together form a library for sequence searching and automatic annotation of new proteins.
Pfam-B: An automatically generated supplement of novel sequence clusters not yet matched by a Pfam family. The version described in "Pfam: The protein families database in 2021" uses MMseqs2 clustering and contains 136,730 sequence families.
Release and coverage: Pfam 29.0, the release described in the 2015 publication listed under Publications, includes 16,295 entries. Pfam reports nearly 80% coverage of the UniProt Knowledgebase (UniProtKB).
Reference proteomes basis: Pfam was reorganized to use UniProtKB reference proteomes as its primary sequence basis, so matches are reported on a smaller, more stable set of sequences while sequences from model organisms remain accessible.
Representative proteome alignments: Family alignments are provided based on four different representative proteome sequence datasets.

Scientific Applications:

Protein annotation: Automatic, HMM-based annotation of new protein sequences across genomes and proteomes.
Discovery of novel families: Identification and classification of previously unannotated proteins and novel family memberships, as exemplified in the Caenorhabditis elegans genome project.
Pathogen proteome analysis: Support for analysis of viral proteomes, including studies of the SARS-CoV-2 proteome.

Methodology:

Pfam builds Hidden Markov Models from manually checked seed alignments and uses MMseqs2 to cluster Pfam-B sequence families. Matches are reported against UniProtKB reference proteomes, and family alignments are provided for representative proteome datasets.
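
To illustrate how a seed alignment can be turned into a searchable model, the following is a minimal Python sketch, not Pfam's actual pipeline. It replaces the profile HMM with a much simpler position-specific log-odds profile (no insert or delete states) and assumes a uniform background distribution. The function names build_profile and best_ungapped_score, the pseudocount value, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seed_alignment, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column.

    Simplification: a real profile HMM also models insertions and
    deletions; this sketch only captures per-column residue preferences.
    """
    length = len(seed_alignment[0])
    if any(len(row) != length for row in seed_alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for col in range(length):
        # Count residues in this column, skipping gaps and unknown symbols.
        counts = Counter(
            row[col] for row in seed_alignment if row[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_ungapped_score(profile, sequence):
    """Slide the profile along the sequence without gaps.

    Returns (score, start) for the highest-scoring window, or
    (-inf, -1) if the sequence is shorter than the profile.
    """
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    toy_seed = ["ACDG", "ACEG", "ACDG"]  # hypothetical seed alignment
    toy_profile = build_profile(toy_seed)
    print(best_ungapped_score(toy_profile, "MKACDGLL"))
```

In practice, profile HMMs for Pfam families are built and searched with dedicated software; the sketch only shows why a curated seed alignment is enough to score unseen sequences for family membership.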


Topics

Proteins
Gene and protein families
Protein folds and structural domains
Sequence sites, features and motifs
Sequence composition, complexity and repeats
Sequence analysis

Details

License: CC0-1.0
Tool Type: API, web application
Operating Systems: Linux, Windows, Mac
Added: 2/4/2015
Last Updated: 11/24/2024

Operations

Data Inputs & Outputs

Query and retrieval

Inputs

Protein structure

Outputs

Other operations do not define inputs or outputs.

Publications

Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research. 2015;44(D1):D279-D285. doi:10.1093/nar/gkv1344. PMID:26673716. PMCID:PMC4702930.

Sonnhammer EL, Eddy SR, Durbin R. Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins: Structure, Function, and Genetics. 1997;28(3):405-420. doi:10.1002/(sici)1097-0134(199707)28:3<405::aid-prot10>3.0.co;2-l. PMID:9223186.

Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. Pfam: The protein families database in 2021. Nucleic Acids Research. 2020;49(D1):D412-D419. doi:10.1093/nar/gkaa913. PMID:33125078. PMCID:PMC7779014.

Funding: European Union's Horizon 2020 MSCA-RISE action (823886); Wellcome (108433/Z/15/Z); BBSRC (BB/S020381/1).

Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Research. 2013;42(D1):D222-D230. doi:10.1093/nar/gkt1223. PMID:24288371. PMCID:PMC3965110.

Documentation

General

http://www.ebi.ac.uk/about/terms-of-use

Links

Mirror

http://pfam.sbc.su.se/

Mirror

http://pfam.jouy.inra.fr/

Mirror

http://pfam.sanger.ac.uk/

Mirror

http://pfam.ccbb.re.kr/index.shtml

Mirror

http://pfam.janelia.org/
