EBI patent sequence database

The EBI patent sequence database provides non-redundant, annotated nucleotide and protein sequences extracted from patent documents. It supports sequence retrieval, clustering, and patent-specific analyses.

Key Features:

Non-Redundant Databases: Provides non-redundant sequence sets for the EMBL-Bank patent nucleotide class and the patent protein databases, built by collapsing duplicate sequences.
Value-Added Annotations: Incorporates annotations derived from patent documents including publication number corrections, earliest publication dates, and feature collations.
Hierarchical Clustering: Implements two-level clustering. Level-1 clusters, identified by MD5 checksums, group sequences that are 100% identical over their entire length; Level-2 sub-clusters divide each Level-1 cluster using patent family information.
Comprehensive Coverage: Includes both nucleotide sequences from the EMBL-Bank patent class and patent-associated protein sequences.
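
The two-level clustering described above can be illustrated with a minimal Python sketch. It is not the EBI pipeline: the record fields (`entry_id`, `sequence`, `family_id`), the function names, and the normalisation step (upper-casing and removing whitespace before hashing) are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def md5_checksum(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Assumption: sequences are upper-cased and stripped of whitespace
    before hashing, so formatting differences do not split clusters.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def cluster_two_levels(records):
    """Group records by checksum (Level-1), then by patent family (Level-2).

    Returns {checksum: {family_id: [entry_id, ...]}}.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for rec in records:
        level1_key = md5_checksum(rec["sequence"])
        clusters[level1_key][rec["family_id"]].append(rec["entry_id"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(families) for key, families in clusters.items()}


# Hypothetical records; the identifiers are made up for this example.
records = [
    {"entry_id": "E1", "sequence": "ACGTACGT", "family_id": "FAM1"},
    {"entry_id": "E2", "sequence": "acgt acgt", "family_id": "FAM1"},
    {"entry_id": "E3", "sequence": "ACGTACGT", "family_id": "FAM2"},
    {"entry_id": "E4", "sequence": "TTTTGGGG", "family_id": "FAM3"},
]
clusters = cluster_two_levels(records)
```

In this toy input, E1, E2, and E3 share one Level-1 cluster because their normalised sequences are identical, and that cluster splits into two Level-2 sub-clusters (FAM1 and FAM2). Because a checksum only matches when the whole string matches, this approach captures exact full-length identity and nothing weaker.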

Scientific Applications:

Enhanced Data Retrieval: Enables precise identification and retrieval of specific sequences and their patent metadata by reducing redundancy and improving annotation quality.
Cross-Disciplinary Research: Provides access to biological sequences and annotations embedded in patent documents for bioinformatics and interdisciplinary studies.
Intellectual Property Analysis: Supports patent validity and scope assessments through MD5-based clustering and patent-derived annotations.

Methodology:

Sequences are first grouped by MD5 checksum into Level-1 clusters (100% identity over the entire length), and each Level-1 cluster is then divided into Level-2 sub-clusters using patent family information. Annotations extracted from patent documents, such as publication number corrections, earliest publication dates, and feature collations, are incorporated into the clustered entries.
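
As a rough illustration of the annotation step, the sketch below collates metadata for the members of one cluster: it picks the earliest publication date and merges publication numbers and feature labels. The field names (`publication_number`, `published`, `features`), the function name, and the sample values are hypothetical and do not reflect the database's actual schema.

```python
from datetime import date


def summarise_cluster(members):
    """Collate patent-derived annotations for the members of one cluster."""
    # The member with the earliest publication date represents the cluster.
    first = min(members, key=lambda m: m["published"])
    return {
        "earliest_publication_date": first["published"],
        "earliest_publication_number": first["publication_number"],
        # Deduplicate and sort so the summary is stable across runs.
        "publication_numbers": sorted({m["publication_number"] for m in members}),
        "features": sorted({f for m in members for f in m["features"]}),
    }


# Made-up cluster members for illustration only.
members = [
    {"publication_number": "US0000002", "published": date(2005, 3, 1),
     "features": ["CDS"]},
    {"publication_number": "EP0000001", "published": date(2003, 7, 15),
     "features": ["CDS", "sig_peptide"]},
]
summary = summarise_cluster(members)
```

Here the summary would report EP0000001 as the earliest publication, with the feature labels of both members merged into one list.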

Topics

Database management Sequence analysis Proteins Gene and protein families Biotechnology

Details

Tool Type: web application
Operating Systems: Linux, Windows, Mac
Added: 3/30/2017
Last Updated: 11/25/2024

Publications

Li W, McWilliam H, de la Torre AR, Grodowski A, Benediktovich I, Goujon M, Nauche S, Lopez R. Non-redundant patent sequence databases with value-added annotations at two levels. Nucleic Acids Research. 2009;38(suppl_1):D52-D56. doi:10.1093/nar/gkp960. PMID:19884134. PMCID:PMC2808894.

Documentation

User manual: Non-redundant databases

https://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/external_services/patentdata/Non-redundant%20databases-user%20manual_v4.pdf

User manual: Family equivalents database

https://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/external_services/patentdata/Family%20equivalents%20database_v4.pdf
