EBI patent sequence database
The EBI patent sequence database provides non-redundant, annotated nucleotide and protein sequences extracted from patent documents. It supports sequence retrieval, clustering, and patent-specific analyses.
Key Features:
- Non-Redundant Databases: Builds non-redundant sequence sets from the EMBL-Bank nucleotide patent class and the patent protein databases by collapsing duplicate sequences.
- Value-Added Annotations: Incorporates annotations derived from patent documents including publication number corrections, earliest publication dates, and feature collations.
- Hierarchical Clustering: Implements two-level clustering. Level-1 clusters, built from MD5 checksums, group sequences that are 100% identical over their entire length; Level-2 sub-clusters subdivide these using patent family information.
- Comprehensive Coverage: Includes both nucleotide sequences from the EMBL-Bank patent class and patent-associated protein sequences.
Scientific Applications:
- Enhanced Data Retrieval: Enables precise identification and retrieval of specific sequences and their patent metadata by reducing redundancy and improving annotation quality.
- Cross-Disciplinary Research: Provides access to biological sequences and annotations embedded in patent documents for bioinformatics and interdisciplinary studies.
- Intellectual Property Analysis: Supports patent validity and scope assessments through MD5-based clustering and patent-derived annotations.
Methodology:
Sequences are clustered by MD5 checksum into Level-1 clusters (100% identity over the entire length), which are then sub-clustered at Level 2 using patent family information. Annotations extracted from the patent documents, such as publication number corrections, earliest publication dates, and feature collations, are incorporated alongside the sequences.
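The two-level scheme described above can be illustrated with a minimal Python sketch. It is not the database's actual pipeline: the record layout, the entry and family identifiers, the upper-casing of sequences before hashing, and the helper names `md5_checksum`, `cluster`, and `earliest_publication` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from datetime import date

# Hypothetical records: (entry_id, sequence, patent_family_id, publication_date).
# None of these identifiers come from the real database.
records = [
    ("E1", "ACGTACGT", "FAM1", date(2001, 5, 1)),
    ("E2", "acgtacgt", "FAM1", date(2003, 2, 10)),
    ("E3", "ACGTACGT", "FAM2", date(1999, 7, 20)),
    ("E4", "TTGACC", "FAM3", date(2005, 1, 15)),
]


def md5_checksum(sequence):
    """Return the MD5 hex digest of a sequence.

    Upper-casing before hashing is an assumption of this sketch, so that
    'acgt' and 'ACGT' count as the same sequence.
    """
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def cluster(records):
    """Group records into Level-1 clusters (same checksum, i.e. identical
    over the entire length) and Level-2 sub-clusters (same patent family)."""
    clusters = defaultdict(lambda: defaultdict(list))
    for entry_id, sequence, family_id, pub_date in records:
        clusters[md5_checksum(sequence)][family_id].append((entry_id, pub_date))
    return clusters


def earliest_publication(members):
    """Return the earliest publication date among sub-cluster members."""
    return min(pub_date for _, pub_date in members)


clusters = cluster(records)
for checksum, families in clusters.items():
    for family_id, members in families.items():
        entry_ids = [entry_id for entry_id, _ in members]
        print(checksum[:8], family_id, entry_ids, earliest_publication(members))
```

Keying Level 1 on a checksum rather than on the sequence itself keeps the lookup key at a fixed size however long the sequence is. Nesting the patent family under the checksum reflects the description of Level 2 as a sub-clustering of Level 1.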
Details
- Tool Type: web application
- Operating Systems: Linux, Windows, Mac
- Added: 3/30/2017
- Last Updated: 11/25/2024
Publications
Li W, McWilliam H, de la Torre AR, Grodowski A, Benediktovich I, Goujon M, Nauche S, Lopez R. Non-redundant patent sequence databases with value-added annotations at two levels. Nucleic Acids Research. 2010;38(suppl_1):D52-D56. doi:10.1093/nar/gkp960. PMID:19884134. PMCID:PMC2808894.