GenBank nucleotide sequence database
GenBank provides a comprehensive public repository of nucleotide sequences and associated annotations to support genomics, taxonomy, and sequence analysis.
Key Features:
- Sequence content: Comprehensive collection of publicly available nucleotide sequences, covering nearly 260,000 formally described species.
- Sequence types: Includes individual laboratory submissions, whole-genome shotgun (WGS) projects, and environmental sampling sequences.
- Accession identifiers: Assigns unique accession numbers to each entry for precise tracking and retrieval.
- International synchronization: Performs daily data exchanges with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ).
- Entrez integration: Accessible through the NCBI Entrez retrieval system, which links nucleotide and protein sequences with taxonomy, genome mapping, protein structure and domain information, and the PubMed literature.
- Similarity search: Supports sequence similarity searches via BLAST across GenBank and related databases.
- Release schedule and access: Provides complete releases every two months and daily incremental updates, all available via FTP.
- Submission mechanisms: Accepts submissions via BankIt and the Sequin program.
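As an illustration of retrieval by accession number, the sketch below builds a request URL for EFetch, part of the NCBI E-utilities web interface to Entrez. E-utilities is not mentioned in the entry above, so its use here is an assumption about how a reader might access records programmatically. The helper name build_efetch_url is hypothetical, and U49845 serves only as an example accession.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="gb", retmode="text"):
    """Return an EFetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",      # Entrez nucleotide database
        "id": accession,      # accession number of the record
        "rettype": rettype,   # "gb" for flat file, "fasta" for FASTA
        "retmode": retmode,   # plain-text response
    }
    return EFETCH_URL + "?" + urlencode(params)


# U49845 is used here only as an example accession.
print(build_efetch_url("U49845"))

# To download the record (requires network access):
# from urllib.request import urlopen
# with urlopen(build_efetch_url("U49845")) as response:
#     print(response.read().decode("utf-8")[:500])
```

The URL is built separately from the download so that it can be inspected or logged before any request is sent.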
Scientific Applications:
- Evolutionary studies: Enables comparative analyses of nucleotide sequences for evolutionary research.
- Functional genomics: Supports annotation and analysis of gene and protein function using sequence and protein domain information.
- Taxonomy and species identification: Facilitates taxonomic assignment and species-level analyses across nearly 260,000 described species.
- Sequence similarity searching: Supports identification of homologs and related sequences using BLAST against GenBank.
- Genome mapping and protein analysis: Integration with genome mapping data and protein structure/domain information for genomic and structural studies.
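Many of the applications above start by pulling the organism name and the sequence out of a record. The sketch below does this for a single record in GenBank flat-file format. It is a minimal illustration, not a full parser: the function name parse_genbank_record is hypothetical, the sample record is hand-made and abbreviated rather than real data, and only the first ORGANISM line is read (the lineage lines that follow it in real records are ignored).

```python
def parse_genbank_record(text):
    """Extract accession, organism and sequence from one flat-file record."""
    record = {"accession": None, "organism": None, "sequence": ""}
    in_origin = False
    seq_parts = []
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            seq_parts.extend(line.split()[1:])
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                record["accession"] = fields[1]
        elif line.strip().startswith("ORGANISM"):
            record["organism"] = line.strip()[len("ORGANISM"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    record["sequence"] = "".join(seq_parts).upper()
    return record


# A hand-made, abbreviated record for illustration only (not real data).
SAMPLE = """\
LOCUS       EXAMPLE1                  24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   EXAMPLE1
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
FEATURES             Location/Qualifiers
ORIGIN
        1 atgcgtacgt tagctagcta gcta
//
"""

record = parse_genbank_record(SAMPLE)
print(record["accession"], record["organism"], len(record["sequence"]))
```

For real workloads, an established parser such as the one in Biopython is likely a better choice than hand-written line matching.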
Methodology:
Submitters deposit sequences via BankIt or Sequin, and GenBank staff assign a unique accession number to each record. Records are exchanged daily with ENA and DDBJ. The data are integrated through the NCBI Entrez system, which links sequences with taxonomy, genome mapping, protein structure and domain information, and PubMed. BLAST provides sequence similarity searches. Complete releases every two months, plus daily incremental updates, are distributed via FTP.
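The combination of complete releases and daily incremental updates suggests how a local copy could be kept current: start from a release and layer each day's updates on top. The sketch below illustrates that idea only. The in-memory data layout, the function name apply_updates, and the record identifiers are all hypothetical, and the actual format and processing of GenBank update files are not described in this entry.

```python
def apply_updates(release, daily_updates):
    """Return a new mapping with each day's updates layered over the release.

    release: dict mapping accession -> (version, sequence)
    daily_updates: list of such dicts, ordered oldest to newest
    """
    merged = dict(release)  # copy so the release snapshot stays unchanged
    for update in daily_updates:
        for accession, (version, sequence) in update.items():
            current = merged.get(accession)
            # Add new records; replace existing ones only with a newer version.
            if current is None or version > current[0]:
                merged[accession] = (version, sequence)
    return merged


# Hypothetical identifiers and sequences, for illustration only.
release = {"REC0001": (1, "ATGC"), "REC0002": (1, "GGCC")}
updates = [
    {"REC0002": (2, "GGCCA")},  # revised record
    {"REC0003": (1, "TTAA")},   # newly released record
]
mirror = apply_updates(release, updates)
print(sorted(mirror))
```

Keying records by accession and comparing version numbers means that applying the same update twice leaves the result unchanged.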
Details
- Tool Type: web application
- Operating Systems: Linux, Windows, Mac
- Added: 9/12/2015
- Last Updated: 11/25/2024
Operations
- Query and retrieval
Publications
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Research. 2012;41(D1):D36-D42. doi:10.1093/nar/gks1195. PMID:23193287. PMCID:PMC3531190.