GeneBase
GeneBase 1.1 parses, structures, and indexes records downloaded from the NCBI Gene database, then computes summary statistics that quantify gene, transcript, exon, intron, coding sequence (CDS), and untranslated region (UTR) features.
Key Features:
- NCBI Gene parsing and indexing: Parses NCBI Gene records, then structures and indexes the gene, transcript, and feature entries locally.
- Summary statistics: Computes median, mean, standard deviation, total, and extreme values for genes, transcripts, exons, introns, CDS, and UTRs.
- Dynamic calculation capabilities: Provides on-demand calculation and summarization of quantitative parameters for genes, transcripts, and gene features.
- Targeted gene-set analysis: Enables analysis of gene structure parameters following targeted searches for sets of genes with specified characteristics.
- Updated human gene statistics: Reports revised human nuclear gene statistics, e.g., a mean human protein-coding gene length of ≈67 kbp, eleven exons averaging 309 bp, and ten introns averaging 6,355 bp.
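The summary statistics listed above can be illustrated with a minimal Python sketch. This is not GeneBase's own code: the function name `summarize`, the input list, and the choice of sample (rather than population) standard deviation are all assumptions made for illustration, and the lengths are invented values, not results from the tool.

```python
import statistics


def summarize(lengths):
    """Return summary statistics for a list of feature lengths in bp.

    Hypothetical helper, not part of GeneBase.
    """
    if not lengths:
        raise ValueError("no lengths to summarize")
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # Assumption: sample standard deviation, which needs >= 2 values.
        "stdev": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "min": min(lengths),
        "max": max(lengths),
    }


# Invented exon lengths (bp), for illustration only.
exon_lengths = [120, 309, 85, 1500, 210]
stats = summarize(exon_lengths)
print(stats)
```

The single long exon pulls the mean well above the median here, which is why a tool of this kind would report both values alongside the extremes.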
Scientific Applications:
- Gene structure analysis: Quantitative characterization of gene, transcript, exon, intron, CDS, and UTR features across gene sets.
- Hypothesis testing: Statistical support for testing biological hypotheses about gene architecture and feature distributions.
- Human genome reference statistics: Generation of updated reference values for human nuclear genes, transcripts, and genomic features.
Methodology:
GeneBase parses NCBI Gene records locally, structures and indexes the gene, transcript, and feature entries, and computes summary statistics (median, mean, standard deviation, total, and extreme values) for exons, introns, CDS, and UTRs.
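As a rough sketch of the structure-and-index step, the Python below groups transcript records by gene and derives exon and intron lengths from exon coordinates. The `Transcript` class, its field names, the 1-based inclusive coordinate convention, and the accession string are all hypothetical; the entry does not describe GeneBase's actual schema or how it obtains intron lengths.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record; not GeneBase's actual data model."""
    accession: str
    gene_id: int
    exons: list  # (start, end) pairs, assumed 1-based and inclusive


def exon_lengths(transcript):
    """Length in bp of each exon, in coordinate order."""
    return [end - start + 1 for start, end in sorted(transcript.exons)]


def intron_lengths(transcript):
    """Length in bp of each gap between consecutive exons."""
    ordered = sorted(transcript.exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def build_index(transcripts):
    """Index transcripts by gene ID so gene sets can be looked up quickly."""
    index = defaultdict(list)
    for transcript in transcripts:
        index[transcript.gene_id].append(transcript)
    return index


# Invented example transcript with three exons.
tx = Transcript("NM_HYPOTHETICAL.1", gene_id=1,
                exons=[(100, 199), (300, 449), (1000, 1099)])
index = build_index([tx])
print(exon_lengths(tx), intron_lengths(tx))
```

A transcript with n exons yields n - 1 introns under this model, which matches the eleven-exon, ten-intron pattern quoted for human protein-coding genes.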
Details
- License: Apache-2.0
- Tool Type: desktop application
- Operating Systems: Windows, Mac
- Programming Languages: Python
- Added: 10/3/2018
- Last Updated: 7/30/2019
Publications
Piovesan A, Caracausi M, Antonaros F, Pelleri MC, Vitale L. GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics. Database. 2016;2016:baw153. doi:10.1093/database/baw153. PMID:28025344. PMCID:PMC5199132.