SRA Software Toolkit

The SRA Software Toolkit provides programmatic access to sequencing data in the INSDC Sequence Read Archive (SRA) for retrieval and downstream bioinformatics analyses.


Key Features:

  • Data Access and Retrieval: Enables efficient access to and retrieval of large-scale sequence datasets archived in the INSDC SRA.
  • Integration with NCBI Resources: Integrates SRA data with NCBI resources such as GenBank, Entrez, and BLAST for linked annotation and sequence search.
  • Customized Search Capabilities: Supports custom implementations of BLAST optimized to search specialized datasets.
  • SDK and Libraries: Includes an SDK and libraries that support multiple programming languages for programmatic data access and development of custom applications.
  • Large-scale Data Handling: Implements an architecture and framework for handling large-scale sequence data retrieval and processing.
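As an illustration of the retrieval feature listed above, the following is a minimal sketch of driving the toolkit's command-line programs from Python. It assumes that `prefetch` and `fasterq-dump` from an SRA Toolkit installation are on the PATH; this entry does not itself name those programs, and the helper names `build_commands` and `fetch_run` are hypothetical, introduced only for this example.

```python
import re
import subprocess
from pathlib import Path

# INSDC run accessions: SRR (NCBI), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def build_commands(accession, outdir, threads=4):
    """Return the prefetch and fasterq-dump command lines for one run.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    if not RUN_ACCESSION.match(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    # prefetch downloads the run into <outdir>/<accession>/.
    prefetch = ["prefetch", accession, "--output-directory", str(Path(outdir))]
    # fasterq-dump is meant to be run from inside outdir (see fetch_run),
    # where it finds the prefetched run directory by accession name.
    convert = ["fasterq-dump", accession, "--split-files", "--threads", str(threads)]
    return [prefetch, convert]


def fetch_run(accession, outdir, threads=4):
    """Download one run and convert it to FASTQ files inside outdir.

    Assumes the SRA Toolkit executables are installed and on the PATH.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    prefetch, convert = build_commands(accession, outdir, threads)
    subprocess.run(prefetch, check=True)             # raises if the download fails
    subprocess.run(convert, check=True, cwd=outdir)  # FASTQ files are written to outdir
```

Calling `fetch_run("SRR000001", "data")` would, under these assumptions, leave the downloaded run and its FASTQ files under `data/`. Building the argument lists separately from running them keeps the command construction easy to inspect without network access.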

Scientific Applications:

  • Genomic Research: Provides sequence data for comparative genomics and evolutionary biology studies.
  • Transcriptomics and Proteomics: Provides access to RNA-seq and other high-throughput sequencing datasets used in transcriptomics and proteomics.
  • Disease Research: Enables analysis of genetic disease-related datasets, including studies of gene expression and mutation analysis.

Methodology:

The toolkit provides libraries for multiple programming languages so that users can write custom scripts and applications. It implements a framework and architecture for retrieving and processing sequence data at scale, and it supports custom BLAST implementations for dataset-specific searches.

Details

Tool Type: workflow
Operating Systems: Linux, Windows, Mac
Programming Languages: Perl
Added: 8/3/2017
Last Updated: 11/25/2024

Publications

Wheeler DL, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 2006;34(Database issue):D173-D180. doi:10.1093/nar/gkj158. PMID:16381840. PMCID:PMC1347520.
