Pegasys
Pegasys provides a modular system to construct and execute biological sequence analysis workflows and to integrate heterogeneous tool outputs for genomic data management and export.
Key Features:
- Modular Design: Supports pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, and masking repetitive sequences in genomic DNA within an extensible module framework.
- Workflow Creation: Uses a dedicated workflow data structure that lets users create sequence analysis workflows dynamically and modify them at run time.
- Parallel Execution: Runs analyses that have no serial dependency on one another in parallel on compute clusters, which speeds up data generation and processing.
- Unified Data Model: Stores results from heterogeneous programs within a unified data model to ensure consistency and facilitate integration.
- Data Integration and Export: Exports integrated results into General Feature Format (GFF) and GAME XML for use in GFF-dependent tools or import into the Apollo genome editor.
- Database Management: Uses a backend relational database management system (RDBMS) and provides a database application programming interface (API) for programmatic access through SQL queries.
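The unified data model and SQL access described above can be illustrated with a minimal sketch. It is not the Pegasys schema or API: the `feature` table, its columns, and the helper names are hypothetical, the rows are sample data, and SQLite stands in for whichever relational backend is actually deployed.

```python
import sqlite3

# Hypothetical schema: one table holds features from every analysis program,
# so results from different tools share the same columns.
SCHEMA = """
CREATE TABLE feature (
    seq_id    TEXT NOT NULL,
    program   TEXT NOT NULL,
    type      TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT NOT NULL,
    score     REAL
)
"""


def build_demo_db():
    """Create an in-memory database with a few sample rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("contig1", "genscan", "exon", 100, 250, "+", 0.91),
        ("contig1", "repeatmasker", "repeat_region", 400, 620, "-", None),
        ("contig1", "blastn", "match", 120, 240, "+", 180.0),
    ]
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def features_overlapping(conn, seq_id, start, end):
    """Return features from any program that overlap the interval [start, end]."""
    cur = conn.execute(
        "SELECT program, type, start_pos, end_pos FROM feature "
        "WHERE seq_id = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos, program",
        (seq_id, end, start),
    )
    return cur.fetchall()
```

Because every program writes into the same columns, a single query can combine gene predictions, alignments, and repeat annotations for a region without tool-specific parsing.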
Scientific Applications:
- Gene Prediction: Integrates ab initio gene prediction results into workflows and centralized storage for downstream analysis.
- RNA Gene Detection: Supports detection and integration of RNA gene predictions within sequence analysis workflows.
- Repetitive Sequence Analysis: Enables masking and cataloguing of repetitive sequences in genomic DNA and incorporation of results into the unified dataset.
- Sequence Alignment and Comparative Genomics: Facilitates pair-wise and multiple sequence alignment workflows for comparative analyses.
- Large-Scale Genomic Studies: Combines parallel execution on compute clusters and database-backed storage to support high-throughput genomic analyses.
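Integrated results from these applications are exported as GFF, a tab-delimited format with nine columns per feature. The sketch below shows one way such a line could be produced. The `Feature` class and `to_gff_line` function are hypothetical names introduced for illustration and are not taken from Pegasys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """Hypothetical in-memory record for one annotated feature."""
    seq_id: str
    source: str            # program that produced the feature
    type: str              # e.g. "exon" or "repeat_region"
    start: int             # 1-based, inclusive
    end: int               # 1-based, inclusive
    strand: str = "."
    score: Optional[float] = None
    phase: Optional[int] = None
    attributes: str = "."


def to_gff_line(f: Feature) -> str:
    """Format a feature as one tab-delimited GFF line; '.' marks missing values."""
    score = "." if f.score is None else f"{f.score:g}"
    phase = "." if f.phase is None else str(f.phase)
    return "\t".join([
        f.seq_id, f.source, f.type,
        str(f.start), str(f.end),
        score, f.strand, phase, f.attributes,
    ])
```

For example, `to_gff_line(Feature("contig1", "genscan", "exon", 100, 250, "+", 0.91))` yields a line whose fields are `contig1`, `genscan`, `exon`, `100`, `250`, `0.91`, `+`, `.`, and `.`.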
Methodology:
Pegasys constructs and executes dynamic workflows, running analyses that have no serial dependency on one another in parallel on compute clusters. Results are stored in a unified data model in a relational database that is accessible through a database API (SQL), and integrated results are exported as GFF or GAME XML.
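The execution strategy can be sketched as follows, assuming the workflow can be modeled as a directed acyclic graph in which each analysis lists the analyses it depends on. This is an illustration and not the Pegasys scheduler: `run_workflow`, `DEPS`, and `TASKS` are hypothetical names, the tasks are stand-in functions, and a local thread pool stands in for a compute cluster.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(deps, tasks, max_workers=4):
    """Run each task once its dependencies finish; independent tasks run concurrently.

    deps  maps a task name to the set of task names it depends on.
    tasks maps a task name to a callable that takes a dict of dependency results.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all complete.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)  # may unlock downstream tasks
    return results


# Hypothetical four-step workflow: masking feeds two independent analyses,
# whose outputs are then combined by an export step.
DEPS = {
    "mask_repeats": set(),
    "predict_genes": {"mask_repeats"},
    "align": {"mask_repeats"},
    "export": {"predict_genes", "align"},
}
TASKS = {
    "mask_repeats": lambda inputs: "masked",
    "predict_genes": lambda inputs: "genes<" + inputs["mask_repeats"] + ">",
    "align": lambda inputs: "hits<" + inputs["mask_repeats"] + ">",
    "export": lambda inputs: " + ".join(inputs[k] for k in sorted(inputs)),
}
```

Calling `run_workflow(DEPS, TASKS)` runs `predict_genes` and `align` concurrently once `mask_repeats` has finished, and `export` only after both are done.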
Details
- Tool Type: web application
- Operating Systems: Linux, Windows, Mac
- Programming Languages: Java, C++
- Added: 5/2/2017
- Last Updated: 12/10/2018
Publications
Shah SP, et al. Pegasys: software for executing and integrating analyses of biological sequences. BMC Bioinformatics. 2004; 5:40. doi: 10.1186/1471-2105-5-40
PMID: 15096276