GROK

Bioinformatics is expanding rapidly, and the volume of data produced by deep sequencing (DS) experiments is growing exponentially. Analysing such large data volumes computationally, while keeping the analysis approach flexible, remains a challenge. To address it, the researchers present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis, which facilitates translating biomedical research questions into a language amenable to computational analysis.
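As an illustration of what set algebra over genomic regions can look like, here is a minimal Python sketch. It is not GROK's API: the `Region` tuple, the half-open coordinate convention, and the functions `merge`, `union`, `intersection`, and `difference` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation (not GROK's): (chromosome, start, end), half-open [start, end)
Region = Tuple[str, int, int]


def merge(regions: List[Region]) -> List[Region]:
    """Sort regions and merge those that overlap or touch on the same chromosome."""
    out: List[Region] = []
    for chrom, start, end in sorted(regions):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
        else:
            out.append((chrom, start, end))
    return out


def union(a: List[Region], b: List[Region]) -> List[Region]:
    """Positions covered by a or b."""
    return merge(list(a) + list(b))


def intersection(a: List[Region], b: List[Region]) -> List[Region]:
    """Positions covered by both a and b, found with a two-pointer sweep."""
    a, b = merge(a), merge(b)
    out: List[Region] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            out.append((ca, lo, hi))
        # Drop the region that ends first; the other may still overlap later ones.
        if ea <= eb:
            i += 1
        else:
            j += 1
    return out


def difference(a: List[Region], b: List[Region]) -> List[Region]:
    """Positions covered by a but not by b (simple quadratic scan)."""
    b = merge(b)
    out: List[Region] = []
    for chrom, start, end in merge(a):
        cur = start
        for cb, sb, eb in b:
            if cb != chrom or eb <= cur or sb >= end:
                continue
            if sb > cur:
                out.append((chrom, cur, sb))
            cur = max(cur, eb)
        if cur < end:
            out.append((chrom, cur, end))
    return out


# Made-up peak sets for two samples
peaks_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
peaks_b = [("chr1", 250, 400), ("chr2", 10, 60)]

shared = intersection(peaks_a, peaks_b)
only_a = difference(peaks_a, peaks_b)
either = union(peaks_a, peaks_b)
```

In this framing, a question such as "which regions are bound in sample A but not in sample B?" becomes a single `difference` call, which is the kind of translation from research question to computation that the formalism is meant to support.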

Using this formalism, the researchers implemented the Genomic Region Operation Kit (GROK), which supports DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and the command line, as well as a C++ API for extensions. It supports major genomic file formats and can store custom genomic regions in efficient data structures such as red-black trees and SQL databases.
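To make the idea of keeping genomic regions in an SQL database concrete, here is a small sketch using Python's standard `sqlite3` module. It does not reflect GROK's actual storage schema or interface: the table layout, the column names, and the helpers `build_region_db` and `overlapping` are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical row layout (not GROK's schema): (chrom, start, end, name), half-open [start, end)
Row = Tuple[str, int, int, str]


def build_region_db(regions: List[Row]) -> sqlite3.Connection:
    """Create an in-memory SQLite database holding the given regions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE regions ("
        "chrom TEXT NOT NULL, "
        "start_pos INTEGER NOT NULL, "
        "end_pos INTEGER NOT NULL, "
        "name TEXT)"
    )
    # An index on (chrom, start_pos) keeps range lookups from scanning the whole table.
    conn.execute("CREATE INDEX idx_regions ON regions (chrom, start_pos)")
    conn.executemany("INSERT INTO regions VALUES (?, ?, ?, ?)", regions)
    conn.commit()
    return conn


def overlapping(conn: sqlite3.Connection, chrom: str, start: int, end: int) -> List[Row]:
    """Return stored regions that overlap the half-open query interval [start, end)."""
    cur = conn.execute(
        "SELECT chrom, start_pos, end_pos, name FROM regions "
        "WHERE chrom = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos",
        (chrom, end, start),
    )
    return cur.fetchall()


# Made-up example regions
conn = build_region_db([
    ("chr1", 100, 200, "peak_a"),
    ("chr1", 500, 600, "peak_b"),
    ("chr2", 150, 250, "peak_c"),
])
hits = overlapping(conn, "chr1", 150, 550)
```

A database-backed store of this kind trades some lookup speed for the ability to hold region sets larger than memory, whereas an in-memory balanced tree such as a red-black tree favours fast ordered access.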

To demonstrate the utility of GROK, the researchers characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. This study's results highlight GROK's effectiveness in facilitating the analysis of DS data and the identification of important biological signals.

Topic

Genomics;Sequence analysis;ChIP-seq

Detail

  • Operation: Formatting;Filtering

  • Software interface: Library

  • Language: R, Shell, C++, Python, Lua

  • License: BSD license

  • Cost: Free

  • Version name: 1.1.1

  • Credit: -

  • Input: -

  • Output: -

  • Contact: kristian.ovaska@helsinki.fi;sampsa.hautaniemi@helsinki.fi

  • Collection: -

  • Maturity: -

Publications

  • Genomic region operation kit for flexible processing of deep sequencing data.
  • Ovaska K, et al. Genomic region operation kit for flexible processing of deep sequencing data. IEEE/ACM Trans Comput Biol Bioinform. 2013; 10:200-6. doi: 10.1109/TCBB.2012.170
  • https://doi.org/10.1109/TCBB.2012.170
  • PMID: 23702556
  • PMC: -

Download and documentation

