HTSeq

HTSeq is a Python library for processing high-throughput sequencing (HTS) data. It parses common HTS data formats and represents genomic coordinates, sequences, sequencing reads, alignments, gene model information, and variant calls, supporting downstream analyses such as RNA-Seq differential expression.


Key Features:

  • Data Parsing and Representation: Parsers for common HTS data formats, together with classes that represent genomic coordinates, sequences, sequencing reads, alignments, gene model information, and variant calls.
  • Querying via Genomic Coordinates: Specialized data structures that enable coordinate-based queries across genomic features.
  • Custom Script Development: A flexible Python framework that supports rapid development of custom scripts for nonstandard analytical workflows.
  • htseq-count Tool: A command-line script that counts how many sequencing reads overlap each annotated gene, producing the per-gene counts used as input for RNA-Seq differential expression analysis.
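
The coordinate-based querying described above can be illustrated with a minimal sketch. It is plain Python and does not use HTSeq; the `FeatureIndex` class and its methods are hypothetical names chosen for this example, and intervals are assumed to be 0-based and half-open. A linear scan stands in for the specialized data structures a real library would use.

```python
from collections import defaultdict


class FeatureIndex:
    """Toy per-chromosome feature store (hypothetical, not an HTSeq class)."""

    def __init__(self):
        # chromosome name -> list of (start, end, feature name)
        self._features = defaultdict(list)

    def add(self, chrom, start, end, name):
        """Register a feature covering the half-open interval [start, end)."""
        if start >= end:
            raise ValueError("interval must have start < end")
        self._features[chrom].append((start, end, name))

    def overlapping(self, chrom, start, end):
        """Return the names of all features overlapping [start, end)."""
        return {
            name
            for f_start, f_end, name in self._features.get(chrom, [])
            if f_start < end and start < f_end
        }


index = FeatureIndex()
index.add("chr1", 100, 200, "geneA")
index.add("chr1", 150, 300, "geneB")
index.add("chr2", 100, 200, "geneC")

# Query by genomic coordinates: which features overlap chr1:[180, 190)?
hits = index.overlapping("chr1", 180, 190)
print(sorted(hits))
```

Because the intervals are half-open, a query starting exactly at a feature's end coordinate does not count as overlapping that feature.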

Scientific Applications:

  • RNA-Seq Analysis: Preparation of RNA-Seq data for differential expression studies using htseq-count to quantify read overlaps with annotated genes.
  • Variant Call Handling: Representation and manipulation of existing variant calls to support the analysis of genetic variation.
  • Custom Genomic Analyses: Development of bespoke analyses leveraging parsers and coordinate-based queries to address specialized genomic research questions.

Methodology:

HTSeq is implemented in Python and provides parsers for common HTS data formats and utilities such as htseq-count, which counts overlaps of sequencing reads with annotated genes.
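
The idea of counting read overlaps with annotated genes can be sketched in a few lines of plain Python. This is a simplified illustration, not the htseq-count implementation: the gene table, the `count_reads` function, and the `no_feature` and `ambiguous` labels are assumptions made for this example, reads are treated as single ungapped intervals, and coordinates are assumed to be 0-based and half-open. A read is credited to a gene only when it overlaps exactly one gene.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 100, 200),
}


def genes_overlapping(chrom, start, end, genes=GENES):
    """Return the set of gene names overlapping the interval [start, end)."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start < end and start < g_end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads hitting zero or several genes are set aside."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = genes_overlapping(chrom, start, end, genes)
        if not hits:
            counts["no_feature"] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
        else:
            counts[next(iter(hits))] += 1
    return counts


reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 120, 145),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 500, 530),  # overlaps no gene
]
counts = count_reads(reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

Setting ambiguous reads aside rather than splitting them between genes keeps each counted read attributable to a single gene, which is the kind of per-gene table a differential expression analysis expects as input.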

Details

License: GPL-3.0
Maturity: Mature
Tool Type: library
Operating Systems: Linux, Windows, Mac
Programming Languages: Python
Added: 1/13/2017
Last Updated: 11/25/2024

Publications

Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166-169. doi:10.1093/bioinformatics/btu638. PMID:25260700. PMCID:PMC4287950.