HTSlib

HTSlib is a C library for reading, writing and processing high-throughput sequencing data files, including SAM, BAM, CRAM, VCF, BCF and FASTA.


Key Features:

  • Data Format Support: Supports SAM, BAM and CRAM for alignment data and VCF and BCF for variant data, with interfaces for FASTA and tab-delimited genomic coordinate files.
  • Performance Enhancements: Reports a BAM read-write loop running 5× faster and BAM-to-SAM conversion running 13× faster than Samtools 0.1.19, both measured using 16 threads.
  • CRAM Support: Reads and writes the CRAM file format, a more compact, typically reference-based alternative to BAM.
  • Indexing and Iterators: Offers enhanced indexing capabilities and iterator-based region retrieval for efficient data access.
  • Multithreading: Makes more effective use of multiple threads to speed up multi-threaded operations.
  • Access Protocols: Supports newer protocols for retrieving and accessing files, such as htsget and Amazon S3.
  • Language Integration: Has been incorporated into projects and ecosystems using Perl, Python, Rust, and R.
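
The region retrieval mentioned under Indexing and Iterators relies on an index that groups alignments into hierarchical bins. As a rough illustration, the sketch below reproduces the binning calculation described in the SAM specification for BAI indexes. It is not HTSlib's own code (the library provides a generalised routine for this), and the name `reg2bin_sketch` is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name chosen for this sketch): compute the BAI bin
 * for a zero-based, half-open interval [beg, end), following the binning
 * scheme described in the SAM specification. */
int reg2bin_sketch(int32_t beg, int32_t end)
{
    assert(beg >= 0 && beg < end);  /* the scheme assumes a non-empty interval */
    --end;                          /* make the end coordinate inclusive */
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14); /* 16 kbp bins */
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17); /* 128 kbp bins */
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);  /* 1 Mbp bins */
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);  /* 8 Mbp bins */
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);  /* 64 Mbp bins */
    return 0;                       /* bin 0 covers the whole 512 Mbp range */
}
```

An interval that fits inside a single 16 kbp window is assigned to one of the smallest bins, while an interval that crosses a window boundary moves up to the next larger bin that contains it. A region query then only needs to inspect the bins that can overlap the requested range.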

Scientific Applications:

  • Genome assembly: Enables access to alignment and reference files used during assembly workflows.
  • Variant calling: Provides VCF/BCF handling and indexed region access required for variant discovery and genotyping.
  • Alignment analysis: Facilitates read-level alignment processing and conversion between SAM and BAM formats.
  • Standards interoperability: Implements file formats and access patterns aligned with Global Alliance for Genomics and Health (GA4GH) standards, which facilitates cross-tool compatibility.

Methodology:

HTSlib implements BAM read-write loops and BAM-to-SAM conversion, and supports the CRAM format. It provides indexing and iterator-based region access, uses multithreading to improve performance, and supports newer access protocols.
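
Part of the BAM-to-SAM conversion named above is turning packed binary fields back into text. The sketch below illustrates two such steps using the encodings defined in the SAM/BAM specification: 4-bit packed bases and 32-bit CIGAR elements. The function names are hypothetical and this is not HTSlib's implementation, which exposes its own accessors for these fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decode n_bases packed 4-bit base codes into text.
 * Each byte holds two bases, high nibble first; out needs n_bases + 1 bytes. */
void decode_seq_sketch(const uint8_t *packed, size_t n_bases, char *out)
{
    static const char codes[] = "=ACMGRSVTWYHKDBN";
    assert(packed != NULL && out != NULL);
    for (size_t i = 0; i < n_bases; ++i) {
        uint8_t byte = packed[i / 2];
        uint8_t code = (i % 2 == 0) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
        out[i] = codes[code];
    }
    out[n_bases] = '\0';
}

/* Hypothetical helper: format one packed CIGAR element (length << 4 | op)
 * as text such as "76M". Returns the number of characters written,
 * or -1 for an unknown operation or a buffer that is too small. */
int format_cigar_op_sketch(uint32_t packed, char *out, size_t out_size)
{
    static const char ops[] = "MIDNSHP=X";
    uint32_t op = packed & 0xF;   /* low 4 bits: operation code */
    uint32_t len = packed >> 4;   /* high 28 bits: operation length */
    if (op >= sizeof ops - 1) return -1;
    int n = snprintf(out, out_size, "%u%c", (unsigned)len, ops[op]);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

A full converter would apply steps like these to every record and also handle names, flags, qualities and auxiliary tags; the sketch covers only the two packed fields.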

Details

License: MIT
Maturity: Mature
Cost: Free of charge
Tool Type: library
Operating Systems: Linux, Windows, Mac
Programming Languages: C
Added: 8/20/2017
Last Updated: 6/3/2025

Publications

Bonfield JK, Marshall J, Danecek P, Li H, Ohan V, Whitwham A, Keane T, Davies RM. HTSlib: C library for reading/writing high-throughput sequencing data. GigaScience. 2021;10(2). doi:10.1093/gigascience/giab007. PMID:33594436. PMCID:PMC7931820.

Funding: Wellcome Trust (206194)

Related Tools

bcftools (Relation: usedBy)
samtools (Relation: usedBy)