HTSlib
HTSlib is a C library for reading, writing, and processing genomic data files, including SAM, BAM, CRAM, VCF, BCF, and FASTA, in high-throughput sequencing analyses.
Key Features:
- Data Format Support: Supports SAM, BAM and CRAM for alignment data and VCF and BCF for variant data, with interfaces for FASTA and tab-delimited genomic coordinate files.
- Performance Enhancements: Reported benchmarks show a 5× faster BAM read-write loop and 13× faster BAM-to-SAM conversion using 16 threads, relative to Samtools 0.1.19.
- CRAM Support: Reads and writes the CRAM file format in addition to SAM and BAM.
- Indexing and Iterators: Offers enhanced indexing capabilities and iterator-based region retrieval for efficient data access.
- Multithreading: Makes better use of multiple threads to speed up reading and writing.
- Access Protocols: Supports newer protocols for file retrieval and access.
- Language Integration: Has been incorporated into projects and ecosystems using Perl, Python, Rust, and R.
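The indexed region retrieval listed above rests on a binning scheme. As a rough illustration, the sketch below follows the reference routine given in the SAM specification for BAI indexes, which maps a 0-based, half-open interval to the smallest bin that fully contains it. It is a standalone illustration rather than HTSlib's own code, and the `assert` precondition is an addition made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 0-based, half-open interval [beg, end) to its BAI bin number,
   following the binning routine in the SAM specification.
   Bin sizes run from 16 kbp at the finest level up to 512 Mbp (bin 0). */
int reg2bin(int32_t beg, int32_t end)
{
    assert(beg >= 0 && beg < end);   /* precondition added for this sketch */
    --end;                           /* make the end coordinate inclusive */
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14); /* 16 kbp  */
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17); /* 128 kbp */
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);  /* 1 Mbp   */
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);  /* 8 Mbp   */
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);  /* 64 Mbp  */
    return 0;                                                             /* 512 Mbp */
}
```

An interval that stays inside the first 16 kbp window, such as [0, 16384), lands in bin 4681, whereas [0, 16385) crosses a window boundary and moves up to the 128 kbp level, bin 585. A region query then only needs to inspect the bins that can overlap the requested interval.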
Scientific Applications:
- Genome assembly: Enables access to alignment and reference files used during assembly workflows.
- Variant calling: Provides VCF/BCF handling and indexed region access required for variant discovery and genotyping.
- Alignment analysis: Facilitates read-level alignment processing and conversion between SAM and BAM formats.
- Standards interoperability: Implements and supports file formats and access patterns aligned with Global Alliance for Genomics and Health (GA4GH) standards to facilitate cross-tool compatibility.
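As a small illustration of the read-level alignment processing listed above, the sketch below computes how many reference bases an alignment spans from its CIGAR in BAM's packed form (length in the high 28 bits, operation code in the low 4 bits, with codes 0 to 8 standing for "MIDNSHP=X", as defined in the SAM specification). The names `pack_cigar` and `cigar_ref_length` are hypothetical helpers for this example, not HTSlib functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one CIGAR operation the way BAM stores it: len << 4 | op. */
uint32_t pack_cigar(uint32_t len, uint32_t op)
{
    return (len << 4) | (op & 0xfu);
}

/* Sum the lengths of the operations that consume reference bases:
   M (0), D (2), N (3), = (7) and X (8). Insertions, clips and padding
   do not advance the reference position. */
int64_t cigar_ref_length(const uint32_t *cigar, size_t n)
{
    assert(cigar != NULL || n == 0);   /* precondition added for this sketch */
    int64_t rlen = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t op  = cigar[i] & 0xfu;
        uint32_t len = cigar[i] >> 4;
        switch (op) {
        case 0: case 2: case 3: case 7: case 8:
            rlen += len;
            break;
        default:
            break;
        }
    }
    return rlen;
}
```

For the CIGAR 5S10M2I3D20M, only 10M, 3D, and 20M consume the reference, so the alignment spans 33 reference bases. Adding that span to the alignment start gives the end coordinate needed for overlap tests against a query region.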
Methodology:
HTSlib implements BAM read-write loops and BAM-to-SAM conversion, supports the CRAM format, provides indexing and iterator-based region access, uses multithreading for performance, and supports newer access protocols.
Details
- License: MIT
- Maturity: Mature
- Cost: Free of charge
- Tool Type: library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: C
- Added: 8/20/2017
- Last Updated: 6/3/2025
Publications
Bonfield JK, Marshall J, Danecek P, Li H, Ohan V, Whitwham A, Keane T, Davies RM. HTSlib: C library for reading/writing high-throughput sequencing data. GigaScience. 2021;10(2). doi:10.1093/gigascience/giab007. PMID:33594436. PMCID:PMC7931820.
Documentation
- User manual: http://www.htslib.org/doc/#manual-pages
Downloads
- Downloads page: http://www.htslib.org/download/
Links
- Repository: https://github.com/samtools/htslib
- Mailing list: http://www.htslib.org/support/#lists
- Issue tracker: https://github.com/samtools/htslib/issues