SAMtools
SAMtools processes and manipulates high-throughput sequencing alignment data and supports variant calling and statistical analysis for genomic research.
Key Features:
- File Format Conversion and Manipulation: Supports the Sequence Alignment/Map (SAM) format for storing read alignments and accommodates both short and long reads (up to 128 Mbp).
- Indexing and Random Access Efficiency: Builds indexes over coordinate-sorted, compressed alignment files (BAM or CRAM) so that reads overlapping a given region can be retrieved without scanning the whole file, enabling scalable access to large genomic datasets.
- Variant Calling and Analysis: Includes a variant caller for identifying single nucleotide polymorphisms (SNPs) and other variants directly from sequencing data without requiring explicit genotyping or linkage-based imputation.
- Statistical Framework for Uncertain Data: Provides statistical methods to analyze sequence data with inherent uncertainty, applicable to multi-sample low-coverage sequencing.
- Sorting, Querying, and Statistics: Offers functions for sorting alignments, querying datasets, and generating alignment and dataset statistics.
- Integration into Genomic Pipelines: SAMtools and BCFtools integrate into numerous software projects and genomic analysis pipelines.
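The conversion, sorting, indexing, statistics, and querying features listed above map onto a short sequence of `samtools` subcommands. The following is a minimal sketch, not an official wrapper: the helper names `basic_workflow` and `run`, the file names, and the region string are hypothetical, and it assumes a `samtools` 1.x binary on `PATH` if the commands are actually executed.

```python
import shlex
import shutil
import subprocess


def basic_workflow(sam_path, prefix, region=None):
    """Return samtools command lines (argv lists) for a basic workflow.

    Hypothetical helper for illustration; the output file names are
    derived from `prefix` and are assumptions, not samtools defaults.
    """
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    commands = [
        # Convert text SAM to compressed binary BAM.
        ["samtools", "view", "-b", "-o", bam, sam_path],
        # Sort alignments by reference coordinate.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Build the index that makes region queries possible.
        ["samtools", "index", sorted_bam],
        # Print summary counts derived from the alignment flags.
        ["samtools", "flagstat", sorted_bam],
    ]
    if region is not None:
        # Retrieve only the alignments overlapping the region.
        commands.append(["samtools", "view", sorted_bam, region])
    return commands


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even where samtools or the input file is unavailable.
    for argv in basic_workflow("sample.sam", "sample", region="chr1:1-100000"):
        print(shlex.join(argv))
```

Building the commands as argument lists rather than shell strings avoids quoting problems with unusual file names and lets the same list be printed for review or passed to `run`.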
Scientific Applications:
- Genome-Wide Association Studies (GWAS): Used for association mapping to identify genetic variants associated with specific traits or diseases.
- Population Genetics: Facilitates inference of population-genetic parameters for studies in evolutionary biology and biodiversity.
- Somatic Mutation Discovery: Employed to discover somatic mutations relevant to cancer research and tumor evolution studies.
- Next-Generation Sequencing (NGS) Data Analysis: Applied to analyze NGS datasets, including scenarios where genotypes are not readily available.
Methodology:
SAMtools performs file format conversion and manipulation, indexing for random access, sorting and querying of alignments, generation of alignment statistics, variant calling, and statistical analysis tailored to uncertain or low-coverage multi-sample sequencing data.
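The variant-calling step can be sketched as a two-stage pipe. In current releases the calling commands are provided by the companion BCFtools package (`bcftools mpileup` followed by `bcftools call`). The helper names, file names, and choice of compressed VCF output below are assumptions for illustration, and the sketch only assembles the command line rather than running it.

```python
import shlex


def variant_calling_pipeline(reference_fasta, bam_paths, out_vcf):
    """Return the two argv lists of an mpileup-then-call pipeline.

    Hypothetical helper; the inputs are assumed to be coordinate-sorted,
    indexed BAM files aligned to `reference_fasta`.
    """
    # Compute genotype likelihoods for all samples, streamed as
    # uncompressed BCF (-Ou) to the caller.
    mpileup = ["bcftools", "mpileup", "-Ou", "-f", reference_fasta, *bam_paths]
    # Multiallelic caller (-m), variant sites only (-v), compressed VCF (-Oz).
    call = ["bcftools", "call", "-m", "-v", "-Oz", "-o", out_vcf]
    return mpileup, call


def as_shell(mpileup, call):
    """Render the pipeline as a single shell command line."""
    return f"{shlex.join(mpileup)} | {shlex.join(call)}"


if __name__ == "__main__":
    stages = variant_calling_pipeline(
        "ref.fa", ["a.bam", "b.bam"], "calls.vcf.gz"
    )
    print(as_shell(*stages))
```

Passing several BAM files to one `mpileup` invocation is what allows multi-sample calling, including the low-coverage case mentioned above.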
Details
- License: MIT
- Maturity: Mature
- Cost: Free of charge
- Tool Type: command-line tool, workflow
- Operating Systems: Linux, Windows, Mac
- Programming Languages: C
- Added: 1/13/2017
- Last Updated: 6/3/2025
Operations
Data Inputs & Outputs
Data editing
Publications
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-2079. doi:10.1093/bioinformatics/btp352. PMID:19505943. PMCID:PMC2723002.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2). doi:10.1093/gigascience/giab008. PMID:33590861. PMCID:PMC7931819.
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987-2993. doi:10.1093/bioinformatics/btr509. PMID:21903627. PMCID:PMC3198575.
Downloads
- Downloads page: http://www.htslib.org/download/