SAMtools

SAMtools processes and manipulates high-throughput sequencing alignment data and supports variant calling and statistical analysis for genomic research.


Key Features:

  • File Format Conversion and Manipulation: Reads, writes, and converts alignments in the Sequence Alignment/Map (SAM) format and its compressed BAM and CRAM counterparts. The format accommodates both short and long reads (up to 128 Mbp).
  • Indexing and Random Access Efficiency: Builds indexes over coordinate-sorted alignment files so that reads overlapping a given region can be retrieved without scanning the whole file, which keeps access to large genomic datasets scalable.
  • Variant Calling and Analysis: Includes a variant caller for identifying single nucleotide polymorphisms (SNPs) and other variants directly from sequencing data without requiring explicit genotyping or linkage-based imputation.
  • Statistical Framework for Uncertain Data: Provides statistical methods to analyze sequence data with inherent uncertainty, applicable to multi-sample low-coverage sequencing.
  • Sorting, Querying, and Statistics: Offers functions for sorting alignments, querying datasets, and generating alignment and dataset statistics.
  • Integration into Genomic Pipelines: SAMtools and BCFtools integrate into numerous software projects and genomic analysis pipelines.
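A typical conversion, sorting, indexing, querying, and statistics workflow can be sketched as follows. This is a minimal illustration, not an official recipe: the helper names `build_pipeline` and `run_pipeline`, the file names, and the region string are hypothetical, and the script only prints the commands unless `dry_run=False` is passed on a machine where `samtools` is installed.

```python
import shlex
import subprocess


def build_pipeline(sam_path, prefix, region):
    """Return samtools commands for convert -> sort -> index -> query -> stats.

    All paths and the region are illustrative placeholders.
    """
    unsorted_bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Convert text SAM to binary BAM.
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        # Sort alignments by coordinate (required before indexing).
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # Build the index that enables random access.
        ["samtools", "index", sorted_bam],
        # Retrieve only the alignments overlapping one region.
        ["samtools", "view", sorted_bam, region],
        # Summarize alignment flags (mapped, paired, duplicates, ...).
        ["samtools", "flagstat", sorted_bam],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("sample.sam", "sample", "chr1:1000-2000"))
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything touches the data.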

Scientific Applications:

  • Genome-Wide Association Studies (GWAS): Used for association mapping to identify genetic variants associated with specific traits or diseases.
  • Population Genetics: Facilitates inference of population-genetic parameters for studies in evolutionary biology and biodiversity.
  • Somatic Mutation Discovery: Employed to discover somatic mutations relevant to cancer research and tumor evolution studies.
  • Next-Generation Sequencing (NGS) Data Analysis: Applied to analyze NGS datasets, including scenarios where genotypes are not readily available.

Methodology:

SAMtools converts and manipulates alignment files, indexes them for random access, sorts and queries alignments, and generates alignment statistics. It also provides variant calling and a statistical analysis framework tailored to uncertain or low-coverage multi-sample sequencing data.
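To illustrate what analyzing data "with inherent uncertainty" can mean, the sketch below computes genotype likelihoods at a single site instead of committing to one genotype call. It is a simplified illustration in the spirit of the framework described by Li (2011), not the SAMtools or BCFtools implementation. It assumes a biallelic site (every non-reference base is treated as the alternate allele), independent sequencing errors derived from Phred base qualities, and a diploid sample by default. The function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def genotype_likelihoods(bases, quals, ref_base, ploidy=2):
    """Return [L(g=0), ..., L(g=ploidy)], where g counts reference alleles.

    Assumption: biallelic site, independent errors per read base.
    """
    likelihoods = []
    for g in range(ploidy + 1):
        log_l = 0.0
        for base, q in zip(bases, quals):
            eps = phred_to_error(q)
            if base == ref_base:
                # Read shows the reference: true ref allele read correctly,
                # or alt allele misread as ref.
                p = ((ploidy - g) * eps + g * (1 - eps)) / ploidy
            else:
                # Read shows the alternate: true alt allele read correctly,
                # or ref allele misread as alt.
                p = ((ploidy - g) * (1 - eps) + g * eps) / ploidy
            log_l += math.log(p)
        likelihoods.append(math.exp(log_l))
    return likelihoods


if __name__ == "__main__":
    # Three reference and three alternate bases, all at Phred quality 30.
    for g, lik in enumerate(genotype_likelihoods("AAAGGG", [30] * 6, "A")):
        print(f"g={g} reference alleles: likelihood {lik:.3e}")
```

With few reads per sample the likelihoods for neighboring genotypes can be close, which is why downstream analyses on low-coverage data often work from the likelihoods themselves and not from hard genotype calls.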

Details

License: MIT
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool, workflow
Operating Systems: Linux, Windows, Mac
Programming Languages: C
Added: 1/13/2017
Last Updated: 6/3/2025

Publications

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-2079. doi:10.1093/bioinformatics/btp352. PMID:19505943. PMCID:PMC2723002.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2). doi:10.1093/gigascience/giab008. PMID:33590861. PMCID:PMC7931819.

Funding: Wellcome Trust (206194)

Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987-2993. doi:10.1093/bioinformatics/btr509. PMID:21903627. PMCID:PMC3198575.

Related Tools

htslib
Relation: uses