The Genome Analysis Toolkit (GATK) is a programming framework for developing robust, efficient analysis tools for data from next-generation DNA sequencers. As projects such as the 1000 Genomes Project generate ever larger volumes of sequencing data, the need for sophisticated, feature-rich analysis tools has grown. Yet accessing and manipulating data at this scale is complex enough that even computationally sophisticated users have struggled to build practical analysis tools.

The GATK framework gives developers and analysts a structured way to build efficient, reliable analysis tools. It follows the functional programming philosophy of MapReduce and provides a small but rich set of data access patterns that cover the majority of analysis tool needs. Separating the analysis-specific calculations from the common data management infrastructure lets the framework itself be optimized for correctness, stability, and CPU and memory efficiency. The same separation enables both distributed and shared memory parallelization, which suits the framework to large-scale sequencing projects.
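The separation described above can be illustrated with a minimal Python sketch. It is not GATK's API: the `traverse` function, the tuple-based read records, and the shard split are all assumptions made for illustration. The engine owns iteration over the data, while the analysis supplies only a map function and a reduce function.

```python
from functools import reduce

def traverse(data, map_fn, reduce_fn, initial):
    """Hypothetical traversal engine: it owns the iteration over the data,
    and the analysis supplies only the map and reduce steps."""
    return reduce(reduce_fn, (map_fn(datum) for datum in data), initial)

# Toy reads as (contig, start, sequence) tuples; not a real alignment record type.
reads = [("chr1", 100, "ACGT"), ("chr1", 102, "GTAC"), ("chr2", 5, "TTTT")]

def bases_in_read(read):
    """Map step: number of bases contributed by one read."""
    return len(read[2])

def add(accumulator, value):
    """Reduce step: running total."""
    return accumulator + value

# The whole data set processed in a single traversal.
total_bases = traverse(reads, bases_in_read, add, 0)

# The same analysis run per shard and then combined. This works because
# the reduce step (addition) is associative.
shards = [reads[:2], reads[2:]]
per_shard = [traverse(shard, bases_in_read, add, 0) for shard in shards]
combined = reduce(add, per_shard, 0)

print(total_bases, combined)
```

Because the analysis never touches the iteration logic, an engine written along these lines could hand shards to separate threads or machines without any change to the map and reduce functions.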

Two kinds of tools illustrate what the framework supports: coverage calculators and single nucleotide polymorphism (SNP) callers. Coverage calculators report the depth of coverage across a genomic region, and SNP calling identifies variations in DNA sequences that may be associated with specific traits or diseases. GATK provides robust, scale-tolerant tools for both tasks, which makes it valuable to genetics researchers.
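To make the two tasks concrete, the sketch below computes per-locus depth of coverage from a toy pileup and flags candidate SNPs with a simple allele-fraction threshold. This is a deliberately naive stand-in, not the statistical model used by GATK's callers. The function names, the `min_depth` and `min_alt_fraction` parameters, and the ungapped tuple-based reads are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def pileup(reads):
    """Group aligned bases by (contig, position).
    Reads are ungapped (contig, start, sequence) tuples, a simplification."""
    columns = defaultdict(list)
    for contig, start, seq in reads:
        for offset, base in enumerate(seq):
            columns[(contig, start + offset)].append(base)
    return columns

def depth_of_coverage(columns):
    """Depth at each locus is the number of bases piled up there."""
    return {locus: len(bases) for locus, bases in columns.items()}

def naive_snp_calls(columns, reference, min_depth=3, min_alt_fraction=0.5):
    """Flag loci where one non-reference base reaches min_alt_fraction of
    the depth. A threshold heuristic, not a genotype likelihood model."""
    calls = {}
    for locus, bases in columns.items():
        if len(bases) < min_depth:
            continue  # too little evidence at this locus
        ref_base = reference[locus]
        alt_counts = Counter(b for b in bases if b != ref_base)
        if not alt_counts:
            continue  # every base matches the reference
        alt_base, alt_count = alt_counts.most_common(1)[0]
        if alt_count / len(bases) >= min_alt_fraction:
            calls[locus] = (ref_base, alt_base)
    return calls

# Toy reference: chr1 positions 1-4 carry A, C, G, T.
reference = {("chr1", i + 1): base for i, base in enumerate("ACGT")}
reads = [
    ("chr1", 1, "ACTT"),
    ("chr1", 1, "ACTT"),
    ("chr1", 2, "CGT"),
    ("chr1", 2, "CTT"),
]

columns = pileup(reads)
depth = depth_of_coverage(columns)
calls = naive_snp_calls(columns, reference)
print(depth)
print(calls)
```

In this toy data, position 3 has depth 4 with three reads showing T against a reference G, so it is the only locus flagged. Position 1 is skipped because its depth of 2 falls below `min_depth`.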

GATK's capabilities are further demonstrated by its use in large-scale sequencing projects such as the 1000 Genomes Project and The Cancer Genome Atlas. These projects generate vast amounts of data that require sophisticated analysis tools to yield meaningful insights, and the framework has let developers and analysts build efficient, robust tools for that data quickly.


Sequence analysis;Genetic variation;Sequencing;Workflows


  • Operation: Polymorphism detection;Sequence analysis;Genotyping;Statistical calculation

  • Software interface: Suite

  • Language: Java, Python

  • License: Apache License, Version 2.0

  • Cost: Free

  • Version name:

  • Credit: National Human Genome Research Institute, including the Large Scale Sequencing and Analysis of Genomes grant.

  • Input: -

  • Output: -

  • Contact: -

  • Collection: -

  • Maturity: Mature


  • The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
  • McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010; 20:1297-303. doi: 10.1101/gr.107524.110
  • PMID: 20644199
  • PMC: PMC2928508

Download and documentation
