

80 Free Genotype Imputation Tools - Software and Resources



Genotype imputation software tools play an essential role in genetics research. They fill in genotypes that are missing from a dataset, a common situation in large studies where not every variant is typed in every sample. To do so, they compare the dataset against a reference panel of genetic data. The reference panel typically contains genetic information from a large and diverse set of individuals, which allows missing genotypes to be imputed with high accuracy.
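The idea of borrowing missing genotypes from a reference panel can be illustrated with a deliberately simplified Python sketch. It copies alleles from the single reference haplotype that best matches the target at its typed sites. The function name `impute_from_panel` and the toy data are assumptions made for illustration only; the tools listed on this page use far more elaborate statistical models, and this is not the method of any one of them.

```python
from typing import List, Optional, Sequence


def impute_from_panel(
    target: Sequence[Optional[int]],
    panel: Sequence[Sequence[int]],
) -> List[int]:
    """Fill missing alleles (None) by copying from the closest reference haplotype."""
    if not panel:
        raise ValueError("reference panel is empty")

    # Sites where the target haplotype has an observed allele.
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref: Sequence[int]) -> int:
        # Number of typed sites where the reference disagrees with the target.
        return sum(ref[i] != target[i] for i in typed)

    best = min(panel, key=mismatches)
    # Keep observed alleles; copy the best reference allele where data is missing.
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]


# Toy reference panel: three haplotypes over five SNPs (0 = ref, 1 = alt).
panel = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
target = [1, None, 0, None, 1]  # SNPs 2 and 4 were not typed

imputed = impute_from_panel(target, panel)
print(imputed)
```

Picking a single best-matching haplotype ignores recombination and uncertainty, which is why practical tools typically model each chromosome as a mosaic of several reference haplotypes and report genotype probabilities.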

Although imputing missing genetic data can be complex, genotype imputation software tools make the process much more manageable. Many are designed to be user-friendly. These tools have been widely used in genetics research and have demonstrated high accuracy in imputing missing genetic data.


Take a look at our tutorial on Genotype Imputation


Each software tool has its strengths and weaknesses, and researchers can choose the one that best suits their research needs. See the list below.


What are the top 10 most popular genotype imputation tools?

The top ten most popular imputation tools (Date: 2022-10-01 by citation count):

Rank  Tool Name                  Citations  Year
1     MaCH                       1400       2010
2     BEAGLE                     1400       2007
3     fastPHASE                  1200       2006
4     SHAPEIT                    854        2011
5     Sanger Imputation Service  641        2016
6     IMPUTE2                    592        2011
7     Minimac3                   439        2016
8     Minimac4                   439        2016
9     SNP2HLA                    322        2013
10    IMPUTE4                    267        2018

See also our Database of Bioinformatics Software Tools and Resources for a wider selection on various topics. We are continuously curating it and adding tools.

  1. ABHgenotypeR
    • Description : ABHgenotypeR is an R package for genotype imputation, error correction, and plotting of genotype data. It is intended as an intermediate step between the TASSEL GBS pipeline and the R package qtl. However, ABHgenotypeR also works independently and can visualize genotypes using ggplot2.
  2. Adapt-Mix
    • Description : A tool for genotype imputation in arbitrary population data. The Adapt-Mix algorithm computes estimates of local correlation structure by using a combination of information in all available reference panels and summary statistics-based methods.
  3. ADDIT
    • Description : ADDIT (Accurate Data-Driven Imputation Technique) is a tool for genotype imputation. The ADDIT algorithm consists of two data-driven methods that can manage data from both model and non-model organisms. The model version of the algorithm uses statistical inference, and the non-model version employs supervised learning.
  4. alleHap
    • Description : alleHap is a set of tools for simulation of alleles, genotype imputation, and reconstruction of non-recombinant haplotypes.
  5. AlphaImpute
    • Description : A tool for genotype imputation and phasing. The AlphaImpute algorithm requires pedigree data and couples the long-range phasing with a segregation analysis and haplotype library imputation (SAHLI). The tool is available via email from the Authors.
  6. AlphaPlantImpute
    • Description : A tool for phasing and genotype imputation in plant data. The AlphaPlantImpute algorithm works in and across bi-parental populations. Available from the Author through email. See 'contact'.
  7. ALRA
    • Description : ALRA (Adaptively-thresholded Low Rank Approximation) is a tool for imputing dropouts in single-cell RNA-seq (scRNA-seq) data. The ALRA algorithm uses a low-rank approximation method to recover expressed genes lost to dropout. ALRA is also integrated into Seurat v3.0.
  8. AutoImpute
    • Description : AutoImpute is a tool for imputing single-cell RNA-seq (scRNA-seq) expression data. The AutoImpute algorithm uses an autoencoder to impute the sparse gene expression matrix, learning the distribution of single-cell expression data to fill in dropouts. Requires: Python 2.7, numpy, scikit-learn, TensorFlow, and matplotlib.
  9. BEAGLE
    • Description : Beagle is a tool for genotype calling, phasing, identity-by-descent segment detection, and genotype imputation. The Beagle algorithm uses a modified version of the Li and Stephens haplotype frequency model that reduces the space requirements, and a pre-processing step that converts the original reference panel into composite reference haplotypes. These steps reduce both space and computing time.
  10. BIMBAM
    • Description : BIMBAM (Bayesian Imputation Based Association Mapping) is a tool to impute genotypes and perform statistical tests for disease association, such as single-SNP tests and regional multi-SNP tests. The BIMBAM algorithm uses the Bayesian framework.
  11. BLIMP
    • Description : BLIMP (Software for Best Linear IMPutation) is a tool for genotype imputation. The BLIMP algorithm works on pooled or summary data and uses chained equations to handle incomplete categorical variables.
  12. BLUP
    • Description : A tool for genotype imputation. The BLUP algorithm aims to solve the problem of case-control association with missing data. It assumes the samples to contain related individuals.
  13. CGDSNPdb
    • Description : CGDSNPdb is a web-based tool for imputed mouse single nucleotide polymorphisms (SNPs).
  14. chipimputation
    • Description : chipimputation is a pipeline tool for genotype imputation. The chipimputation pipeline consists of Perl and Python scripts and incorporates the following software tools: PLINK, SHAPEIT, IMPUTE2, and GTOOL.
  15. DeepImpute
    • Description : DeepImpute is a tool to impute single-cell RNA-seq (scRNA-seq) data. The DeepImpute algorithm uses a deep neural network learning method.
  16. DISSCO
    • Description : DISSCO (Direct Imputation of Summary Statistics allowing COvariates) is a tool for genotype imputation. The DISSCO algorithm uses association summary statistics and thus does not require individual-level genotype data.
  17. DIST
    • Description : DIST (Direct Imputation of summary STatistics) is a tool for genotype imputation. The DIST algorithm imputes the summary statistics of untyped variants, using conditional expectation for multivariate normal variates and correlation in a reference population.
  18. DISTMIX
    • Description : DISTMIX is a tool for genotype imputation. The DISTMIX algorithm extends the capability of DIST (see 'links') to the analysis of mixed-ethnicity cohorts. It uses a reference panel to impute missing SNPs, with ethnic proportions that are either estimated or user-specified.
  19. DrImpute
    • Description : DrImpute is a tool for the imputation of dropout events in single-cell RNA-seq (scRNA-seq) data.
  20. EM-LRT
    • Description : A tool for genotype imputation. The EM-LRT algorithm produces imputation uncertainty.
  21. EMINIM
    • Description : A tool for genotype imputation. The EMINIM algorithm is based on a hidden Markov model (HMM), estimates population parameters from data, and works on diverse model organisms.
  22. Ezimputer
    • Description : EZImputer is a workflow for genotype imputation based on IMPUTE2 (see 'links'). It automates steps routinely needed in an imputation scheme.
  23. FAPI
    • Description : A tool for genotype imputation. The FAPI algorithm comprises functions for p-value imputation, meta-analysis, and quality assessment. It requires neither phasing nor sampling of raw genotypes.
  24. fastPHASE
    • Description : A tool for genotype imputation and estimating missing haplotypes. The fastPHASE algorithm takes a random sample from population data, models the genealogy of chromosomes, and summarizes the haplotype variation. fastPHASE estimates and corrects genotyping errors based on linkage disequilibrium (LD) patterns, associates haplotypes with binary phenotypes, and works on low-coverage sequencing data.
  25. FImpute
    • Description : FImpute is a tool for haplotype estimation or phasing and genotype imputation. The FImpute algorithm uses pedigree information and an iterative procedure and imputes missing genotypes using a sliding window method with the assumption that all subjects have some degree of relationship.
  26. findhap.f90
    • Description : findhap.f90 is a tool for genotype imputation and haplotype detection. The findhap.f90 algorithm uses allele read counts to improve imputation accuracy. The authors claim findhap.f90 to be more accurate than Beagle (v4) and up to 400 times faster.
  27. FISH
    • Description : A tool for genotype imputation. The FISH algorithm uses a hidden Markov model to characterize single reference haplotypes.
  28. GeneImp
    • Description : A tool for genotype imputation. The GeneImp algorithm uses a sliding window approach and does not require pre-phasing. Furthermore, the algorithm imputes genotypes to a dense reference panel by obtaining the likelihoods from ultralow sequencing coverage. Requirements: VCFtools, bcftools, and HTSlib.
  29. genipe
    • Description : A pipeline tool for genotype imputation. The genipe pipeline includes imputed data indexing, data management, the Sequence Kernel Association Test, Cox proportional hazards for survival analysis, and linear mixed models for repeated measurements in longitudinal studies. The imputation pipeline works with PLINK, SHAPEIT, and IMPUTE2.
  30. GIGI
    • Description : A tool for rare variant genotype imputation. The GIGI algorithm can handle large pedigrees. Only a subset of individuals in a pedigree needs to be completely sequenced. GIGI infers the missing genotypes at untyped markers, provided the remaining individuals are sequenced at appropriate marker locations. GIGI-Quick can speed up the computation by running GIGI in parallel. See 'links'.
  31. GIGI-Quick
    • Description : A tool for genotype imputation that can handle large pedigrees. The GIGI-Quick algorithm runs GIGI (see 'links') in parallel to reduce the overall run time.
  32. GIGSEA
    • Description : GIGSEA (Genotype Imputed Gene Set Enrichment Analysis) is a tool to analyze imputed genotypes. The GIGSEA algorithm uses a combination of genome-wide association study (GWAS) summary statistics and eQTL to deduce differential gene expression and to examine enrichment for gene sets.
  33. Gimpute
    • Description : A pipeline tool for genotype imputation. The Gimpute pipeline comprises genetic variant updating, matching, liftover, quality control, alignment of variants to references, pre-phasing, imputation, and post-imputation quality control.
  34. GRIMM
    • Description : A tool for human leukocyte antigen (HLA) genotype imputation and matching. The GRIMM algorithm uses a graph-based method to store haplotype frequencies.
  35. GTOOL
    • Description : A tool to transform genotype datasets for use with SNPTEST and IMPUTE.
  36. Hap-seqX
    • Description : A tool for haplotype phasing and genotype imputation. The Hap-seqX algorithm uses a combination of dynamic programming and a hidden Markov model (HMM).
  37. HIBAG
    • Description : A tool for imputation of human leukocyte antigen (HLA) types using single nucleotide polymorphisms (SNPs). The HIBAG algorithm combines attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types.
  38. HLA*IMP
    • Description : A tool for genotype imputation of human leukocyte antigen (HLA) alleles. The HLA*IMP algorithm uses linked SNP data, prepares local data, performs probabilistic imputation through a remote server, and carries out quality control (QC).
  39. HLA-IMPUTER
    • Description : A web-based tool for HLA allele imputation using the HIBAG algorithm (see 'links'). HLA-IMPUTER currently has the following reference panels: Han Chinese, Pan Asian, European, and multiethnic.
  40. hsphase
    • Description : hsphase is a tool to detect recombination events, phasing, genotype imputation, and to reconstruct pedigrees. The hsphase algorithm uses a genetic data structure within half-sib livestock to classify recombination events. It can also run directly on sequence data.
  41. Human Protein Variant Effect Map Imputation Toolkit
    • Description : A web-based pipeline tool for imputation and visualization of missense variant effect maps. The algorithm imputes missing data in empirically observed effect maps.
  42. ImpG-Summary
    • Description : A tool for genotype imputation. The ImpG-Summary algorithm uses Gaussian imputation with summary association statistics.
  43. Imputability Database
    • Description : A web-based tool that provides information on the imputability of single nucleotide polymorphisms (SNPs) and insertions and deletions (indels). It returns the information for given variant IDs or for a specified genomic region.
  44. IMPUTE2
    • Description : IMPUTE2 is a tool for genotype imputation and haplotype phasing. The IMPUTE2 algorithm uses 'pre-phasing' wherein it makes initial statistical estimates of the haplotypes. In the next step, it imputes missing genotypes given the estimated haplotypes. This approach yields reduced computing time.
  45. IMPUTE4
    • Description : IMPUTE4 is a tool for genotype imputation. IMPUTE4 is an improved version of IMPUTE2 (see 'links'), developed by Jonathan Marchini and colleagues to impute genotypes for the UK Biobank data.
  46. IMPUTOR
    • Description : IMPUTOR is a tool to identify miscalled bases caused by sequencing errors and to impute genotypes. The IMPUTOR algorithm imputes erroneously called bases and missing data using a parsimony approach.
  47. Kinpute
    • Description : A tool to compute reference panels and genotype probabilities for specific studies. The Kinpute algorithm uses initial estimates of average identity by descent in a sample to select an optimal set of individuals to sequence for a sample-specific reference panel. The probabilities are useful as an input for genotype imputation software.
  48. LinkImpute
    • Description : A tool for genotype imputation for non-model organisms. The LinkImpute algorithm works on unphased data from heterozygous species. It uses the k-nearest neighbor genotype imputation technique (LD-kNNi) and does not require physical or genetic maps. LinkImpute is available in two versions, Java and as an R package. Other names: LinkImputeR.
  49. LinkImputeR
    • Description : LinkImputeR is a Java tool to call and impute genotypes. The LinkImputeR algorithm uses the read count information and all other available sequence information. LinkImputeR works particularly well in non-model organisms because it does not need genotype reference panels or ordered markers.
  50. MaCH
    • Description : MaCH is a tool for genotype imputation and haplotyping using whole-genome sequencing (WGS) data. The MaCH algorithm uses a Markov chain approach and represents sampled chromosomes as mosaics of each other.
  51. MaCH-Admix
    • Description : MaCH-admix is a tool for genotype imputation with added features compared to the MaCH 1.0 tool. The MaCH-admix algorithm performs piecewise selection of reference haplotypes to tailor the reference to each target individual. The algorithm also separates imputation from parameter estimation, which allows the use of standard reference panels together with independently calibrated parameters.
  52. Mendel Impute
    • Description : A tool for genotype imputation. The MENDEL-IMPUTE algorithm uses matrix completion and a sliding window approach over single nucleotide polymorphisms (SNPs). The download package contains documentation.
  53. mendel-gpu
    • Description : A tool for genotype imputation. The mendel-gpu algorithm uses linkage disequilibrium patterns in unrelated subjects and runs on AMD and Nvidia GPUs.
  54. Michigan Imputation Server
    • Description : A web-based tool for genotype imputation. The Michigan Imputation Server supports the following reference panels: 1. HapMap Release 2, 2. 1000 Genomes Phase 1, 3. 1000 Genomes Phase 3, 4. CAAPA African American, 5. Haplotype Reference Consortium, 6. Hosting your own reference panels. The Michigan Imputation Server is open source and the source code is available for download.
  55. Minimac
    • Description : Minimac is a tool for genotype imputation. The Minimac algorithm is a computationally efficient implementation of the MaCH algorithm. It works on phased genotypes and can handle large reference panels of up to hundreds of thousands of haplotypes.
  56. Minimac2
    • Description : Minimac2 is a tool for genotype imputation. The Minimac2 algorithm is a computationally efficient implementation of the MaCH algorithm. It works on phased genotypes and can handle large reference panels of up to hundreds of thousands of haplotypes. A multiprocessor version, minimac2-omp, is available from the download page.
  57. Minimac3
    • Description : Minimac3 is a tool for genotype imputation, an improved version of Minimac2. The Minimac3 algorithm is a computationally efficient implementation of the MaCH algorithm. It works on phased genotypes and can handle large reference panels of up to hundreds of thousands of haplotypes. It first identifies repeated haplotype patterns and uses these to speed up the computational process. Minimac3 uses less memory than its predecessors.
  58. Minimac4
    • Description : Minimac4 is a tool for genotype imputation, an improved version of Minimac3. The Minimac4 algorithm is a computationally efficient implementation of the MaCH algorithm. It works on phased genotypes and can handle large reference panels of up to hundreds of thousands of haplotypes. It first identifies repeated haplotype patterns and uses these to speed up the computational process. Main improvements over previous versions: 1. About six times faster than Minimac3 on the reference panels tested, 2. Decreased memory usage, 3. Support for varying ploidy in the same VCF file, for imputation of sex chromosomes.
  59. Molgenis-impute
    • Description : A pipeline tool for genotype imputation in grid and local cluster environments. The automation includes the following steps: genome build liftover, genotype phasing (SHAPEIT2), quality control, sample and chromosomal chunking, and genotype imputation (IMPUTE). Molgenis-impute utilizes MOLGENIS-compute (see 'links') for submission and monitoring of tasks. Requires: wget or curl, tar, unzip, bunzip2, g++, Java 1.6 or higher, Python 2.7, and numpy.
  60. NPUTE
    • Description : NPUTE is a tool for genotype imputation. The NPUTE algorithm applies a k-nearest-neighbor (KNN) method over a sliding haplotype window of any size, using a mismatch accumulator array (MAA). NPUTE also estimates imputation accuracy by leaving out known SNP values and inferring them.
  61. ParaHaplo
    • Description : ParaHaplo is a tool for genotype imputation and reconstruction of haplotypes. The ParaHaplo algorithm uses parallel computing.
  62. PedImpute
    • Description : A tool for haplotype reconstruction and genotype imputation using whole-genome single-nucleotide reference panels.
  63. PlantImpute
    • Description : PlantImpute is a tool for genotype imputation. The PlantImpute algorithm uses a hidden Markov model (HMM) to track inheritance specifically in plants. The tool is available from Carl Nettelblad by email. See 'contact'.
  64. polyHap
    • Description : polyHap is a tool to phase and estimate missing genotypes in copy number variable (CNV) regions. The polyHap algorithm uses a hidden Markov model (HMM).
  65. PRIMAL
    • Description : PRIMAL (PedigRee IMputation ALgorithm) is a tool for genotype imputation for founder populations having pedigree data. The PRIMAL algorithm is based on an indexing procedure of Identity-By-Descent segments using clique graphs.
  66. PBWT
    • Description : A tool for genotype imputation and phasing. The PBWT algorithm uses a compressed representation of haplotypes based on the Positional Burrows-Wheeler Transform (PBWT).
  67. r2_hat
    • Description : r2_hat is a tool to estimate the quality of imputation based on dosage data.
  68. RegionalHapMapExtractor
    • Description : Software to extract a region from HapMap II for MaCH imputation.
  69. Sanger Imputation Service
    • Description : The Sanger Imputation Service is a web-based genotype imputation and phasing tool at the Wellcome Sanger Institute. The service pipeline uses EAGLE2 or SHAPEIT2 for pre-phasing, EAGLE2 for phasing, and PBWT (Positional Burrows-Wheeler Transform) for genotype imputation. The service currently offers the following reference panels: 1. Haplotype Reference Consortium, 2. African Genome Resources, 3. 1000 Genomes Phase 3, 4. UK10K, 5. UK10K + 1000 Genomes Phase 3.
  70. SHAPEIT
    • Description : SHAPEIT (Segmented HAPlotype Estimation and Imputation Tool) is a tool to estimate haplotypes. The SHAPEIT algorithm uses data from unrelated individuals or small families and scales linearly with the number of single-nucleotide polymorphisms (SNPs) and haplotypes.
  71. Simpute
    • Description : Simpute is a tool for genotype imputation. The Simpute algorithm does not require reference panels and works by evaluating the two neighboring SNP loci around a missing target. It combines the estimated haplotype probabilities with LD data to predict the missing SNP genotype.
  72. simuRare
    • Description : A tool for genotype imputation. The simuRare algorithm combines logistic regression-based imputation with resampling to simulate rare and common single nucleotide polymorphisms (SNPs).
  73. SNP2HLA
    • Description : A tool to impute amino acid polymorphisms and single nucleotide polymorphisms in human leukocyte antigens (HLA) within the major histocompatibility complex (MHC) region on chromosome 6.
  74. SparRec
    • Description : A tool for genotype imputation. The SparRec (SPARse RECovery) algorithm uses low-rank matrix completion with a novel co-clustering factorization.
  75. STITCH
    • Description : A tool for genotype imputation. The STITCH algorithm works without reference panels and models chromosomes as a mosaic of unknown founders or ancestral haplotypes.
  76. tagIMPUTE
    • Description : tagIMPUTE is a tool to impute untyped single-nucleotide polymorphisms (SNPs). The tagIMPUTE algorithm uses flanking SNPs that can predict the SNP to be imputed.
  77. TagIt
    • Description : TagIt is a tool to select single nucleotide polymorphisms (SNPs) from 26 population reference panels to increase the accuracy of genotype imputation.
  78. TIGAR
    • Description : TIGAR (Transcriptome-Integrated Genetic Association Resource) is a tool for genotype imputation of transcriptome data. The TIGAR algorithm uses a nonparametric Bayesian method.
  79. trio
    • Description : The trio package contains the following functions: 1. Identification of linkage disequilibrium (LD) blocks, 2. Computation of pair-wise LD values, 3. Genotype imputation, 4. Simulation of case-parent trios with disease risk vs. SNP interaction, 5. Computation of trio logic regression on matched case pseudo-control genotype data for case-parent trios, 6. Calculation of power and sample size.
  80. YHap
    • Description : YHap is a command-line tool for prediction of Y-chromosome genotypes and assignment of haplogroups. The YHap algorithm uses an imputation framework and works with low coverage sequence data, less than 2X coverage.
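
Several entries in the list above (for example LinkImpute and NPUTE) describe k-nearest-neighbor imputation, which needs no reference panel. The Python sketch below shows the bare idea under simplifying assumptions: genotypes are coded as alternate-allele counts (0, 1, 2), distance is the mean absolute difference over sites typed in both samples, and the missing call is the majority vote of the k nearest samples. The names `distance` and `knn_impute` and the toy matrix are hypothetical. The published methods differ in important ways, for instance by restricting the comparison to SNPs in linkage disequilibrium with the target or to a sliding window.

```python
from collections import Counter
from typing import List, Optional

Genotype = Optional[int]  # 0, 1, 2 = alternate-allele count; None = missing


def distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Mean absolute genotype difference over sites typed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # nothing to compare, treat as maximally distant
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(matrix: List[List[Genotype]], k: int = 2) -> List[List[Genotype]]:
    """Return a copy of a samples-by-SNPs matrix with missing calls filled in."""
    filled = [row[:] for row in matrix]
    for s, row in enumerate(matrix):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # Donors: other samples with an observed call at SNP j.
            donors = [
                other for t, other in enumerate(matrix)
                if t != s and other[j] is not None
            ]
            # Rank donors by similarity to the current sample (original data only).
            donors.sort(key=lambda other: distance(row, other))
            votes = Counter(other[j] for other in donors[:k])
            if votes:  # leave the call missing if no donor is available
                filled[s][j] = votes.most_common(1)[0][0]
    return filled


# Toy data: six samples, four SNPs, two missing calls.
genotypes = [
    [0, 0, 1, 2],
    [0, 0, 1, None],
    [2, 2, 0, 0],
    [2, None, 0, 0],
    [0, 0, 1, 2],
    [2, 2, 0, 0],
]

filled = knn_impute(genotypes, k=2)
for row in filled:
    print(row)
```

Distances are always computed on the original matrix rather than on already imputed values, so the result does not depend on the order in which missing calls are visited.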






