pixy

pixy estimates unbiased nucleotide diversity (π) and divergence (dxy) from Variant Call Format (VCF) data by incorporating invariant sites and correcting for missing genotypes.


Key Features:

  • Handling Missing Data: Corrects bias introduced when missing genotypes in VCFs are assumed to be homozygous for the reference allele.
  • Integration of Invariant Sites: Incorporates invariant sites into calculations to avoid bias from VCFs that omit invariant positions.
  • Unbiased Estimates of π and dxy: Produces unbiased nucleotide diversity (π) and between-population divergence (dxy) estimates regardless of the form or amount of missing data.
  • Input Format: Operates on data encoded in Variant Call Format (VCF), explicitly accounting for the typical omission of invariant sites in VCFs.
  • Validation: Performance has been evaluated using both simulated and empirical datasets.
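
To make the missing-data issue concrete, the sketch below compares two ways of computing π over a handful of sites. It is a minimal illustration, not pixy's code or API: the function names (site_counts, pi), the toy data, and the choice to sum differences and comparisons across sites before dividing are all assumptions made for this example. Alleles are coded 0 (reference), 1 (alternate), or None (missing).

```python
from itertools import combinations

def site_counts(alleles):
    """Return (differences, comparisons) among the called alleles at one site.

    Missing alleles (None) are dropped, so they add neither differences
    nor comparisons. Hypothetical helper for illustration only.
    """
    called = [a for a in alleles if a is not None]
    pairs = list(combinations(called, 2))
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs, len(pairs)

def pi(sites):
    """Sum differences and comparisons over all sites, then divide once."""
    total_diffs = 0
    total_comps = 0
    for alleles in sites:
        diffs, comps = site_counts(alleles)
        total_diffs += diffs
        total_comps += comps
    return total_diffs / total_comps if total_comps else float("nan")

# Toy data: three sites, four sampled alleles each (invariant site included).
sites = [
    [0, 1, 1, None],     # variant site, one allele missing
    [0, 0, 0, 0],        # invariant site, fully called
    [None, None, 1, 0],  # variant site, two alleles missing
]

# Missing alleles excluded from both numerator and denominator.
pi_excluding_missing = pi(sites)

# Missing alleles silently recoded as the reference allele (0).
pi_missing_as_ref = pi([[0 if a is None else a for a in s] for s in sites])
```

With this toy data, excluding missing alleles gives 3 differences over 10 comparisons, while recoding missing alleles as reference gives 7 differences over 18 comparisons, so the two estimates disagree even though the called genotypes are identical.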

Scientific Applications:

  • Within-population diversity (π): Accurate estimation of nucleotide diversity within populations from VCF-based variant calls.
  • Between-population divergence (dxy): Accurate estimation of genetic divergence between populations using VCF data that may lack invariant sites.
  • Population genetics and evolutionary studies: Supports analyses of genetic variation and evolutionary processes by providing corrected diversity and divergence metrics.
  • Conservation biology and genomics: Provides reliable diversity estimates relevant to conservation genetics and genomic studies.

Methodology:

pixy operates on VCF files. Its algorithm integrates invariant sites and accounts for missing genotypes when computing π and dxy, and it has been evaluated on both simulated and empirical datasets.
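
As a rough illustration of how a between-population statistic can account for missing genotypes, the sketch below computes dxy from per-site allele lists for two populations. This is an assumption-laden example rather than pixy's implementation: the function name dxy, the input layout, and the toy data are hypothetical, and real VCF parsing is omitted. Alleles are coded 0, 1, or None (missing), and invariant sites are kept in the input so that they contribute comparisons.

```python
def dxy(pop1_sites, pop2_sites):
    """Between-population divergence over a set of sites.

    For each site, every called allele in population 1 is compared with
    every called allele in population 2. Missing alleles (None) are
    skipped. Differences and comparisons are summed over sites and the
    ratio is taken once at the end. Hypothetical helper for illustration.
    """
    total_diffs = 0
    total_comps = 0
    for alleles1, alleles2 in zip(pop1_sites, pop2_sites):
        called1 = [a for a in alleles1 if a is not None]
        called2 = [a for a in alleles2 if a is not None]
        total_diffs += sum(1 for x in called1 for y in called2 if x != y)
        total_comps += len(called1) * len(called2)
    return total_diffs / total_comps if total_comps else float("nan")

# Toy data: the same three sites observed in two populations.
pop1 = [
    [0, 0, None],  # one allele missing
    [1, 1, 1],
    [0, 0, 0],     # invariant in both populations
]
pop2 = [
    [1, 1, 0],
    [1, None, 0],  # one allele missing
    [0, 0, 0],
]

divergence = dxy(pop1, pop2)
```

Here the three sites contribute 4, 3, and 0 differences over 6, 6, and 9 comparisons, so the estimate is 7/21. Dropping the invariant third site would remove 9 comparisons from the denominator and inflate the result, which is the kind of bias the description above attributes to VCFs that omit invariant positions.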

Details

License: MIT
Programming Languages: Python
Added: 1/18/2021
Last Updated: 1/23/2021

Publications

Korunes KL, Samuk K. pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data. bioRxiv. 2020. doi:10.1101/2020.06.27.175091.
