SNPFile

SNPFile is a software library and binary file format for storing and managing large-scale single nucleotide polymorphism (SNP) genotype data together with associated metadata, such as marker names, chromosomal locations, and individuals' phenotypes, for high-throughput genotyping and genome-wide analyses.


Key Features:

  • Binary file format: Uses a novel binary format tailored for high-throughput genotyping data to optimize input/output (I/O) operations.
  • Flexible serialization: Supports serialization of genotype matrices and arbitrary associated metadata including marker names, locations, and phenotypic data.
  • Memory and I/O efficiency: Implements storage and access patterns designed for multi-locus analysis methods to reduce memory footprint and I/O overhead.
  • Scripting interfaces: Provides programmatic interfaces for writing converters between the SNPFile format and other genotype/metadata formats.
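
To make the first two features concrete, here is a minimal Python sketch of how a genotype matrix and free-form metadata might be serialized into one compact binary file. It is not the SNPFile format or API: the magic bytes, header layout, 2-bit genotype coding (0, 1, 2 for allele counts, 3 for missing), JSON metadata trailer, and the function names `pack_genotypes`, `unpack_genotypes`, `write_toy`, and `read_toy` are all assumptions made for illustration.

```python
import json
import struct

# Hypothetical signature and header layout; not the real SNPFile header.
MAGIC = b"TOYSNP1\0"
# magic (8 bytes), n_individuals (uint32), n_markers (uint32), metadata length (uint64)
HEADER = struct.Struct("<8sIIQ")


def pack_genotypes(codes):
    """Pack genotype codes (0, 1, 2, or 3 for missing) four to a byte."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(data, count):
    """Inverse of pack_genotypes: recover `count` codes from packed bytes."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


def write_toy(path, matrix, metadata):
    """Write a marker-major matrix (one list of codes per marker) plus metadata."""
    n_markers = len(matrix)
    n_individuals = len(matrix[0]) if matrix else 0
    meta = json.dumps(metadata).encode("utf-8")
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(MAGIC, n_individuals, n_markers, len(meta)))
        for row in matrix:
            if len(row) != n_individuals:
                raise ValueError("all markers must cover the same individuals")
            fh.write(pack_genotypes(row))
        fh.write(meta)  # arbitrary metadata stored after the genotype block


def read_toy(path):
    """Read back the matrix and metadata written by write_toy."""
    with open(path, "rb") as fh:
        magic, n_ind, n_mark, meta_len = HEADER.unpack(fh.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError("not a toy SNP file")
        row_bytes = (n_ind + 3) // 4
        matrix = [unpack_genotypes(fh.read(row_bytes), n_ind) for _ in range(n_mark)]
        metadata = json.loads(fh.read(meta_len).decode("utf-8"))
    return matrix, metadata
```

In this sketch each genotype takes 2 bits instead of at least one text character, which is where the space saving of a packed binary layout would come from. A converter for another format would only need to produce the matrix and the metadata dictionary and call `write_toy`.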

Scientific Applications:

  • Genome-wide association studies (GWAS): Stores and retrieves SNP genotypes and linked phenotypic metadata for large-scale association analyses.
  • Multi-locus and large-scale genetic analyses: Facilitates multi-locus analysis workflows by providing efficient access to dense genotype datasets and marker annotations.

Methodology:

Implements a novel binary file format and flexible serialization with access patterns optimized for multi-locus analysis to improve I/O efficiency and reduce memory usage for high-throughput genotyping data.
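
The access-pattern idea can be sketched in Python as follows. This is an illustration under stated assumptions, not SNPFile's implementation: it assumes a marker-major layout with one byte per genotype, so that a window of adjacent markers is a single contiguous byte range, and it uses the standard `mmap` module so that only the requested window is copied into memory. The function names `write_marker_major` and `read_marker_window` are hypothetical.

```python
import mmap


def write_marker_major(path, matrix):
    """Store one byte per genotype: all individuals for marker 0, then marker 1, ..."""
    with open(path, "wb") as fh:
        for row in matrix:
            fh.write(bytes(row))


def read_marker_window(path, n_individuals, start, stop):
    """Return genotype rows for markers in [start, stop) without reading the whole file.

    The file is memory-mapped read-only, so only the pages backing the
    requested slice need to be loaded by the operating system.
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            window = mm[start * n_individuals : stop * n_individuals]
    return [
        list(window[i : i + n_individuals])
        for i in range(0, len(window), n_individuals)
    ]
```

A multi-locus method that scans a sliding window of neighbouring markers would call `read_marker_window` once per window, so its memory use depends on the window size rather than on the total number of markers.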

Details

License: GPL-3.0
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: C++, Python
Added: 12/18/2017
Last Updated: 11/25/2024

Publications

Nielsen J, Mailund T. SNPFile – A software library and file format for large scale association mapping and population genetics studies. BMC Bioinformatics. 2008;9:526. doi:10.1186/1471-2105-9-526. PMID:19063732. PMCID:PMC2633306.
