SNPFile
SNPFile is a software library and binary file format for storing and managing large-scale Single Nucleotide Polymorphism (SNP) genotype data together with its associated metadata, such as marker names, chromosomal locations, and the phenotypes of individuals. It targets high-throughput genotyping and genome-wide analyses.
Key Features:
- Binary file format: Uses a novel binary format tailored for high-throughput genotyping data to optimize input/output (I/O) operations.
- Flexible serialization: Supports serialization of genotype matrices and arbitrary associated metadata including marker names, locations, and phenotypic data.
- Memory and I/O efficiency: Implements storage and access patterns designed for multi-locus analysis methods to reduce memory footprint and I/O overhead.
- Scripting interfaces: Provides programmatic interfaces for writing converters between the SNPFile format and other genotype/metadata formats.
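To illustrate why a binary layout can reduce I/O for genotype data, here is a minimal sketch of a generic 2-bit packing scheme. It is not SNPFile's actual on-disk format, which this entry does not describe. The genotype codes (0, 1, 2 and a `MISSING` value of 3) and the function names `pack_genotypes` and `unpack_genotypes` are assumptions made for the example.

```python
# Illustrative only: a generic 2-bit genotype packing,
# NOT SNPFile's real on-disk layout.
MISSING = 3  # hypothetical code for a missing genotype call


def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2 or MISSING) into bytes, four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code: {g!r}")
        # Each genotype occupies two bits; the first sits in the low bits.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, count):
    """Recover `count` genotype codes from bytes made by pack_genotypes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


calls = [0, 1, 2, MISSING, 2, 2, 0]
blob = pack_genotypes(calls)
restored = unpack_genotypes(blob, len(calls))
```

Under these assumptions, seven genotypes fit in two bytes. A text encoding needs at least one character per genotype before any separators are added.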
Scientific Applications:
- Genome-wide association studies (GWAS): Stores and retrieves SNP genotypes and linked phenotypic metadata for large-scale association analyses.
- Multi-locus and large-scale genetic analyses: Facilitates multi-locus analysis workflows by providing efficient access to dense genotype datasets and marker annotations.
Methodology:
SNPFile combines a novel binary file format with flexible serialization of genotype matrices and metadata. Its access patterns are optimized for multi-locus analysis, which improves I/O efficiency and reduces memory usage for high-throughput genotyping data.
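One way to picture an access pattern suited to multi-locus analysis is reading a window of adjacent markers without loading the whole matrix. The sketch below does this with a memory-mapped file from Python's standard library. The file layout (an 8-byte header followed by marker-major rows, one byte per genotype) and the names `write_matrix` and `read_window` are hypothetical and do not reflect SNPFile's API or format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical header: number of markers, number of individuals.
HEADER = struct.Struct("<II")


def write_matrix(path, rows):
    """Write a marker-major genotype matrix, one byte per genotype."""
    n_individuals = len(rows[0])
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(len(rows), n_individuals))
        for row in rows:
            if len(row) != n_individuals:
                raise ValueError("all markers need the same number of individuals")
            fh.write(bytes(row))


def read_window(path, start, stop):
    """Return genotypes for markers [start, stop) via a memory map."""
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_markers, n_individuals = HEADER.unpack_from(mm, 0)
        if not 0 <= start <= stop <= n_markers:
            raise IndexError("marker window out of range")
        rows = []
        for marker in range(start, stop):
            # Rows are contiguous, so each marker is one slice of the map.
            offset = HEADER.size + marker * n_individuals
            rows.append(list(mm[offset:offset + n_individuals]))
        return rows


matrix = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [0, 0, 2]]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    write_matrix(demo_path, matrix)
    window = read_window(demo_path, 1, 3)
```

Because neighbouring markers are stored contiguously in this layout, a sliding-window analysis reads only the bytes it needs. That is the kind of saving the description above attributes to the format.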
Details
- License: GPL-3.0
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: C++, Python
- Added: 12/18/2017
- Last Updated: 11/25/2024
Publications
Nielsen J, Mailund T. SNPFile – A software library and file format for large scale association mapping and population genetics studies. BMC Bioinformatics. 2008;9(1). doi:10.1186/1471-2105-9-526. PMID:19063732. PMCID:PMC2633306.