anndata

anndata is a Python package that provides a canonical data structure (AnnData) and I/O for annotated data matrices. It stores and manages observations, variables, and their associated metadata for high-dimensional biological datasets such as single-cell RNA sequencing.


Key Features:

  • Annotated Data Matrices: Structured storage for observations (e.g., cells or samples), variables (e.g., genes), and their associated annotations.
  • Sparse Data Support: Efficient handling of sparse data common in single-cell RNA sequencing matrices.
  • Lazy Operations: Support for deferred computations to reduce unnecessary calculation and memory usage.
  • PyTorch Interface: Integration with PyTorch to enable transition from data handling to deep learning model training.
  • In-memory and On-disk Storage: Management of annotated data matrices both in memory and on disk.
  • Interoperability with pandas and xarray: Serves as an intermediary data representation compatible with pandas and xarray data structures.
  • Integration with scikit-learn: Interoperates with modeling packages such as scikit-learn for downstream analyses.
  • Canonical Structure for Learned Representations: Provides a canonical format for book-keeping annotations, learned representations, and task-associated data.

Scientific Applications:

  • Single-cell RNA sequencing analysis: Storage and annotation of single-cell RNA-seq count matrices and metadata.
  • Genomics and single-cell biology: Management of large-scale, high-dimensional biological datasets and their metadata.
  • Representation learning and model training: Support for training models and generating low-dimensional representations, including deep learning workflows with PyTorch.
  • Exploratory data analysis and iterative annotation: Facilitation of iterative workflows that annotate observations and variables via low-dimensional representations.

Methodology:

anndata provides a canonical data structure for book-keeping annotations, learned representations, and task-associated data. It explicitly supports sparse data, lazy operations, and interoperability with modeling packages such as scikit-learn and PyTorch.

Details

License:
BSD-3-Clause
Tool Type:
Python library
Programming Languages:
Python
Added:
2/23/2024
Last Updated:
11/24/2024

Publications

Virshup I, Rybakov S, Theis FJ, Angerer P, Wolf FA. anndata: Annotated data. bioRxiv. 2021. doi:10.1101/2021.12.16.473007.