anndata
anndata is a Python package that provides a canonical data structure, with file I/O, for annotated data matrices. It stores and manages observations, variables, and their associated metadata for high-dimensional biological datasets such as single-cell RNA sequencing data.
Key Features:
- Annotated Data Matrices: Structured storage for observations (e.g., cells or samples), variables (e.g., genes), and their associated annotations.
- Sparse Data Support: Efficient handling of sparse data common in single-cell RNA sequencing matrices.
- Lazy Operations: Support for deferred computations to reduce unnecessary calculation and memory usage.
- PyTorch Interface: Integration with PyTorch to ease the transition from data handling to deep learning model training.
- In-memory and On-disk Storage: Management of annotated data matrices both in memory and on disk.
- Interoperability with pandas and xarray: Serves as an intermediary data representation compatible with pandas and xarray data structures.
- Integration with scikit-learn: Interoperates with modeling packages such as scikit-learn for downstream analyses.
- Canonical Structure for Learned Representations: Provides a canonical format for book-keeping annotations, learned representations, and task-associated data.
Scientific Applications:
- Single-cell RNA sequencing analysis: Storage and annotation of single-cell RNA-seq count matrices and metadata.
- Genomics and single-cell biology: Management of large-scale, high-dimensional biological datasets and their metadata.
- Representation learning and model training: Support for training models and generating low-dimensional representations, including deep learning workflows with PyTorch.
- Exploratory data analysis and iterative annotation: Support for iterative workflows that annotate observations and variables using low-dimensional representations.
Methodology:
Provides a canonical data structure for book-keeping annotations, learned representations, and task-associated data, with explicit support for sparse data handling, lazy operations, and interoperability with modeling packages such as scikit-learn and PyTorch.
Details
- License: BSD-3-Clause
- Tool Type: Python library
- Programming Languages: Python
- Added: 2/23/2024
- Last Updated: 11/24/2024
Publications
Virshup I, Rybakov S, Theis FJ, Angerer P, Wolf FA. anndata: Annotated data. bioRxiv. 2021. doi:10.1101/2021.12.16.473007.