Alevin

Alevin is an end-to-end pipeline for droplet-based, 3' tagged-end single-cell RNA sequencing (scRNA-seq) data, such as 10x Genomics Chromium datasets. It performs cell barcode detection, read mapping, UMI deduplication with transcript-level constraints, and gene count estimation.


Key Features:

  • Cell Barcode Detection: Identifies and validates cell barcodes to assign reads to individual cells.
  • Cell Barcode Whitelisting: Retains only valid cell barcodes to reduce noise from potential artifacts.
  • Read Mapping: Maps sequencing reads to a reference transcriptome to enable downstream quantification.
  • UMI Deduplication with Transcript-Level Constraints: Deduplicates UMIs under transcript-level constraints, accounting for both gene-unique reads and reads that multimap between genes.
  • Gene Count Estimation: Estimates gene-level counts from the deduplicated UMIs, which improves gene abundance estimates.
  • Support for 3' Tagged-End scRNA-seq: Tailored to 3' tagged-end datasets from droplet-based platforms such as 10x Genomics Chromium.
  • Performance Efficiency: Reported to run approximately eight times faster than existing gene quantification methods while using less memory.
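
The barcode detection and whitelisting features above can be illustrated with a minimal Python sketch. It is not Alevin's implementation: the functions `whitelist_barcodes` and `assign_reads` and the `min_fraction` threshold are hypothetical, and the rule used here (keep barcodes whose read count reaches a fixed fraction of the most frequent barcode) is a deliberately simple stand-in for a real whitelisting procedure.

```python
from collections import Counter


def whitelist_barcodes(barcodes, min_fraction=0.1):
    """Return the set of barcodes considered valid cells.

    Assumption for illustration only: a barcode is kept when its read
    count is at least `min_fraction` of the most frequent barcode's count.
    """
    counts = Counter(barcodes)
    if not counts:
        return set()
    top = max(counts.values())
    return {bc for bc, n in counts.items() if n >= min_fraction * top}


def assign_reads(barcodes, whitelist):
    """Group read indices by cell barcode, dropping non-whitelisted barcodes."""
    cells = {}
    for read_index, bc in enumerate(barcodes):
        if bc in whitelist:
            cells.setdefault(bc, []).append(read_index)
    return cells
```

In this sketch, low-frequency barcodes (for example, from ambient RNA or sequencing errors) fall below the threshold, so their reads are not assigned to any cell.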

Scientific Applications:

  • Single-Cell Transcriptomics: Provides accurate and efficient quantification of gene expression at single-cell resolution to study cellular heterogeneity and function.
  • Developmental Biology: Enables tracking of gene expression changes during development or differentiation processes.
  • Cancer Research: Facilitates identification of rare cell populations and analysis of tumor microenvironments from scRNA-seq data.

Methodology:

Alevin detects and whitelists cell barcodes, maps reads to a reference transcriptome, and deduplicates UMIs under transcript-level constraints, accounting for both gene-unique reads and reads that multimap between genes.
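
To show why multimapping reads matter for deduplication, here is a minimal Python sketch. It is a naive exact-match deduplication, not Alevin's method: it applies no transcript-level constraints, and the function names and the `(cell_barcode, umi, genes)` input format are assumptions made for this example. Reads are grouped by cell and by the set of genes they map to, and each distinct UMI is counted once per group.

```python
from collections import defaultdict


def dedup_umis(reads):
    """Count distinct UMIs per cell and per gene set.

    `reads` is an iterable of (cell_barcode, umi, genes) tuples, where
    `genes` lists every gene the read is compatible with (hypothetical
    input format). Returns {cell: {frozenset(genes): n_unique_umis}}.
    """
    seen = defaultdict(set)
    for cell, umi, genes in reads:
        seen[(cell, frozenset(genes))].add(umi)
    counts = defaultdict(dict)
    for (cell, gene_set), umis in seen.items():
        counts[cell][gene_set] = len(umis)
    return counts


def gene_unique_counts(counts):
    """Keep only single-gene groups, as a tool that discards
    gene-ambiguous reads would report."""
    return {
        cell: {next(iter(gs)): n for gs, n in classes.items() if len(gs) == 1}
        for cell, classes in counts.items()
    }
```

Comparing the two outputs shows which molecules are lost when gene-ambiguous reads are discarded. Alevin, by contrast, is described as taking those reads into account.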

Details

License: GPL-3.0
Maturity: Mature
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Linux, Mac
Programming Languages: C++
Added: 6/20/2019
Last Updated: 6/16/2020

Publications

Srivastava A, Malik L, Smith T, Sudbery I, Patro R. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data. Genome Biology. 2019;20(1). doi:10.1186/s13059-019-1670-y. PMID:30917859. PMCID:PMC6437997.

Funding:

  • National Science Foundation: BIO-1564917, CCF-1750472, CNS-1763680
  • National Institutes of Health: R01HG009937
  • Silicon Valley Community Foundation: 2018-182752
