coseq

coseq performs clustering of compositional high-throughput RNA sequencing feature profiles to identify groups of co-expressed genes and other compositional patterns.


Key Features:

  • Clustering of Expression Profiles: Clusters feature profiles derived from high-throughput RNA sequencing, accommodating features with zero or near-zero counts across samples.
  • Adapted Data Transformations: Implements Centered Log Ratio (CLR) and Log Centered Log Ratio (LCL) transformations to address the compositional nature of the data.
  • Clustering Algorithms: Supports mixture models and K-means for grouping transformed feature profiles.
  • Model Selection Criteria: Employs a non-asymptotic penalized criterion with the penalty calibrated by slope heuristics to determine the optimal number of clusters.

Scientific Applications:

  • Gene Expression Analysis: Identifies groups of co-expressed genes from RNA sequencing data, including cases where genes are silent under specific experimental conditions.
  • Bicycle Sharing System Patterns: Detects usage patterns in the Velib' bicycle sharing system in Paris as an example of compositional data analysis beyond genomics.

Methodology:

Applies CLR and LCL transformations to compositional count data, clusters transformed profiles using mixture models or K-means, and selects the number of clusters via a non-asymptotic penalized criterion with penalty calibrated by slope heuristics.
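The first two steps of the methodology can be illustrated with a minimal sketch. This is not coseq's R interface: the function names (to_profile, clr, kmeans) and the toy counts are hypothetical, the pseudocount of 1 used to avoid taking the log of zero is an assumption, and the deterministic farthest-first initialization is a simplification chosen for reproducibility. For a profile p with J parts, the CLR transformation maps each part to ln(p_j) minus the mean of ln(p_1), ..., ln(p_J). The LCL transformation and the slope-heuristics model selection are not shown.

```python
import math


def to_profile(counts, pseudocount=1.0):
    """Turn one feature's counts across samples into proportions summing to 1.

    The pseudocount (an assumption of this sketch) keeps zero counts strictly positive.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def clr(profile):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in profile]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy count matrix (rows: features, columns: samples); values are made up.
counts = [
    [100, 120, 0, 2],
    [50, 60, 1, 0],
    [200, 210, 3, 1],
    [0, 1, 90, 110],
    [2, 0, 45, 50],
    [1, 3, 300, 280],
]

transformed = [clr(to_profile(row)) for row in counts]
labels = kmeans(transformed, k=2)
print(labels)
```

In this toy example the first three features are expressed mainly in the first two samples and the last three mainly in the last two, so the two groups separate cleanly after the CLR transformation even though their absolute count levels differ.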

Details

License:
GPL-3.0
Cost:
Free of charge
Tool Type:
library
Operating Systems:
Linux, Windows, Mac
Programming Languages:
R
Added:
7/17/2018
Last Updated:
12/10/2018

Publications

Godichon-Baggioni A, Maugis-Rabusseau C, Rau A. Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data. Journal of Applied Statistics. 2018;46(1):47-65. doi:10.1080/02664763.2018.1454894.

Funding: Agence Nationale de la Recherche (ANR-13-JS01-0001-01)
