coseq
coseq clusters feature profiles from high-throughput RNA sequencing, treated as compositional data, to identify groups of co-expressed genes. The same approach applies to other kinds of compositional data.
Key Features:
- Clustering of Expression Profiles: Clusters feature profiles derived from high-throughput RNA sequencing, accommodating features with zero or near-zero counts across samples.
- Adapted Data Transformations: Implements Centered Log Ratio (CLR) and Log Centered Log Ratio (LCL) transformations to address the compositional nature of the data.
- Clustering Algorithms: Supports mixture models and K-means for grouping transformed feature profiles.
- Model Selection Criteria: Employs a non-asymptotic penalized criterion with the penalty calibrated by slope heuristics to determine the optimal number of clusters.
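To make the transformation feature concrete, the following is a minimal Python sketch of how a count vector can be turned into a compositional profile and then CLR-transformed. It is not the coseq implementation (coseq is an R package); the function names `to_profiles` and `clr` are hypothetical, and the pseudocount used to keep zero counts away from log(0) is an assumption of this sketch.

```python
import math


def to_profiles(counts, pseudocount=1.0):
    """Convert one feature's counts across samples into a compositional profile.

    The pseudocount is an assumption of this sketch: it keeps every
    proportion strictly positive so that logarithms are defined even
    when a feature has zero counts in some samples.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def clr(profile):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in profile]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# A feature that is silent (zero count) in the first sample.
counts = [0, 12, 30, 6]
profile = to_profiles(counts)   # proportions that sum to 1
transformed = clr(profile)      # CLR values that sum to 0
print(profile)
print(transformed)
```

The CLR values of a profile always sum to zero, which is a quick sanity check when implementing the transformation by hand.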
Scientific Applications:
- Gene Expression Analysis: Identifies groups of co-expressed genes from RNA sequencing data, including cases where genes are silent under specific experimental conditions.
- Bicycle Sharing System Patterns: Detects usage patterns in the Vélib' bicycle sharing system in Paris as an example of compositional data analysis beyond genomics.
Methodology:
Applies CLR and LCL transformations to compositional count data, clusters transformed profiles using mixture models or K-means, and selects the number of clusters via a non-asymptotic penalized criterion with penalty calibrated by slope heuristics.
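The transform-then-cluster part of this methodology can be sketched in Python as follows. This is an illustration under stated assumptions, not coseq's code: the helper names are hypothetical, the toy profiles are made up for the example, the farthest-first initialization is a simple deterministic choice made for this sketch, and the number of clusters is fixed at 2 rather than selected by the slope-heuristics criterion, which is not shown here.

```python
import math


def clr(profile):
    """Centered log ratio of a strictly positive profile."""
    logs = [math.log(p) for p in profile]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and center update until the
    labels stop changing or max_iter is reached."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy profiles (made up): three features peak in sample 1, three in sample 3.
profiles = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.72, 0.18, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
    [0.08, 0.22, 0.70],
]
transformed = [clr(p) for p in profiles]
labels, centers = kmeans(transformed, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the clustering would be run for a range of cluster counts, and the count would then be chosen with the penalized criterion described above.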
Details
- License: GPL-3.0
- Cost: Free of charge
- Tool Type: library
- Operating Systems: Linux, Windows, Mac
- Programming Languages: R
- Added: 7/17/2018
- Last Updated: 12/10/2018
Publications
Godichon-Baggioni A, Maugis-Rabusseau C, Rau A. Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data. Journal of Applied Statistics. 2018;46(1):47-65. doi:10.1080/02664763.2018.1454894.
Funding: Agence Nationale de la Recherche (ANR-13-JS01-0001-01)