RNA-seq

Probabilistic Count Matrix Factorization for Single Cell Expression Data Analysis

The development of high-throughput single-cell technologies now allows the investigation of the genome-wide diversity of transcription at different scales. First, the gene-to-gene variability (expression dynamics) can be quantified more accurately, thanks to the measurement of lowly expressed genes. Second, the cell-to-cell variability is high, with a low proportion of cells expressing the same gene at the same time or level. These emerging patterns are statistically challenging, especially when it comes to representing single-cell expression data, such as single-cell RNA-seq data, and providing a summarized view of them.

Sparsity and dimension reduction

In the era of large-scale (huge sample size) and/or high-dimensional (numerous variables or features) data, the question of data exploration and representation is central. A wide range of frameworks in statistics and machine learning is now available to solve supervised and unsupervised problems regardless of data dimension and complexity. In particular, we will discuss sparsity in the context of dimension reduction, focusing on variable or feature selection and latent space projection. The presentation will be illustrated by various sparse methods designed for data visualization, regression or classification of high-dimensional data, and by different examples of genomic data analysis.

Probabilistic Count Matrix Factorization for Single Cell Expression Data Analysis

Motivation: The development of high-throughput single-cell sequencing technologies now allows the investigation of the population diversity of cellular transcriptomes. The expression dynamics (gene-to-gene variability) can be quantified more …

pCMF

Probabilistic count matrix factorization for single cell transcriptomic data analyses (dimension reduction, visualization).

plsgenomics

Supervised methods for dimension reduction in classification and regression frameworks (in particular, PLS-based routines for genomic data analyses).

High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression

The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which can be combined to constitute a powerful framework for classification, as well as for data visualization and interpretation. However, currently proposed combinations lead to unstable and non-convergent methods due to inappropriate computational frameworks. We hereby propose a computationally stable and convergent approach for classification in high dimension, based on sparse Partial Least Squares (sparse PLS).

Probabilistic Count Matrix Factorization for Single Cell Expression Data Analysis

The development of high-throughput single-cell sequencing technologies now allows the investigation of the population-level diversity of cellular transcriptomes. This diversity has two faces. First, the expression dynamics (gene-to-gene variability) can be quantified more accurately, thanks to the measurement of lowly expressed genes. Second, the cell-to-cell variability is high, with a low proportion of cells expressing the same gene at the same time or level. These emerging patterns are statistically challenging, especially when it comes to representing single-cell expression data and providing a summarized view of them.
