Adaptive Sparse PLS for Logistic Regression

PhD
ABS4NGS
statistics
conference
Journées du GDR Stat et Santé, Paris Descartes University, Paris (France)
Authors

Ghislain Durif

Franck Picard

Sophie Lambert-Lacroix

Published

June 12, 2015

Keywords: Statistics, Dimension reduction, Sparse PLS, Logistic regression, High-dimensional data, Classification, Gene expression, RNA-seq

Summary

For several years now, data analysis has been facing statistical issues related to the “curse of high dimensionality”. In this context, i.e. when the number of variables is far larger than the number of observations in the sample, standard classification methods are ill-suited, which calls for the development of new methodologies. I will present a new method suitable for classification in the high-dimensional setting. It combines Sparse Partial Least Squares (Sparse PLS), which performs both compression and variable selection, with Ridge-penalized logistic regression. In particular, we have developed an adaptive version of Sparse PLS to improve the dimension reduction process. I will illustrate the interest of our method with classification results on simulated and real data sets, compared with state-of-the-art approaches. The application focuses on genomics, where dimensions are huge, and especially on the prediction of breast cancer relapse (a binary response) from gene expression levels (quantitative predictors). Finally, our approach is implemented in the plsgenomics R package.
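To make the combination of Sparse PLS and Ridge-penalized logistic regression more concrete, here is a minimal Python/NumPy sketch under stated assumptions. It is not the plsgenomics code. It assumes the combination works in three steps: a Ridge-penalized logistic regression fitted by IRLS provides a continuous pseudo-response and observation weights, a weighted Sparse PLS on that pseudo-response builds a few components while selecting variables by soft thresholding, and a logistic regression is then fitted on those components. The function names (`ridge_logistic_irls`, `soft_threshold`, `weighted_sparse_pls`, `logit_spls`), the threshold level `eta`, the penalty `lam` and the reuse of the Ridge solver in the last step are hypothetical choices. The adaptive version of Sparse PLS mentioned above is not reproduced: a single common threshold is used for all variables.

```python
import numpy as np

# Hypothetical sketch: helper names are illustrative, and plain
# (non-adaptive) soft thresholding is assumed in the Sparse PLS step.


def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def ridge_logistic_irls(X, y, lam=1.0, n_iter=100, tol=1e-8):
    """Ridge-penalized logistic regression fitted by IRLS (Newton steps).

    Returns the coefficients (intercept first), the working response z
    and the IRLS weights w at the final iterate."""
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    pen = lam * np.eye(p + 1)
    pen[0, 0] = 0.0  # the intercept is not penalized
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = Xa @ beta
        mu = sigmoid(eta)
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        z = eta + (y - mu) / w  # working (pseudo-)response
        beta_new = np.linalg.solve(Xa.T @ (w[:, None] * Xa) + pen, Xa.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    eta = Xa @ beta
    mu = sigmoid(eta)
    w = np.clip(mu * (1.0 - mu), 1e-10, None)
    return beta, eta + (y - mu) / w, w


def soft_threshold(c, eta):
    # Zero every entry with magnitude below eta * max|c| and shrink the
    # remaining entries towards 0 by that same amount.
    thr = eta * np.max(np.abs(c))
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)


def weighted_sparse_pls(X, z, w, n_comp=2, eta=0.5):
    """Weighted Sparse PLS of the pseudo-response z on X with a common
    soft threshold; returns the component scores and selected variables."""
    sw = w / w.sum()
    Xk = X - sw @ X  # weighted centering of the predictors
    zc = z - sw @ z  # weighted centering of the pseudo-response
    T = np.zeros((X.shape[0], n_comp))
    selected = set()
    for k in range(n_comp):
        u = soft_threshold(Xk.T @ (w * zc), eta)  # sparse PLS direction
        u /= np.linalg.norm(u)
        t = Xk @ u
        selected.update(np.flatnonzero(u).tolist())
        # Deflation: remove from Xk its weighted projection on t.
        Xk = Xk - np.outer(t, (Xk.T @ (w * t)) / (t @ (w * t)))
        T[:, k] = t
    return T, sorted(selected)


def logit_spls(X, y, n_comp=2, eta=0.5, lam=1.0):
    # 1) Ridge IRLS gives a continuous pseudo-response and weights.
    _, z, w = ridge_logistic_irls(X, y, lam=lam)
    # 2) Weighted Sparse PLS compresses X and selects variables.
    T, selected = weighted_sparse_pls(X, z, w, n_comp=n_comp, eta=eta)
    # 3) Logistic regression on the few PLS components.
    beta, _, _ = ridge_logistic_irls(T, y, lam=lam)
    prob = sigmoid(beta[0] + T @ beta[1:])
    return prob, selected


# Toy usage on simulated data with p > n (5 informative variables).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
y = (rng.random(n) < sigmoid(2.0 * X[:, :5].sum(axis=1))).astype(float)
prob, selected = logit_spls(X, y, n_comp=2, eta=0.5)
print("training accuracy:", np.mean((prob > 0.5) == y))
print("number of selected variables:", len(selected))
```

Values of `eta` closer to 1 give sparser directions. In practice `eta`, `lam` and the number of components would typically be tuned, for instance by cross-validation, and accuracy should be assessed on held-out data rather than on the training sample as in this toy run.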