Adaptive Sparse PLS for Logistic Regression: Dimension reduction, variable selection and classification

PhD

ABS4NGS

statistics

seminar

Statistics seminar, Laboratoire MAP5, Paris-Descartes University, Paris (France)

Authors

Ghislain Durif

Franck Picard

Sophie Lambert-Lacroix

Published

February 6, 2015

Keywords: “Statistics”, “Dimension reduction”, “Sparse PLS”, “Logistic regression”, “High-dimensional data”, “Classification”

Summary

For several years now, data analysis has faced statistical issues related to the curse of dimensionality. In this context, that is, when the number of variables is far larger than the number of observations in the sample, standard classification methods are inappropriate, which calls for the development of new methodologies. I will present a new method suitable for classification in the high-dimensional setting. It combines Sparse Partial Least Squares (Sparse PLS), which performs both compression and variable selection, with Ridge-penalized logistic regression. In particular, we developed an adaptive version of Sparse PLS to improve the dimension reduction process. I will illustrate the interest of our method with classification results on simulated and real data sets, in comparison with state-of-the-art approaches. The application focuses on genomics, where dimensions are huge, and especially on the prediction of breast cancer relapse (a binary outcome) from gene expression levels (quantitative variables).
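To make the two-stage idea concrete, here is a minimal Python/NumPy sketch of a generic pipeline of this kind. It is not the authors' adaptive method: it assumes a simplified, non-adaptive sparse PLS that soft-thresholds the covariance vector X^T y at a fixed fraction `eta` of its largest entry, followed by Ridge-penalized logistic regression on the component scores, fitted by Newton-Raphson. The function names, `eta`, `lam` and the simulated data are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_pls_components(X, y, n_comp=2, eta=0.5):
    """Simplified sparse PLS for a univariate response (hypothetical helper).

    For each component, the weight vector is X^T y soft-thresholded at a
    fraction `eta` of its largest absolute entry, so only variables that
    covary strongly with y keep a nonzero weight; X is then deflated.
    Returns the scores T (n x n_comp) and the weights W (p x n_comp).
    """
    Xk = X - X.mean(axis=0)          # center the predictors
    yc = y - y.mean()                # center the 0/1 response
    n, p = X.shape
    T = np.zeros((n, n_comp))
    W = np.zeros((p, n_comp))
    for k in range(n_comp):
        c = Xk.T @ yc                                # covariance with y
        w = soft_threshold(c, eta * np.max(np.abs(c)))  # sparsify (eta < 1)
        w /= np.linalg.norm(w)
        t = Xk @ w                                   # component scores
        loading = Xk.T @ t / (t @ t)
        Xk = Xk - np.outer(t, loading)               # NIPALS deflation
        T[:, k] = t
        W[:, k] = w
    return T, W


def ridge_logistic(T, y, lam=1.0, n_iter=50, tol=1e-8):
    """Ridge-penalized logistic regression via Newton-Raphson (IRLS).

    Maximizes loglik(beta) - lam/2 * ||beta[1:]||^2; the intercept
    beta[0] is not penalized.
    """
    n, q = T.shape
    Z = np.column_stack([np.ones(n), T])
    beta = np.zeros(q + 1)
    P = lam * np.eye(q + 1)
    P[0, 0] = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (y - prob) - P @ beta
        H = Z.T @ (Z * (prob * (1.0 - prob))[:, None]) + P
        step = np.linalg.solve(H, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Usage on simulated data with p >> n (10 informative variables).
rng = np.random.default_rng(0)
n, p = 60, 500
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.0
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

T, W = sparse_pls_components(X, y, n_comp=2, eta=0.5)
beta = ridge_logistic(T, y, lam=1.0)
pred = (1.0 / (1.0 + np.exp(-(beta[0] + T @ beta[1:]))) > 0.5).astype(float)
print("variables selected in component 1:", np.flatnonzero(W[:, 0]).size)
print("training accuracy:", (pred == y).mean())
```

The scores are computed on the training data only; classifying new samples would additionally require projecting them onto the PLS directions (accounting for deflation), which this sketch omits.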