Adaptive Sparse PLS for Logistic Regression: Dimension reduction, variable selection and classification
Keywords: “Statistics”, “Dimension reduction”, “Sparse PLS”, “Logistic regression”, “High-dimensional data”, “Classification”
Summary
In recent years, data analysis has struggled with statistical issues related to the curse of dimensionality. In this context, where the number of variables is far larger than the number of observations in the sample, standard classification methods are inappropriate, which calls for the development of new methodologies. I will present a new method suited to classification in the high-dimensional case. It combines Sparse Partial Least Squares (Sparse PLS), which performs both compression and variable selection, with Ridge-penalized logistic regression. In particular, we developed an adaptive version of Sparse PLS to improve the dimension reduction process. I will illustrate the benefits of our method with classification results on simulated and real data sets, compared with state-of-the-art approaches. The application focuses on genomics, where dimensions are huge, and especially on the prediction of breast cancer relapse (a binary response) from gene expression levels (quantitative predictors).
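To give a rough idea of the two ingredients named above, here is a minimal sketch in plain Python. It is not the method presented in the talk: it extracts a single, non-adaptive sparse PLS component by soft-thresholding the covariance between each variable and the response, then fits a Ridge-penalized logistic regression on the resulting score by gradient descent. Running the two stages one after the other is an assumption of this sketch, since the summary does not say how they are coupled. The function names, the parameters `eta` and `ridge`, and the synthetic data (40 observations, 200 variables, 5 of them informative) are all hypothetical and chosen for illustration only.

```python
import math
import random


def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; entries with |v| <= lam become zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def sparse_pls_weights(X, y, eta):
    """First sparse PLS weight vector (illustrative, non-adaptive).

    The weights are the soft-thresholded covariances between the centered
    columns of X and the centered response, rescaled to unit norm.
    eta in [0, 1) sets the threshold as a fraction of the largest covariance.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cov = [sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
           for j in range(p)]
    lam = eta * max(abs(c) for c in cov)
    w = soft_threshold(cov, lam)
    norm = math.sqrt(sum(v * v for v in w))
    if norm > 0:
        w = [v / norm for v in w]
    return w, col_means


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ridge_logistic_1d(scores, y, ridge, n_iter=500):
    """Logistic regression on one score, with a Ridge penalty on the slope only.

    Plain gradient descent; the step size is the inverse of an upper bound
    on the largest eigenvalue of the Hessian, so the objective never increases.
    """
    b0, b1 = 0.0, 0.0
    step = 1.0 / (0.25 * sum(1.0 + t * t for t in scores) + ridge)
    for _ in range(n_iter):
        resid = [sigmoid(b0 + b1 * t) - yi for t, yi in zip(scores, y)]
        g0 = sum(resid)
        g1 = sum(r * t for r, t in zip(resid, scores)) + ridge * b1
        b0 -= step * g0
        b1 -= step * g1
    return b0, b1


# Synthetic data (assumption): only the first 5 of 200 variables carry signal.
random.seed(0)
n, p, n_informative = 40, 200, 5
y = [i % 2 for i in range(n)]
X = [[random.gauss(0.0, 1.0) + (2.0 * (2 * yi - 1) if j < n_informative else 0.0)
      for j in range(p)]
     for yi in y]

# Stage 1: compression and variable selection.
w, col_means = sparse_pls_weights(X, y, eta=0.5)
selected = [j for j, v in enumerate(w) if v != 0.0]
scores = [sum(w[j] * (row[j] - col_means[j]) for j in selected) for row in X]

# Stage 2: Ridge-penalized logistic regression on the compressed score.
b0, b1 = ridge_logistic_1d(scores, y, ridge=1.0)
pred = [1 if sigmoid(b0 + b1 * t) > 0.5 else 0 for t in scores]
accuracy = sum(int(a == b) for a, b in zip(pred, y)) / n
print("selected variables:", selected)
print("training accuracy:", accuracy)
```

In this sketch, raising `eta` toward 1 keeps fewer variables, while `ridge` controls how strongly the slope on the compressed score is shrunk. The accuracy printed is a training accuracy on toy data and says nothing about the performance of the method discussed in the talk.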