Abstract for the talk on 02.07.2020 (17:00 h)
Math Machine Learning seminar MPI MIS + UCLA
Wenda Zhou (Columbia University)
New perspectives on cross-validation
See the video of this talk.
Cross-validation is the most widely used method for risk estimation in machine learning and statistics. However, analyzing it and comparing it to the data-splitting estimator has proved difficult. In the first part of the talk, I will present a new analysis that characterizes the exact asymptotics of cross-validation in the form of a central limit theorem for estimators that satisfy certain stability conditions. In particular, parametric estimators automatically satisfy these conditions, and the theorems fully characterize the cross-validated risk for such estimators. I will demonstrate that they exhibit a wide variety of behaviours: in the case of a parametric empirical risk minimizer, the folds behave as if they were independent when the evaluation loss is the same as the training loss. However, if a surrogate loss is used, different behaviours may occur.

In the second part, I will discuss issues that arise when using cross-validation for high-dimensional estimators. In the regime where the number of parameters is comparable to the number of observations, cross-validation (and data splitting) may introduce serious bias in the estimate of the risk when the amount of data left out is high (i.e. the number of folds is low). A natural way to alleviate this problem is to leave out as little data as possible: a single observation, which leads to leave-one-out cross-validation (LOOCV). I will show that this intuition is indeed correct: the LOOCV estimator is consistent in the high-dimensional asymptotic regime. Unfortunately, computing the LOOCV estimator exactly is computationally prohibitive, so it cannot be used in practice. Finally, I will discuss a general framework, approximate LOOCV, from which closed-form approximate estimators can be derived for penalized GLMs, including non-smooth ones such as the LASSO or SVMs.
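To illustrate why closed-form leave-one-out estimators are attractive, the following is a minimal sketch for ridge regression without an intercept, a classical special case in which the leave-one-out residuals can be recovered exactly from a single fit. It is not the approximate LOOCV framework discussed in the talk; the function names `loo_brute_force` and `loo_shortcut`, the penalty `lam`, and the simulated data are assumptions made purely for illustration.

```python
import numpy as np


def loo_brute_force(X, y, lam):
    """LOOCV squared-error risk for ridge, refitting n times (illustrative)."""
    n, p = X.shape
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        X_i, y_i = X[keep], y[keep]
        # Ridge solution on the remaining n - 1 observations.
        beta = np.linalg.solve(X_i.T @ X_i + lam * np.eye(p), X_i.T @ y_i)
        residuals[i] = y[i] - X[i] @ beta  # error on the held-out point
    return np.mean(residuals ** 2)


def loo_shortcut(X, y, lam):
    """Same quantity from a single fit, via the hat-matrix identity."""
    n, p = X.shape
    # Hat matrix H = X (X^T X + lam I)^{-1} X^T of the full-data ridge fit.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    full_residuals = y - H @ y
    # Leave-one-out residual_i = full residual_i / (1 - H_ii).
    return np.mean((full_residuals / (1.0 - np.diag(H))) ** 2)


if __name__ == "__main__":
    # Simulated data with p comparable to n (an assumed toy setting).
    rng = np.random.default_rng(0)
    n, p = 60, 40
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) + rng.standard_normal(n)
    print("brute force:", loo_brute_force(X, y, lam=1.0))
    print("shortcut:   ", loo_shortcut(X, y, lam=1.0))
```

The two functions should agree up to floating-point error, but the brute-force version solves n linear systems while the shortcut solves one. For non-quadratic losses or non-smooth penalties no such exact identity is available, which is the setting where approximate formulas become relevant.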
Asymptotics of cross-validation, M. Austern and WZ, preprint, arxiv.org/abs/2001.11111
Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions, K. R. Rad, WZ, A. Maleki, AISTATS 2020, arxiv.org/abs/2003.01770
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions, S. Wang, WZ, H. Lu, V. Mirrokni, A. Maleki, ICML 2018, arxiv.org/abs/1810.02716