Regularization, what is it good for?
- Roi Livni (Tel Aviv University)
Regularization is considered a key-concept in the explanation and analysis of successful learning algorithms. In contrast, modern machine learning practice often suggests invoking highly expressive models that can completely interpolate the data with far more free parameters than examples. To resolve this alleged contradiction the notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. In this talk, we will revisit this paradigm in one of the most well-studied and well-understood models for theoretical machine learning: Stochastic Convex Optimization (SCO).
We begin by discussing new results that highlight the role of the optimization algorithm for learning. We give a new separation result that separates between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD), as well as regularized GD. We show that while all algorithms optimize the empirical loss at the same rate, their generalization performance can be significantly different. We next discuss the implicit bias of Stochastic Gradient Descent (SGD) in this context and ask if the implicit bias accounts for the success of SGD to generalize. We provide several constructions that point out to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by solely arguing about its implicit regularization properties.
On the one hand, these results demonstrate the importance of the optimization algorithm in generalization. On the other hand, they also hint that the reason or cause for the different performances may not necessarily be explained or understood via investigations of the algorithm's bias.
Based on joint works with: Idan Amir, Assaf Dauber, Meir Feder, Tomer Koren.