Talk

01.04.21, 17:00

Regularization, what is it good for?

Roi Livni (Tel Aviv University)

Live Stream

Abstract

Regularization is considered a key concept in the explanation and analysis of successful learning algorithms. In contrast, modern machine learning practice often suggests invoking highly expressive models that can completely interpolate the data, with far more free parameters than examples. To resolve this apparent contradiction, the notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. In this talk, we will revisit this paradigm in one of the most well-studied and well-understood models for theoretical machine learning: Stochastic Convex Optimization (SCO).

We begin by discussing new results that highlight the role of the optimization algorithm in learning. We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and that of full-batch gradient descent (GD), as well as regularized GD. We show that while all these algorithms optimize the empirical loss at the same rate, their generalization performance can be significantly different. We next discuss the implicit bias of SGD in this context and ask whether it accounts for the success of SGD in generalizing. We provide several constructions that point to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by arguing solely about its implicit regularization properties.

On the one hand, these results demonstrate the importance of the optimization algorithm in generalization. On the other hand, they hint that the cause of the difference in performance may not be explained by investigating the algorithm's bias alone.

Based on joint work with Idan Amir, Assaf Dauber, Meir Feder, and Tomer Koren.

Links

seminar

02.04.20 - 16.04.26

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream
