
Talk

13.10.22, 17:00

Implicit bias of optimization algorithms for neural networks: static and dynamic perspectives

Chao Ma (Stanford University)

Live Stream

Abstract

Modern neural networks are usually over-parameterized: the number of parameters exceeds the number of training samples. In this case the loss function tends to have many (or even infinitely many) global minima, so that, in addition to convergence, optimization algorithms face the challenge of minima selection. Specifically, when training a neural network, the algorithm not only has to find a global minimum, but also needs to select minima that generalize well from among many that do not. In this talk, I will share a series of works studying the mechanisms that facilitate global minima selection by optimization algorithms. First, using linear stability theory, we show that stochastic gradient descent (SGD) favors flat and uniform global minima. Then, we build a theoretical connection between flatness and generalization performance based on a special structure of neural networks. Next, we study the global minima selection dynamics, i.e., the process by which an optimizer leaves bad minima for good ones, in two settings. For a manifold of minima around which the loss function grows quadratically, we use a quasistatic approach to derive effective exploration dynamics on the manifold for SGD and Adam. For a manifold of minima around which the loss function grows subquadratically, we study the behavior and effective dynamics of GD, which also explains the edge of stability phenomenon.
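The linear stability idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is our own illustration under simplifying assumptions, not the speaker's analysis. It assumes a one-dimensional over-parameterized model in which each training sample i contributes a quadratic loss a_i x^2 / 2 with a shared global minimum at x = 0, and minibatches of size B drawn uniformly with replacement. The function names `gd_multiplier`, `sgd_ms_multiplier` and `run_sgd` are hypothetical. For full-batch GD with learning rate η, the minimum is linearly stable iff |1 - η a| <= 1, i.e. η a <= 2, where a is the mean of the a_i (the sharpness). For SGD the mean-square growth factor per step is (1 - η a)^2 + η^2 s^2 / B, where s^2 is the variance of the a_i (the non-uniformity), so a minimum must be both flat and uniform for SGD to stay there.

```python
import numpy as np

# Toy model (assumption): per-sample loss f_i(x) = a_i * x**2 / 2, minimum at x = 0.

def gd_multiplier(a, lr):
    """Per-step factor |1 - lr*a| of full-batch GD on f(x) = a*x**2/2.
    The minimum is linearly stable iff this is <= 1, i.e. lr*a <= 2."""
    return abs(1.0 - lr * a)

def sgd_ms_multiplier(a_i, lr, batch_size):
    """Mean-square per-step factor E[(1 - lr*a_B)**2] of SGD, where a_B is the
    mean of a_i over a minibatch drawn uniformly with replacement."""
    a_i = np.asarray(a_i, dtype=float)
    a = a_i.mean()   # sharpness of the full loss
    s2 = a_i.var()   # non-uniformity: population variance of the a_i
    return (1.0 - lr * a) ** 2 + lr ** 2 * s2 / batch_size

def run_sgd(a_i, lr, batch_size, x0=1.0, steps=200, seed=0):
    """Simulate SGD on the toy model starting from x0 and return |x| at the end."""
    rng = np.random.default_rng(seed)
    a_i = np.asarray(a_i, dtype=float)
    x = x0
    for _ in range(steps):
        batch = rng.integers(0, len(a_i), size=batch_size)  # sample with replacement
        x = x - lr * a_i[batch].mean() * x                  # gradient of a_i*x**2/2 is a_i*x
    return abs(x)
```

For example, the minima with a_i = [1, 1] and a_i = [0, 2] have the same sharpness a = 1, so GD with η = 1.5 is stable at both (factor 0.5). For SGD with B = 1, however, the mean-square factor is 0.25 at the uniform minimum and 2.5 at the non-uniform one, so `run_sgd([0.0, 2.0], 1.5, 1)` never shrinks |x| while `run_sgd([1.0, 1.0], 1.5, 1)` converges to the minimum.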

Links

seminar

02.04.20 – 16.04.26

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream