Talk

25.02.21, 17:00

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Zhiyuan Li (Princeton University)

Live Stream

Abstract

In this talk, I will first highlight several ways in which the behavior of normalized nets trained by SGD departs from traditional optimization viewpoints (e.g., the use of exponentially increasing learning rates). Then I will present a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely, modeling the SGD-induced training trajectory with a stochastic differential equation (SDE) driven by Brownian motion. This yields: (a) A new ‘intrinsic learning rate’ parameter, the product of the usual learning rate and the weight decay factor. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR. (b) A challenge, via theory and experiments, to the popular belief that good generalization requires large learning rates at the start of training. (c) New experiments, backed by mathematical intuition, suggesting that the number of steps to equilibrium (in function space) scales as the inverse of the intrinsic learning rate, as opposed to the exponential-time convergence bound implied by the SDE analysis. We call this the Fast Equilibrium Conjecture. Finally, I will discuss the validity of such conventional SDE approximations of SGD.
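
A rough way to see how the learning rate η and the weight decay λ combine in point (a) is to track the weight norm of a scale-invariant (normalized) network. The sketch below is a simplified deterministic model, not the SDE analysis from the talk, and it rests on two stated assumptions: the loss satisfies L(cw) = L(w) for every c > 0, so the gradient is orthogonal to w and has norm G/‖w‖; and G, the gradient norm at unit weight norm, is a hypothetical constant. One step w ← (1 − ηλ)w − η∇L(w) then gives ‖w‖² ← (1 − ηλ)²‖w‖² + η²G²/‖w‖², and the direction w/‖w‖ moves with an effective step size η/‖w‖².

```python
import math

def norm_sq_step(x, eta, wd, G):
    """One step of the squared-norm recursion for gradient descent with weight
    decay on a scale-invariant loss (assumed: gradient orthogonal to w with
    norm G / ||w||). Here x = ||w||^2."""
    return (1 - eta * wd) ** 2 * x + (eta * G) ** 2 / x

def equilibrium_norm_sq(eta, wd, G):
    """Fixed point x* of the recursion: x*^2 * (1 - (1 - eta*wd)^2) = (eta*G)^2."""
    return eta * G / math.sqrt(1 - (1 - eta * wd) ** 2)

def effective_lr(eta, x):
    """Step size seen by the direction w / ||w|| on the unit sphere: eta / ||w||^2."""
    return eta / x

G = 1.0  # hypothetical constant gradient norm at unit weight norm
for eta, wd in [(0.1, 0.01), (0.01, 0.1)]:  # two settings with the same product eta*wd
    x = 1.0  # hypothetical initial squared norm ||w_0||^2
    for _ in range(20_000):  # about 20 / (eta*wd) steps, enough to settle here
        x = norm_sq_step(x, eta, wd, G)
    print(f"eta={eta}, wd={wd}: ||w||^2={x:.4f} "
          f"(fixed point {equilibrium_norm_sq(eta, wd, G):.4f}), "
          f"effective LR={effective_lr(eta, x):.5f}, "
          f"sqrt(2*eta*wd)/G={math.sqrt(2 * eta * wd) / G:.5f}")
```

Both runs use the same product ηλ = 10⁻³ and settle at the same effective step size, √(1 − (1 − ηλ)²)/G ≈ √(2ηλ)/G, even though their equilibrium squared norms differ by a factor of ten. In this toy model the equilibrium effective step depends on η and λ only through their product, which is consistent with the role the abstract assigns to the intrinsic learning rate.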

The talk will be based on the following papers:

Zhiyuan Li, Sanjeev Arora, “An Exponential Learning Rate Schedule for Deep Learning”, ICLR 2020

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora, “Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate”, NeurIPS 2020

Zhiyuan Li, Sadhika Malladi, Sanjeev Arora, “On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)”
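
The exponentially increasing learning rates mentioned in the abstract and in the first paper can be motivated by a short calculation for plain gradient descent without momentum (the same argument applies step by step to SGD on a fixed sequence of minibatches). If L(cw) = L(w) for every c > 0, then ∇L(cw) = ∇L(w)/c. Starting from the weight-decay update w_{t+1} = (1 − ηλ)w_t − η∇L(w_t) and rescaling w̃_t = w_t/(1 − ηλ)^t gives

w̃_{t+1} = w̃_t − η(1 − ηλ)^(−2t−1) ∇L(w̃_t),

so training without weight decay under the schedule η_t = η(1 − ηλ)^(−2t−1) yields iterates that differ from the original ones only by a positive factor, and hence the same network function. The sketch below checks this numerically on a hypothetical toy loss L(w) = −⟨u, w/‖w‖⟩ in two dimensions; the target direction u, the initial weights, and the hyperparameters are all illustrative choices.

```python
import math

def normalize(w):
    n = math.hypot(*w)
    return [x / n for x in w]

def grad_scale_invariant(w, u):
    """Gradient of the hypothetical toy loss L(w) = -<u, w/||w||>.

    L(c*w) = L(w) for c > 0, so the gradient is orthogonal to w and scales
    as 1/||w||: grad L(w) = -(u - <u, w_hat> w_hat) / ||w||.
    """
    n = math.hypot(*w)
    w_hat = [x / n for x in w]
    dot = sum(a * b for a, b in zip(u, w_hat))
    return [-(ui - dot * wi) / n for ui, wi in zip(u, w_hat)]

def gd_with_wd(w, u, eta, wd, steps):
    # Constant learning rate eta with weight decay wd.
    for _ in range(steps):
        g = grad_scale_invariant(w, u)
        w = [(1 - eta * wd) * wi - eta * gi for wi, gi in zip(w, g)]
    return w

def gd_exp_lr(w, u, eta, wd, steps):
    # No weight decay; learning rate grows as eta * (1 - eta*wd)^(-2t-1).
    for t in range(steps):
        lr = eta * (1 - eta * wd) ** (-2 * t - 1)
        g = grad_scale_invariant(w, u)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

u = normalize([1.0, 2.0])  # hypothetical target direction
w0 = [3.0, -1.0]           # hypothetical initial weights
eta, wd, steps = 0.1, 0.05, 200

w_wd = gd_with_wd(w0, u, eta, wd, steps)
w_exp = gd_exp_lr(w0, u, eta, wd, steps)

# Same direction, hence the same function for a scale-invariant model.
print(normalize(w_wd), normalize(w_exp))
# Norms agree after undoing the rescaling factor (1 - eta*wd)^steps.
print(math.hypot(*w_exp) * (1 - eta * wd) ** steps, math.hypot(*w_wd))
```

Only the direction w/‖w‖ matters for a scale-invariant model, so the two printed directions should agree up to floating-point rounding, and the norms should match once the exponential-schedule run is scaled back by (1 − ηλ)^T.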

seminar

01.08.24 – 22.08.24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Katharina Matschke

MPI for Mathematics in the Sciences

Upcoming Events of this Seminar

Thursday, 01.08.24 to be announced with Morgane Austern
Thursday, 08.08.24 to be announced with Pan Lu
Thursday, 15.08.24 Upper and lower memory capacity bounds of transformers for next-token prediction with Liam Madden
Thursday, 22.08.24 to be announced with Guohao Shen