Summary of the talk on 25.02.2021 (17:00)

Math Machine Learning seminar MPI MIS + UCLA

Zhiyuan Li (Princeton University)
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
See also the video of this talk. (MPI MiS internal only)

In this talk, I will first highlight how the behavior of normalized nets trained by SGD departs from traditional optimization viewpoints in several ways (e.g., the use of exponentially increasing learning rates). I will then present a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely, modeling the SGD-induced training trajectory by a stochastic differential equation (SDE) driven by Brownian motion. This yields: (a) A new 'intrinsic learning rate' parameter, the product of the normal learning rate and the weight decay factor. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR. (b) A challenge, via theory and experiments, to the popular belief that good generalization requires large learning rates at the start of training. (c) New experiments, backed by mathematical intuition, suggesting that the number of steps to equilibrium (in function space) scales as the inverse of the intrinsic learning rate, as opposed to the exponential-time convergence bound implied by SDE analysis. We name this the Fast Equilibrium Conjecture. Finally, I will discuss the validity of such conventional SDE approximations of SGD.
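To make the 'intrinsic learning rate' concrete, here is a minimal sketch (not code from the papers) of SGD with weight decay on a scale-invariant toy loss. The loss `f(w/||w||)`, the noise scale, and the target vector are illustrative assumptions; the point is only that the update `w ← w − lr·(∇L + wd·w)` makes the product `lr * wd` the natural 'intrinsic' parameter, while the weight norm equilibrates rather than converging to zero.

```python
import numpy as np

# Toy scale-invariant loss: L(w) = f(w / ||w||), as for a layer followed
# by normalization. Its gradient is orthogonal to w and scales as 1/||w||.
def grad_scale_invariant(w, target):
    n = np.linalg.norm(w)
    u = w / n
    g_u = u - target                      # gradient of f(u) = 0.5*||u - target||^2
    return (g_u - u * (u @ g_u)) / n      # chain rule through the normalization

rng = np.random.default_rng(0)
lr, wd = 0.1, 0.01
intrinsic_lr = lr * wd                    # the quantity the talk calls the intrinsic LR
target = np.array([1.0, 0.0])
w = rng.normal(size=2)

for step in range(5000):
    noise = 0.1 * rng.normal(size=2)      # crude stand-in for minibatch gradient noise
    g = grad_scale_invariant(w, target) + noise
    w = w - lr * (g + wd * w)             # SGD with weight decay

print(intrinsic_lr)                       # product of lr and weight decay
print(np.linalg.norm(w))                  # the norm settles at a noise-dependent scale
```

Because the scale-invariant part of the gradient cannot shrink `||w||` on its own, weight decay and gradient noise balance each other, and it is the product `lr * wd` (not either factor alone) that governs the equilibrium dynamics.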

The talk will be based on the following papers:

Zhiyuan Li, Sanjeev Arora, “An Exponential Learning Rate Schedule for Deep Learning”, ICLR 2020

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora, “Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate”, NeurIPS 2020

Zhiyuan Li, Sadhika Malladi, Sanjeev Arora, “On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)”

