Implicit biases of large learning rates in machine learning
- Molei Tao (Georgia Institute of Technology)
This talk will discuss some nontrivial but often pleasant effects of large learning rates, which are commonly used in machine learning practice for improved empirical performance but defy traditional theoretical analyses. I will first quantify how large learning rates can help gradient descent escape local minima in multiscale landscapes via chaotic dynamics, which provides an alternative to the commonly known escape mechanism due to noise from stochastic gradients. I will then report how large learning rates provably bias gradient descent toward flatter minimizers. Several related, perplexing phenomena have been empirically observed recently, including Edge of Stability, loss catapulting, and balancing. I will, for the first time, unify them and explain that they are all algorithmic implicit biases of large learning rates. These results are enabled by the first global convergence proof of gradient descent for certain nonconvex functions without Lipschitz gradient. The theory also clarifies when Edge of Stability and other large-learning-rate implicit biases will occur.
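As a minimal sketch of the flat-minimizer bias mentioned above (my own illustration, not code from the talk; the function `gd` and its parameters are assumptions for demonstration): on a quadratic with second derivative `curvature`, gradient descent contracts only when the learning rate is below `2 / curvature`, so a step size that destabilizes a sharp minimum can still converge at a flat one.

```python
# Illustration (assumed, not from the talk): fixed-step gradient descent on
# f(x) = 0.5 * curvature * x**2. Iterates satisfy x <- (1 - lr*curvature) * x,
# which contracts iff lr < 2 / curvature.

def gd(curvature, lr, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # gradient of 0.5*curvature*x^2 is curvature*x
    return x

lr = 0.25                           # one fixed "large" learning rate
sharp = gd(curvature=10.0, lr=lr)   # lr > 2/10: iterates grow, sharp minimum is escaped
flat = gd(curvature=1.0, lr=lr)     # lr < 2/1: iterates contract toward the flat minimum

print(abs(sharp) > 1.0, abs(flat) < 1e-3)  # → True True
```

With the same step size, the sharp basin repels the iterates while the flat basin retains them, which is one way to see why large learning rates select flatter minimizers.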