
The Large Learning Rate Phase of Wide, Deep Neural Networks

  • Yasaman Bahri (Google Brain)

Abstract

Recent investigations into infinitely wide deep neural networks have revealed intriguing connections between deep networks, kernel methods, and Gaussian processes. Nonetheless, important dynamical regimes of finite-width neural networks lie far outside the realm of applicability of these results. I will discuss how the choice of learning rate in gradient descent naturally classifies the dynamics of deep networks into two classes, a “lazy” regime and a “catapult” regime, separated by a sharp phase transition as networks become wider. I will describe the distinct phenomenological signatures of the two phases, how they are elucidated in a class of simple, solvable models, and their implications for model performance.
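
The lazy/catapult distinction can be illustrated in a tiny model. Below is a minimal sketch (my own illustration, not code from the talk), assuming the solvable model resembles the two-layer linear network f = (v · u) x / √m of Lewkowycz et al. (2020), trained on a single example with squared loss. At small learning rates the loss decreases monotonically (lazy); above roughly 2/λ₀, where λ₀ is the tangent-kernel value at initialization, the loss first blows up before converging (catapult). The width, seed, and learning rates below are arbitrary illustrative choices.

```python
# Sketch of the lazy vs. catapult regimes in a two-layer linear model
# f = (v . u) x / sqrt(m), one training example, MSE loss.
# Width, seed, and learning rates are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m = 2048                        # width; large so the kernel concentrates
x, y = 1.0, 0.0                 # single example with zero target

def run(lr, steps=100):
    u = rng.standard_normal(m)            # first-layer weights
    v = rng.standard_normal(m)            # second-layer weights
    losses = []
    for _ in range(steps):
        f = (v @ u) * x / np.sqrt(m)      # network output
        losses.append(0.5 * (f - y) ** 2)
        g = f - y                         # dL/df
        # simultaneous gradient-descent update of both layers
        u, v = (u - lr * g * v * x / np.sqrt(m),
                v - lr * g * u * x / np.sqrt(m))
    return np.array(losses)

# At init lambda_0 = (|u|^2 + |v|^2)/m ~ 2, so the lazy/catapult
# boundary for this model sits near lr = 2/lambda_0 ~ 1.
for lr in (0.25, 1.5):
    L = run(lr)
    print(f"lr={lr}: initial={L[0]:.3f}  max={L.max():.3f}  final={L[-1]:.2e}")
```

With these choices the small learning rate gives a monotonically decreasing loss, while lr = 1.5 (so η λ₀ ≈ 3) overshoots before converging; in this model, learning rates beyond roughly 4/λ₀ diverge.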


Math Machine Learning seminar MPI MIS + UCLA

Venue: MPI for Mathematics in the Sciences (live stream)

Contact: Katharina Matschke, MPI for Mathematics in the Sciences
