Abstract for the talk on 12.11.2020 (17:00 h)
Math Machine Learning seminar MPI MIS + UCLA
Yasaman Bahri (Google Brain)
The Large Learning Rate Phase of Wide, Deep Neural Networks
Recent investigations into infinitely wide deep neural networks have revealed intriguing connections between deep networks, kernel methods, and Gaussian processes. Nonetheless, finite-width neural networks have important dynamical regimes that lie far outside the range where these results apply. I will discuss how the learning rate used in gradient descent is a crucial factor that naturally divides the gradient descent dynamics of deep nets into two classes: a "lazy" regime and a "catapult" regime. The two phases are separated by a phase transition that becomes sharp as the networks become wider. I will describe the distinct phenomenological signatures of the two phases, how they can be understood in a class of simple solvable models, and their implications for model performance.
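To make the two regimes concrete, here is a minimal sketch under stated assumptions. It is not taken from the talk. It uses a toy model assumed here to be representative of the simple solvable models mentioned above: a two-layer linear network f = v·u / sqrt(m) with width m, a single input x = 1, target y = 0, squared loss L = f²/2, and full-batch gradient descent with learning rate eta. The quantity lambda = (|u|² + |v|²) / m is the tangent kernel (the curvature seen by the output). Working out one update step exactly gives f_{t+1} = f_t (1 - eta lambda_t + eta² f_t² / m) and lambda_{t+1} = lambda_t + eta f_t² (eta lambda_t - 4) / m. So for eta < 2/lambda_0 the output shrinks while lambda barely moves at large m (lazy). For 2/lambda_0 < eta < 4/lambda_0 the loss first grows, which drives lambda down until eta lambda < 2, after which training converges (catapult). The function name `train_two_layer_linear` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def train_two_layer_linear(eta_times_lam0, width=4096, steps=500, seed=0):
    """Toy model (an assumption, not the talk's code): f = v.u / sqrt(m) with
    input x = 1 and target y = 0, trained by full-batch gradient descent.

    The learning rate is given in units of 1/lambda_0, where
    lambda = (|u|^2 + |v|^2) / m is the tangent kernel.
    Returns (eta, losses, lambdas), recorded before each update.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(width)
    v = rng.standard_normal(width)
    sqrt_m = np.sqrt(width)

    lam0 = (u @ u + v @ v) / width       # tangent kernel at initialization
    eta = eta_times_lam0 / lam0          # actual learning rate

    losses, lams = [], []
    for _ in range(steps):
        f = (v @ u) / sqrt_m             # network output on x = 1
        losses.append(0.5 * f**2)        # squared loss against y = 0
        lams.append((u @ u + v @ v) / width)
        # dL/du = f * v / sqrt(m) and dL/dv = f * u / sqrt(m); the tuple
        # assignment uses the old u and v on the right-hand side, so both
        # layers are updated simultaneously as in plain gradient descent.
        u, v = u - eta * f * v / sqrt_m, v - eta * f * u / sqrt_m
    return eta, np.array(losses), np.array(lams)


if __name__ == "__main__":
    for c in (1.5, 3.0):  # eta * lambda_0 below 2 (lazy) and between 2 and 4 (catapult)
        eta, loss, lam = train_two_layer_linear(c)
        print(f"eta*lam0={c}: peak loss {loss.max():.3g}, final loss {loss[-1]:.3g}, "
              f"lambda {lam[0]:.3f} -> {lam[-1]:.3f} (2/eta = {2 / eta:.3f})")
```

In the lazy run, the loss should decrease from the start while lambda stays close to its initial value. In the catapult run, the loss should spike by orders of magnitude before converging, with lambda ending below 2/eta. In this particular toy model, rates above roughly 4/lambda_0 make lambda grow instead, and training diverges.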