Abstract for the talk on 03.06.2021 (17:00 h)Math Machine Learning seminar MPI MIS + UCLA
Stanisław Jastrzębski (Molecule.one and Jagiellonian University)
Reverse-engineering implicit regularization due to large learning rates in deep learning
See the video of this talk.
The early phase of training of deep neural networks has a dramatic and counterintuitive effect on the local curvature of the loss function. For instance, we found that using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. It is equally surprising that using a small learning rate impacts negatively generalization. I will discuss our journey in understanding these and other phenomena. The focus of the talk will be our mechanistic explanation for how using a large learning rate impacts generalization, which we corroborate by developing an explicit regularizer that reproduces its implicit regularization effects [1,2].
 The Break-Even Point on Optimization Trajectories of Deep Neural Networks, Jastrzebski et al, ICLR 2020
 Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization, Jastrzebski et al, ICML 2021