Abstract for the talk on 3 June 2021 (17:00)

Math Machine Learning seminar MPI MIS + UCLA

Stanisław Jastrzębski (Molecule.one and Jagiellonian University)
Reverse-engineering implicit regularization due to large learning rates in deep learning
See also the video of this talk.

The early phase of training deep neural networks has a dramatic and counterintuitive effect on the local curvature of the loss function. For instance, we found that using a small learning rate does not guarantee stable optimization, because the optimization trajectory tends to steer towards regions of the loss surface with increasing local curvature. Equally surprising, using a small learning rate negatively impacts generalization. I will discuss our journey in understanding these and other phenomena. The focus of the talk will be our mechanistic explanation of how using a large learning rate impacts generalization, which we corroborate by developing an explicit regularizer that reproduces its implicit regularization effects [1,2].
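
The link between learning rate, curvature, and stability can be illustrated with a minimal sketch. It assumes a one-dimensional quadratic loss f(x) = 0.5 * curvature * x^2, which is a toy setting chosen for illustration and not the setting studied in [1,2]. On this loss, plain gradient descent multiplies x by (1 - learning_rate * curvature) at every step, so it converges only while learning_rate < 2 / curvature. The function names and the numeric values below are hypothetical.

```python
def gradient_descent_quadratic(curvature, learning_rate, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2; return the final |x|."""
    x = x0
    for _ in range(steps):
        grad = curvature * x           # derivative of the toy quadratic loss
        x = x - learning_rate * grad   # plain gradient descent update
    return abs(x)


def is_stable(curvature, learning_rate):
    """Stability condition for the toy quadratic: learning_rate < 2 / curvature."""
    return learning_rate < 2.0 / curvature


if __name__ == "__main__":
    learning_rate = 0.1  # kept fixed while the curvature grows
    for curvature in (1.0, 10.0, 25.0):
        final = gradient_descent_quadratic(curvature, learning_rate)
        print(f"curvature={curvature:5.1f}  stable={is_stable(curvature, learning_rate)}  |x|={final:.3e}")
```

With the learning rate held fixed, the same step size that converges at low curvature diverges once the curvature exceeds 2 / learning_rate. In this toy picture, a trajectory that drifts towards sharper regions can therefore become unstable even at a seemingly small learning rate.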

[1] The Break-Even Point on Optimization Trajectories of Deep Neural Networks, Jastrzebski et al., ICLR 2020

[2] Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization, Jastrzebski et al., ICML 2021

