Abstract for the talk on 11.02.2021 (17:00 h)

Math Machine Learning seminar MPI MIS + UCLA

Samuel L. Smith (DeepMind)
A Backward Error Analysis for Random Shuffling SGD
See the video of this talk.

I will discuss two recent works, which apply backward error analysis to describe the influence of finite learning rates on the dynamics of Gradient Descent [1] and Random Shuffling SGD [2]. In particular, I will show that, when using small finite learning rates, the path taken by Random Shuffling SGD coincides with the path taken by gradient flow when minimizing a modified loss function. This modified loss function contains an implicit source of regularization which enhances the test accuracies achieved by deep networks on standard benchmarks.
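The flavor of this backward-error-analysis result can be checked numerically on a toy problem. The sketch below (my own illustration, not from the talk) uses the one-dimensional quadratic loss L(θ) = θ²/2, for which the modified loss of [1], L̃(θ) = L(θ) + (ε/4)‖∇L(θ)‖², and its exact gradient flow are available in closed form. A single gradient-descent step then tracks the flow of the modified loss noticeably more closely than the flow of the original loss, and the residual shrinks at a higher order in the learning rate ε.

```python
import math

def gd_step(theta, eps):
    # One gradient-descent step on L(theta) = theta^2 / 2, so grad L = theta.
    return theta - eps * theta

def flow(theta, eps, modified):
    # Exact gradient flow for time eps. For this quadratic the modified loss
    # L + (eps/4) * (grad L)^2 equals (1 + eps/2) * theta^2 / 2, so the flow
    # is still linear, just with rate (1 + eps/2) instead of 1.
    rate = 1.0 + eps / 2.0 if modified else 1.0
    return theta * math.exp(-eps * rate)

theta0 = 1.0
for eps in (0.1, 0.05, 0.025):
    err_plain = abs(gd_step(theta0, eps) - flow(theta0, eps, modified=False))
    err_mod = abs(gd_step(theta0, eps) - flow(theta0, eps, modified=True))
    print(f"eps={eps:<6} |GD - flow(L)|={err_plain:.2e}  |GD - flow(L~)|={err_mod:.2e}")
```

Halving ε shrinks the gap to the plain gradient flow by roughly 4x (an O(ε²) residual) but the gap to the modified flow by roughly 8x (an O(ε³) residual), consistent with the claim that gradient descent follows the modified loss to one order higher in the learning rate.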

I will also briefly discuss an empirical study clarifying the key phenomena observed when training deep networks with SGD [3].

[1] Implicit Gradient Regularization, Barrett and Dherin, ICLR 2021

[2] On the Origin of Implicit Regularization in Stochastic Gradient Descent, Smith et al., ICLR 2021

[3] On the Generalization Benefit of Noise in Stochastic Gradient Descent, Smith et al., ICML 2020
