A Backward Error Analysis for Random Shuffling SGD

  • Samuel L. Smith (DeepMind)
Live Stream


I will discuss two recent works that apply backward error analysis to describe the influence of finite learning rates on the dynamics of Gradient Descent [1] and Random Shuffling SGD [2]. In particular, I will show that, for small finite learning rates, the path taken by Random Shuffling SGD coincides with the path taken by gradient flow on a modified loss function. This modified loss function contains an implicit source of regularization which enhances the test accuracies achieved by deep networks on standard benchmarks.
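The full-batch result of [1] can be illustrated on a one-dimensional quadratic. For gradient descent with learning rate η on a loss L, the modified loss takes the form L̃ = L + (η/4)|∇L|², so for L(θ) = ½λθ² the gradient flow of L̃ contracts at the corrected rate λ(1 + ηλ/2). A minimal sketch (the values of λ, η, and the step count are arbitrary choices for illustration):

```python
import math

# Quadratic loss L(theta) = 0.5 * lam * theta**2, so grad L = lam * theta.
lam, eta, steps, theta0 = 1.0, 0.1, 50, 1.0

# Gradient descent iterates: theta <- theta - eta * grad L(theta).
theta_gd = theta0
for _ in range(steps):
    theta_gd -= eta * lam * theta_gd

# Gradient flow on the original loss, evaluated at time t = eta * steps.
theta_flow = theta0 * math.exp(-lam * eta * steps)

# Gradient flow on the modified loss L + (eta/4)|grad L|^2, which for this
# quadratic contracts at the corrected rate lam * (1 + eta * lam / 2).
theta_mod = theta0 * math.exp(-lam * (1 + eta * lam / 2) * eta * steps)

print(abs(theta_gd - theta_flow), abs(theta_gd - theta_mod))
```

The modified flow tracks the discrete iterates far more closely than the flow on the original loss, matching the backward-error-analysis prediction. (For Random Shuffling SGD, [2] derives an analogous modified loss whose extra term penalizes the per-batch gradients; the sketch above covers only the full-batch case.)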

I will also briefly discuss an empirical study clarifying the key phenomena observed when training deep networks with SGD [3].

[1] Implicit Gradient Regularization, Barrett and Dherin, ICLR 2021

[2] On the Origin of Implicit Regularization in Stochastic Gradient Descent, Smith et al., ICLR 2021

[3] On the Generalization Benefit of Noise in Stochastic Gradient Descent, Smith et al., ICML 2020


01.08.24 22.08.24

Math Machine Learning seminar MPI MIS + UCLA

