The Probabilistic Stability and Low-Rank Bias of SGD

Liu Ziyin (University of Tokyo)

Live Stream

Abstract

Conventionally, the stability of stochastic gradient descent (SGD) is understood through linear stability analysis, in which the mean and variance of the parameters or the gradients are examined to determine whether SGD is stable close to a stationary point. In this seminar, we discuss the limitations of linear stability theories and motivate a new notion of stability, which we call probabilistic stability. We first explain, using toy problems, why this notion is especially suitable for understanding SGD at a large learning rate and a small batch size. Then, with this new notion of stability, we study the implicit bias of SGD and show that SGD at a large learning rate converges to low-rank saddles in matrix factorization problems.
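As a rough illustration of how a variance-based criterion and a probabilistic one can disagree, here is a minimal sketch in Python. It rests on assumptions that are not taken from the abstract: the toy loss is x * w^2 / 2 with a random curvature x drawn uniformly from two hypothetical values, batch size is 1, and "probabilistic stability" is read as convergence of w to the stationary point 0 in probability. All function names are made up for this example. Under SGD each step multiplies w by (1 - lr * x), so the second moment grows or shrinks with E[(1 - lr * x)^2], while typical trajectories are governed by E[log|1 - lr * x|].

```python
import math
import random


def second_moment_factor(lr, xs):
    """Per-step growth factor of E[w^2]: the mean of (1 - lr*x)^2 over xs."""
    return sum((1.0 - lr * x) ** 2 for x in xs) / len(xs)


def lyapunov_exponent(lr, xs):
    """Mean of log|1 - lr*x| over xs: the average per-step change of log|w|."""
    return sum(math.log(abs(1.0 - lr * x)) for x in xs) / len(xs)


def run_sgd(lr, xs, steps, rng, w0=1.0):
    """Run batch-size-1 SGD on the toy loss x * w^2 / 2 and return |w| at the end."""
    w = w0
    for _ in range(steps):
        x = rng.choice(xs)   # sample one data point (hypothetical curvature)
        w -= lr * x * w      # gradient of x * w^2 / 2 with respect to w is x * w
    return abs(w)


def fraction_converged(lr, xs, steps=200, runs=1000, tol=1e-6, seed=0):
    """Fraction of independent runs that end with |w| below tol."""
    rng = random.Random(seed)
    hits = sum(run_sgd(lr, xs, steps, rng) < tol for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    lr, xs = 1.0, [0.9, 3.0]  # assumed values, chosen so the two criteria disagree
    print("second-moment factor:", second_moment_factor(lr, xs))
    print("Lyapunov exponent:  ", lyapunov_exponent(lr, xs))
    print("fraction converged: ", fraction_converged(lr, xs))
```

With these assumed values the second-moment factor exceeds 1, so a variance-based analysis labels the point w = 0 unstable, yet the Lyapunov exponent is negative and nearly every simulated trajectory ends up close to 0. The growth of the second moment is driven by rare trajectories, which is the kind of gap between the two notions that the abstract alludes to.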

The talk is mainly based on the following two works:

[1] Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda. SGD with a Constant Large Learning Rate Can Converge to Local Maxima. ICLR 2022.

[2] The Probabilistic Stability of SGD. (tentative title, in preparation)



Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream


Upcoming Events of this Seminar

Thursday, 14.08.25 Topological Aspects of Symmetry-Preserving Neural Networks with Jonathan Siegel
Thursday, 21.08.25 Empirical Bayes Langevin dynamics in the linear model with Zhou Fan
Thursday, 28.08.25 Curvature Tuning: Provable Model Steering From a Single Parameter with Randall Balestriero
Thursday, 02.10.25 to be announced with Marcello Carioni
Thursday, 09.10.25 to be announced with Baharan Mirzasoleiman