Talk

The effect of low rank and stochasticity on Gradient Descent at the Edge of Stability

  • Avrajit Ghosh (Michigan State University)
  • Soo Min Kwon (University of Michigan)

Abstract

Deep neural networks trained using gradient descent (GD) with a fixed learning rate (lr) often operate in the regime of "edge of stability" (EoS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold (2/lr). This talk aims to uncover the EoS dynamics of deep neural networks and will be split into two parts: (i) dynamics of GD in an overparameterized deep matrix factorization problem and (ii) dynamics of weight-perturbed GD (WPGD) for training deep neural networks. In the first part, we theoretically analyze the unstable, oscillatory dynamics in deep matrix factorization and show that loss oscillations occur within a low-dimensional subspace, whose dimension is precisely characterized by the learning rate. We show that GD with a large step size drives the balancing gap among the layers towards zero, offering insight into its ability to break conservation laws induced by symmetry in deep networks. In the second part, we show that deep neural networks trained with WPGD typically operate at a smaller threshold than 2/lr depending on the variance and number of samples used for the perturbation. Overall, both parts offer insight into the role of the optimizer and the loss landscape in governing the stability dynamics of deep learning.
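The sketch below is not from the talk or the underlying papers; it is a minimal, hypothetical illustration of the phenomenon the abstract describes. It runs full-batch GD on a toy two-layer matrix factorization loss L(W1, W2) = ||W2 W1 - M||_F^2 / 2 and tracks the sharpness (largest Hessian eigenvalue) via power iteration on Hessian-vector products, so one can watch whether it hovers near the stability threshold 2/lr. All dimensions, the learning rate, and the depth are illustrative choices.

```python
# Hypothetical sketch (not the speakers' code): full-batch GD on a depth-2
# matrix factorization, tracking sharpness = lambda_max(Hessian) vs. 2/lr.
import torch

torch.manual_seed(0)
d, lr, steps = 10, 0.05, 2000

# Low-rank target and two-layer factorization, loss = ||W2 W1 - M||_F^2 / 2
M = torch.randn(d, 2) @ torch.randn(2, d)
W1 = (0.5 * torch.randn(d, d)).requires_grad_()
W2 = (0.5 * torch.randn(d, d)).requires_grad_()
params = [W1, W2]

def loss_fn():
    return 0.5 * ((W2 @ W1 - M) ** 2).sum()

def sharpness(n_iters=50):
    """Estimate the largest Hessian eigenvalue by power iteration on Hv products."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    lam = 0.0
    for _ in range(n_iters):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        lam = (v @ hv).item()          # Rayleigh quotient with the current unit vector
        v = hv / (hv.norm() + 1e-12)   # power-iteration update
    return lam

for t in range(steps):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g                # plain GD step with fixed learning rate
    if t % 200 == 0:
        print(f"step {t:4d}  loss {loss.item():8.4f}  "
              f"sharpness {sharpness():6.2f}  2/lr = {2 / lr:.1f}")
```

With a sufficiently large learning rate, the printed sharpness is expected to rise toward and then oscillate around 2/lr rather than settle well below it, which is the EoS behavior the first part of the talk analyzes; the low-dimensional structure of the resulting loss oscillations is what the speakers characterize theoretically.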

The talk will be based on the following papers.

Links


Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream
