Talk

The Mechanism Behind the Implicit Biases of Large Learning Rates: Edge of Stability, Balancing, and Catapult

  • Yuqing Wang (JHU)

Abstract

Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases, including the edge of stability, balancing, and catapult. Although many theoretical works analyze these phenomena, a high-level picture has been missing: it is unclear when and why they occur. In this talk, I will show that these phenomena are different tips of the same iceberg. They occur when the objective function has sufficiently good regularity. This regularity, together with the tendency of a large learning rate to drive gradient descent from sharp regions toward flatter ones, keeps the largest eigenvalue of the Hessian, i.e., the sharpness, under control along the GD trajectory, which in turn produces the different phenomena. The result rests on a nontrivial convergence analysis, under large learning rates, of a family of nonconvex functions of varying regularity without a Lipschitz gradient, which is usually a default assumption in nonconvex optimization. Neural network experiments will also be presented to validate this result.
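As a rough illustration of the mechanism described above (a toy sketch, not the speaker's analysis or code), the snippet below runs gradient descent on the assumed two-parameter factorized objective f(u, v) = ½(uv − 1)², whose Hessian grows without bound, so it has no globally Lipschitz gradient. Tracking the sharpness (largest Hessian eigenvalue) and the imbalance |u² − v²| for a small versus a large learning rate shows the large step size pushing GD away from a sharp minimum toward a flatter, more balanced one.

```python
# Toy illustration (assumed setup, not the speaker's experiments):
# gradient descent on f(u, v) = 0.5 * (u*v - 1)^2, a simple nonconvex
# objective whose Hessian is unbounded (no global Lipschitz gradient).
import numpy as np

def loss(u, v):
    return 0.5 * (u * v - 1.0) ** 2

def grad(u, v):
    r = u * v - 1.0
    return r * v, r * u

def sharpness(u, v):
    # Largest eigenvalue of the 2x2 Hessian of f at (u, v).
    H = np.array([[v * v, 2 * u * v - 1.0],
                  [2 * u * v - 1.0, u * u]])
    return np.linalg.eigvalsh(H)[-1]  # eigenvalues come back in ascending order

def run_gd(eta, steps=200, u0=3.0, v0=0.34):
    # Start near a sharp, imbalanced global minimum (u*v ~ 1, u >> v).
    u, v = u0, v0
    for _ in range(steps):
        gu, gv = grad(u, v)
        u, v = u - eta * gu, v - eta * gv
    return u, v

for eta in (0.01, 0.3):  # small vs. large learning rate
    u, v = run_gd(eta)
    print(f"eta={eta:4.2f}  loss={loss(u, v):.2e}  "
          f"sharpness={sharpness(u, v):.2f}  2/eta={2 / eta:.2f}  "
          f"imbalance |u^2 - v^2|={abs(u * u - v * v):.2f}")
```

In this run the small step size converges to the sharp, imbalanced minimum next to the initialization, while the large step size is repelled from it (its sharpness exceeds the stability threshold 2/η ≈ 6.7) and settles at a flatter minimum with nearly balanced factors and sharpness below 2/η, a small-scale version of the sharpness control and balancing effects the abstract refers to.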

seminar
12.06.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences (Live Stream)
