Talk

The Mechanism Behind the Implicit Biases of Large Learning Rates: Edge of Stability, Balancing, and Catapult

  • Yuqing Wang (JHU)

Abstract

Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases, including the edge of stability, balancing, and catapult. Although many theoretical works analyze these phenomena, a high-level picture has been missing: it is unclear when and why they occur. In this talk, I will show that these phenomena are different tips of the same iceberg. They occur when the objective function has sufficiently good regularity. This regularity, together with the tendency of a large learning rate to drive gradient descent from sharp regions toward flatter ones, keeps the largest eigenvalue of the Hessian, i.e., the sharpness, under control along the GD trajectory, which in turn produces the different phenomena. The result rests on a nontrivial convergence analysis, under large learning rates, of a family of nonconvex functions of varying regularity without a Lipschitz gradient, which is usually a default assumption in nonconvex optimization. Neural network experiments will also be presented to validate this result.
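As a rough illustration of the mechanism described above (a toy sketch, not the speaker's analysis or code), the snippet below runs gradient descent on the assumed two-parameter factorized objective f(u, v) = ½(uv − 1)², whose Hessian grows without bound, so it has no globally Lipschitz gradient. Tracking the sharpness (largest Hessian eigenvalue) and the imbalance |u² − v²| for a small versus a large learning rate shows the large step size pushing GD away from a sharp minimum toward a flatter, more balanced one.

```python
# Toy illustration (assumed setup, not the speaker's experiments):
# gradient descent on f(u, v) = 0.5 * (u*v - 1)^2, a simple nonconvex
# objective whose Hessian is unbounded (no global Lipschitz gradient).
import numpy as np

def loss(u, v):
    return 0.5 * (u * v - 1.0) ** 2

def grad(u, v):
    r = u * v - 1.0
    return r * v, r * u

def sharpness(u, v):
    # Largest eigenvalue of the 2x2 Hessian of f at (u, v).
    H = np.array([[v * v, 2 * u * v - 1.0],
                  [2 * u * v - 1.0, u * u]])
    return np.linalg.eigvalsh(H)[-1]  # eigenvalues come back in ascending order

def run_gd(eta, steps=200, u0=3.0, v0=0.34):
    # Start near a sharp, imbalanced global minimum (u*v ~ 1, u >> v).
    u, v = u0, v0
    for _ in range(steps):
        gu, gv = grad(u, v)
        u, v = u - eta * gu, v - eta * gv
    return u, v

for eta in (0.01, 0.3):  # small vs. large learning rate
    u, v = run_gd(eta)
    print(f"eta={eta:4.2f}  loss={loss(u, v):.2e}  "
          f"sharpness={sharpness(u, v):.2f}  2/eta={2 / eta:.2f}  "
          f"imbalance |u^2 - v^2|={abs(u * u - v * v):.2f}")
```

In this run the small step size converges to the sharp, imbalanced minimum next to the initialization, while the large step size is repelled from it (its sharpness exceeds the stability threshold 2/η ≈ 6.7) and settles at a flatter minimum with nearly balanced factors and sharpness below 2/η, a small-scale version of the sharpness control and balancing effects the abstract refers to.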

seminar
12.06.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences (Live Stream)
