Learning threshold neurons via the "edge of stability”

Felipe Suárez-Colmenares (MIT)

Live Stream

Abstract

Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.

Links

seminar

14.08.25 09.10.25

Math Machine Learning seminar MPI MIS + UCLA Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Details anzeigen

Upcoming Events of this Seminar

Donnerstag, 14.08.25 Topological Aspects of Symmetry-Preserving Neural Networks with Jonathan Siegel
Donnerstag, 21.08.25 Empirical Bayes Langevin dynamics in the linear model with Zhou Fan
Donnerstag, 28.08.25 Curvature Tuning: Provable Model Steering From a Single Parameter with Randall Balestriero
Donnerstag, 02.10.25 to be announced with Marcello Carioni
Donnerstag, 09.10.25 to be announced with Baharan Mirzasoleiman