
Where Does Mini-Batch SGD Converge?

  • Pierfrancesco Beneventano (MIT)

Abstract

Training neural networks relies on mini-batch gradient methods that navigate non-convex objectives containing multiple manifolds of minima, each leading to different real-world performance. Which minima do these algorithms reach, and how do hyperparameters steer this outcome? First, we show a way in which SGD implicitly regularizes the features learned by neural networks. Next, we show that running SGD without replacement is locally equivalent to taking an extra descent step on a novel regularizer. Finally, we introduce a method to characterize the convergence points of SGD for small linear networks.
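For readers less familiar with the distinction the abstract draws, the sketch below contrasts mini-batch SGD with and without replacement (epoch-wise shuffling into disjoint batches) on a toy least-squares problem. The quadratic objective, the function names, and the hyperparameter values are illustrative assumptions only, not the setting analyzed in the talk.

```python
import numpy as np

def sgd_epoch(w, X, y, lr, batch_size, rng, with_replacement=False):
    """One epoch of mini-batch SGD on a toy least-squares objective (illustrative only)."""
    n = len(y)
    if with_replacement:
        # With replacement: each batch is drawn i.i.d. from the full dataset.
        batches = [rng.choice(n, size=batch_size, replace=True)
                   for _ in range(n // batch_size)]
    else:
        # Without replacement: shuffle once, then sweep through disjoint batches,
        # so every sample is used exactly once per epoch.
        perm = rng.permutation(n)
        batches = np.array_split(perm, n // batch_size)
    for idx in batches:
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of 0.5 * ||X_b w - y_b||^2 / m
        w = w - lr * grad
    return w

# Hypothetical synthetic data, just to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(128)
w = np.zeros(10)
for _ in range(50):
    w = sgd_epoch(w, X, y, lr=0.05, batch_size=16, rng=rng, with_replacement=False)
```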

Seminar
19.06.25 · 02.10.25

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences (Live Stream)
