Towards Understanding Training Dynamics for Mildly Overparametrized Models
- Rong Ge (Duke University)
Abstract
While overparametrization is widely believed to be crucial to the success of optimization for neural networks, most existing theories on overparametrization do not fully explain why: they either work in the Neural Tangent Kernel regime, where neurons do not move much, or require an enormous number of neurons. In this talk I will describe our recent work towards understanding training dynamics that go beyond kernel regimes with only polynomially many neurons (mildly overparametrized). In particular, we first give a local convergence result for mildly overparametrized two-layer networks. We then analyze the global training dynamics of a related overparametrized tensor model. Both works rely on a key intuition: neurons in overparametrized models work in groups, and it is important to understand the behavior of the average neuron within each group. This talk is based on two papers: arxiv.org/abs/2102.02410 and arxiv.org/abs/2106.06573
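As a rough illustration of the group intuition, assuming a standard student-teacher two-layer setup (the exact assumptions differ across the two papers):

    student:  f(x) = \sum_{i=1}^{m} a_i \, \sigma(w_i^T x),    teacher:  f^*(x) = \sum_{j=1}^{r} a_j^* \, \sigma((w_j^*)^T x),    with r \ll m.

In this sketch the student neurons split into groups S_1, ..., S_r, where the neurons in S_j track the teacher direction w_j^*, and the quantity to analyze is the group average \bar{w}_j = (1/|S_j|) \sum_{i \in S_j} w_i rather than any individual neuron.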