Gradient Descent for Deep Neural Networks: New Perspectives from Mean-field and NTK

Marco Mondelli (Institute of Science and Technology Austria)

E1 05 (Leibniz-Saal)

Abstract

Understanding the properties of neural networks trained via gradient descent is at the heart of the theory of deep learning. In this talk, I will discuss two approaches to study the behavior of gradient descent methods. The first one takes a mean-field view and it relates the dynamics of stochastic gradient descent (SGD) to a certain Wasserstein gradient flow in probability space. I will show how this idea allows to study the connectivity, convergence and implicit bias of the solutions found by SGD. The second approach consists in the analysis of the Neural Tangent Kernel. I will present tight bounds on its smallest eigenvalue and show their implications on memorization and optimization in deep networks.

Based on joint work with Adel Javanmard, Vyacheslav Kungurtsev, Andrea Montanari, Guido Montufar, Quynh Nguyen, and Alexander Shevchenko.

Links

seminar

14.08.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

See Details

Upcoming Events of this Seminar

Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 02.10.25 to be announced with Marcello Carioni