Gradient Descent for Deep Neural Networks: New Perspectives from Mean-field and NTK

  • Marco Mondelli (Institute of Science and Technology Austria)
E1 05 (Leibniz-Saal)


Understanding the properties of neural networks trained via gradient descent is at the heart of the theory of deep learning. In this talk, I will discuss two approaches to studying the behavior of gradient descent methods. The first takes a mean-field view and relates the dynamics of stochastic gradient descent (SGD) to a certain Wasserstein gradient flow in probability space. I will show how this idea allows us to study the connectivity, convergence and implicit bias of the solutions found by SGD. The second approach analyzes the Neural Tangent Kernel (NTK). I will present tight bounds on its smallest eigenvalue and show their implications for memorization and optimization in deep networks.
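The NTK part of the abstract can be made concrete with a minimal sketch. It assumes (none of this is taken from the talk) a two-layer ReLU network f(x) = a · relu(Wx)/√m with only the hidden weights W trained, fixed output signs a_j = ±1, standard Gaussian initialization and unit-norm inputs. The helper names and sizes are hypothetical. The empirical NTK is the Gram matrix K = J Jᵀ of the Jacobian J of the outputs with respect to W. If λ_min(K) > 0, J has full row rank, so the linearized model can fit any labels on the n training points. In addition, gradient flow on the squared loss of the linearized model shrinks ‖f − y‖² at least as fast as exp(−2 λ_min t). The code also compares K with this kernel's infinite-width limit, (x·x′)(π − arccos(x·x′))/(2π) for unit-norm inputs.

```python
import numpy as np


def hidden_layer_jacobian(X, W, a):
    """Jacobian of f(x) = a . relu(W x) / sqrt(m) w.r.t. the hidden weights W.

    X: (n, d) inputs, W: (m, d) hidden weights, a: (m,) fixed output weights.
    Returns J of shape (n, m * d); row i is vec(df(x_i)/dW).
    """
    m = W.shape[0]
    active = (X @ W.T > 0).astype(X.dtype)  # (n, m): indicator w_j . x_i > 0
    # df(x_i)/dw_j = a_j * 1{w_j . x_i > 0} * x_i / sqrt(m)
    J = (active * a)[:, :, None] * X[:, None, :] / np.sqrt(m)  # (n, m, d)
    return J.reshape(X.shape[0], -1)


def empirical_ntk(X, W, a):
    """Empirical NTK Gram matrix K = J J^T, shape (n, n)."""
    J = hidden_layer_jacobian(X, W, a)
    return J @ J.T


def limiting_ntk(X):
    """Infinite-width limit of this kernel for unit-norm inputs and Gaussian W."""
    G = np.clip(X @ X.T, -1.0, 1.0)  # cosines of the angles between inputs
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)


rng = np.random.default_rng(0)
n, d, m = 20, 10, 2000  # hypothetical sizes: samples, input dim, width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # project inputs onto the sphere
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

K = empirical_ntk(X, W, a)
lam_min = np.linalg.eigvalsh(K)[0]  # eigvalsh returns ascending eigenvalues
gap = np.max(np.abs(K - limiting_ntk(X)))
print(f"lambda_min(K) = {lam_min:.4f}, max |K - K_inf| = {gap:.4f}")
```

Increasing the width m makes K concentrate around the limiting kernel. Deep networks, which the talk addresses, would need a layer-by-layer Jacobian (for example via automatic differentiation) rather than this closed form.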

Based on joint work with Adel Javanmard, Vyacheslav Kungurtsev, Andrea Montanari, Guido Montufar, Quynh Nguyen, and Alexander Shevchenko.


18.07.24 22.08.24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Contact: Katharina Matschke (MPI for Mathematics in the Sciences)
