Abstract for the talk on 04.08.2020 (11:00 h)

Mathematics of Data Seminar

Marco Mondelli (IST Austria)
Understanding Gradient Descent for Over-parameterized Deep Neural Networks
04.08.2020, 11:00 h, video broadcast only

Training a neural network is a non-convex problem that exhibits spurious and disconnected local minima. Yet, in practice, neural networks with millions of parameters are successfully optimized using gradient descent methods. In this talk, I will give some theoretical insights into why this is possible. First, I will show that the combination of stochastic gradient descent and over-parameterization makes the landscape of deep networks approximately connected and, therefore, more favorable to optimization. Then, I will focus on a special case (a two-layer network fitting a convex function) and provide a quantitative convergence result by exploiting the displacement convexity of a related Wasserstein gradient flow. Finally, I will return to deep networks and show that a single wide layer followed by a pyramidal topology suffices to guarantee the global convergence of gradient descent.
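To make the setting concrete, here is a minimal sketch (not code from the talk or the underlying papers) of plain gradient descent on an over-parameterized two-layer ReLU network fitting the convex target f(x) = x^2, the kind of problem addressed by the second result. The width, learning rate, output scaling, and data are illustrative assumptions, not values from the work being presented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: fit the convex target f(x) = x^2 on [-1, 1].
n = 200
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = x ** 2

# Over-parameterized two-layer ReLU network with width m >> n,
# using a 1/sqrt(m) output scaling (an illustrative choice).
m = 2000
W = rng.normal(size=(1, m))      # first-layer weights
b = rng.normal(size=(m,))        # first-layer biases
a = rng.normal(size=(m, 1))      # second-layer weights

lr = 0.5
for step in range(3001):
    pre = x @ W + b                       # (n, m) pre-activations
    h = np.maximum(pre, 0.0)              # ReLU features
    pred = h @ a / np.sqrt(m)             # (n, 1) network output
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)

    # Gradients of the mean-squared loss w.r.t. all parameters.
    grad_a = h.T @ resid / (n * np.sqrt(m))
    grad_pre = (resid @ a.T) * (pre > 0) / (n * np.sqrt(m))
    grad_W = x.T @ grad_pre
    grad_b = grad_pre.sum(axis=0)

    # Full-batch gradient descent step on all layers.
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.6f}")
```

With a width this much larger than the sample size, the training loss decreases steadily from random initialization; the talk discusses why such behavior can be guaranteed in suitable regimes.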

[Based on joint work with Adel Javanmard, Andrea Montanari, Quynh Nguyen, and Alexander Shevchenko]

If you want to participate in this video broadcast, please register using this special form. The Zoom link for the video broadcast will be sent to your email address one day before the seminar.
