Abstract for the talk on 04.08.2020 (11:00 a.m.)

Mathematics of Data Seminar

Marco Mondelli (IST Austria)
Understanding Gradient Descent for Over-parameterized Deep Neural Networks
04.08.2020, 11:00 a.m., video broadcast only

Training a neural network is a non-convex problem that exhibits spurious and disconnected local minima. Yet, in practice neural networks with millions of parameters are successfully optimized using gradient descent methods. In this talk, I will give some theoretical insights on why this is possible. First, I will show that the combination of stochastic gradient descent and over-parameterization makes the landscape of deep networks approximately connected and, therefore, more favorable to optimization. Then, I will focus on a special case (two-layer network fitting a convex function) and provide a quantitative convergence result by exploiting the displacement convexity of a related Wasserstein gradient flow. Finally, I will go back to deep networks and show that a single wide layer followed by a pyramidal topology suffices to guarantee the global convergence of gradient descent.

[Based on joint work with Adel Javanmard, Andrea Montanari, Quynh Nguyen, and Alexander Shevchenko]
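As a minimal illustrative sketch (not part of the talk material), the following Python/NumPy snippet trains a wide two-layer ReLU network on a convex one-dimensional target with plain gradient descent, loosely mirroring the over-parameterized setting mentioned in the abstract. All names, the network width, and the hyperparameters are hypothetical choices for illustration only.

```python
# Hedged sketch: gradient descent on an over-parameterized two-layer ReLU
# network fitting a convex 1-d target (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

# Convex target function and training data (hypothetical choices).
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = x ** 2  # convex target

m = 1000   # width: far more neurons than data points (over-parameterization)
lr = 0.05  # learning rate
W = rng.normal(scale=1.0, size=(1, m))                # first-layer weights
b = rng.normal(scale=1.0, size=(m,))                  # first-layer biases
a = rng.normal(scale=1.0 / np.sqrt(m), size=(m, 1))   # output-layer weights

def forward(x):
    pre = x @ W + b            # (n, m) pre-activations
    h = np.maximum(pre, 0.0)   # ReLU activations
    return pre, h, h @ a       # prediction of shape (n, 1)

for step in range(2000):
    pre, h, pred = forward(x)
    err = pred - y                        # (n, 1) residuals
    loss = 0.5 * np.mean(err ** 2)
    n = x.shape[0]
    # Backpropagation for the mean squared loss.
    grad_a = h.T @ err / n                # (m, 1)
    dh = (err @ a.T) * (pre > 0)          # (n, m) gradient through ReLU
    grad_W = x.T @ dh / n                 # (1, m)
    grad_b = dh.mean(axis=0)              # (m,)
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

print("final training loss:", loss)
```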

If you would like to attend this video seminar, please register using this form. The (Zoom) link for joining the video seminar will be sent to you by email one day in advance.
