Search

Talk

Convergence of gradient descent for learning deep linear neural networks

  • Gabin Maxime Nguegnang (RWTH Aachen University)
Live Stream

Abstract

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on the step sizes gradient descent converges to a critical point of the loss function, i.e., the square loss in this work. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers we show that gradient descent converges to a global minimum on the manifold matrices of some fixed rank, where the rank cannot be determined a priori.

Links

seminar
19.12.24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Katharina Matschke

MPI for Mathematics in the Sciences Contact via Mail

Upcoming Events of this Seminar