Generalization in Deep Learning Through the Lens of Implicit Rank Minimization

Noam Razin (Tel Aviv University)

Live Stream

Abstract

The mysterious ability of neural networks to generalize is believed to stem from an implicit regularization: a tendency of gradient-based optimization to fit training data with predictors of low “complexity.” Despite vast efforts, a satisfying formalization of this intuition is lacking. In this talk I will present a series of works theoretically analyzing the implicit regularization in matrix and tensor factorizations, known to be equivalent to certain linear and non-linear neural networks, respectively. Through dynamical characterizations I will establish an implicit regularization towards low rank (for corresponding notions of rank), different from any type of norm minimization, in contrast to prior beliefs. I will then discuss implications of this finding for both theory (a possible explanation for generalization over natural data) and practice (compression of neural network layers, novel regularization schemes). Overall, our results highlight the potential of ranks to explain and improve generalization in deep learning.

Works covered in this talk were done in collaboration with Asaf Maman and Nadav Cohen.

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Upcoming Events of this Seminar

Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 02.10.25 to be announced with Marcello Carioni
