Feature Learning in Shallow Neural Networks: From Theory to Optimization Algorithms
- Behrad Moniri (University of Pennsylvania)
Abstract
In this talk, we study the problem of feature learning in shallow neural networks. In the first part of the talk, we review the fundamental limitations of two-layer neural networks whose first-layer weights are fixed at initialization -- a model that does not learn features. We demonstrate that, in the high-dimensional proportional limit, this model cannot learn nonlinear functions, a consequence of the phenomenon known as Gaussian Equivalence. We then show that even a single step of gradient descent applied to the first layer drastically alters this picture: through a precise analysis of the spectrum of the resulting feature matrix, we illustrate how this one-step update breaks Gaussian Equivalence.
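To make the setting concrete, the following numpy sketch (our own illustrative code, not the speaker's) sets up a two-layer network f(x) = a^T sigma(Wx), freezes W to obtain the random-features model, and then forms the single full-batch gradient step on W whose effect on the feature-matrix spectrum the first part of the talk analyzes. The dimensions, ReLU activation, squared loss, step size, and target function are all assumptions made for illustration.

# Illustrative sketch: random-features model and one gradient step on the first layer.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 300, 300          # samples, input dim, hidden width (proportional regime, assumed)
eta = 1.0                          # step size for the one-step update (assumed)

X = rng.standard_normal((n, d))                            # isotropic Gaussian inputs
y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)       # a purely nonlinear target (illustrative)

W = rng.standard_normal((k, d)) / np.sqrt(d)               # first-layer weights at initialization
a = rng.standard_normal(k) / np.sqrt(k)                    # second-layer weights

relu = lambda z: np.maximum(z, 0.0)

# Random-features model: W stays fixed; only `a` would be trained.
F0 = relu(X @ W.T)                                         # feature matrix at initialization

# One full-batch gradient step on W for the squared loss (only W is updated).
resid = F0 @ a - y
grad_W = ((resid[:, None] * (X @ W.T > 0) * a[None, :]).T @ X) / n
W1 = W - eta * grad_W

F1 = relu(X @ W1.T)                                        # feature matrix after one step

# The talk compares the spectra of F0 and F1: the one-step update introduces
# spectral outliers that are absent under Gaussian Equivalence.
print(np.linalg.svd(F0, compute_uv=False)[:3])
print(np.linalg.svd(F1, compute_uv=False)[:3])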
In the second part of the talk, we revisit the two-layer network updated by one gradient descent step and also consider linear representation learning, another widely studied model of feature learning. We establish that, in both problems, gradient descent is a suboptimal feature-learning algorithm under general (anisotropic) input distributions, beyond the typical isotropic assumption. Furthermore, we show that layer-wise preconditioning methods emerge as the natural remedy. We thus provide the first learning-theoretic motivation for these popular deep learning optimization algorithms.
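To make the contrast concrete, the sketch below (again our own illustration, not the speaker's code) compares a plain gradient step with a layer-wise preconditioned step in a linear representation learning setup with anisotropic inputs. The specific preconditioner used here (the regularized inverse of the empirical input covariance, applied on the right of the first-layer gradient, in the spirit of K-FAC-style layer-wise methods) is an assumption; the exact preconditioner analyzed in the talk may differ.

# Illustrative sketch: plain vs. layer-wise preconditioned gradient step with anisotropic inputs.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 1000, 100, 20            # samples, input dim, representation dim (assumed)
eta, lam = 0.5, 1e-3               # step size and ridge regularization (assumed)

# Anisotropic input covariance with eigenvalues spanning three orders of magnitude.
evals = np.logspace(0, -3, d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma_half = Q * np.sqrt(evals)                     # Sigma = Q diag(evals) Q^T
X = rng.standard_normal((n, d)) @ Sigma_half.T

W_star = rng.standard_normal((k, d)) / np.sqrt(d)   # planted linear representation (illustrative)
y = X @ W_star.T @ rng.standard_normal(k)           # linear target built from the representation

W = rng.standard_normal((k, d)) / np.sqrt(d)        # first-layer (representation) weights
a = rng.standard_normal(k) / np.sqrt(k)             # second-layer weights

# Gradient of the squared loss in W for the model f(x) = a^T W x.
resid = X @ W.T @ a - y
grad_W = np.outer(a, resid @ X) / n

# Plain gradient descent step.
W_gd = W - eta * grad_W

# Layer-wise preconditioned step: right-precondition the first-layer gradient
# by the regularized inverse of the empirical input covariance.
Sigma_hat = X.T @ X / n
W_pre = W - eta * grad_W @ np.linalg.inv(Sigma_hat + lam * np.eye(d))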