A Fine-Grained Characterization for the Implicit Bias of SGD in Least Square Problems

  • Jingfeng Wu (Johns Hopkins University)

Abstract

There is an increasing realization that algorithmic inductive biases are central to preventing overfitting; empirically, we often see that machine learning models trained by stochastic gradient descent (SGD) generalize well, even in overparameterized settings and with little to no explicit regularization. This work considers this issue in arguably the most basic setting: averaged or last-iterate SGD for linear regression in the overparameterized regime. Our main result provides a sharp excess risk bound, stated in terms of the full eigenspectrum of the data covariance matrix, that reveals a bias-variance decomposition characterizing when generalization is possible: (i) the variance bound is characterized in terms of an effective dimension (specific to SGD), and (ii) the bias bound is characterized in terms of the optimal model parameter and how it aligns with the data covariance matrix. Moreover, for SGD with iterate averaging, we prove a matching lower bound (up to constant factors); for last-iterate SGD (with decaying stepsizes), we provide a matching lower bound (up to constant factors) when the signal-to-noise ratio is bounded.
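
The object of study is single-pass SGD for linear regression. As a concrete illustration, the Python sketch below runs constant-stepsize SGD with iterate averaging on a synthetic overparameterized instance and evaluates the population excess risk of both the averaged and the last iterate. The dimension, covariance spectrum, stepsize, and noise level are illustrative assumptions, not the constructions from the papers, and the last iterate is shown with a constant rather than decaying stepsize for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized synthetic instance: dimension d exceeds sample size n.
n, d, noise_std = 200, 1000, 0.1
eigs = 1.0 / np.arange(1, d + 1) ** 2          # illustrative covariance spectrum
w_star = rng.normal(size=d) * np.sqrt(eigs)    # ground-truth parameter
X = rng.normal(size=(n, d)) * np.sqrt(eigs)    # rows have covariance diag(eigs)
y = X @ w_star + noise_std * rng.normal(size=n)

def averaged_sgd(X, y, stepsize=0.25):
    """Single-pass, constant-stepsize SGD; returns the averaged and last iterates."""
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(n):
        # Stochastic gradient of 0.5 * (x_t^T w - y_t)^2 at the current iterate.
        w = w - stepsize * (X[t] @ w - y[t]) * X[t]
        w_avg += w / n
    return w_avg, w

def excess_risk(w):
    """Population excess risk when covariates have covariance diag(eigs)."""
    return np.sum(eigs * (w - w_star) ** 2)

w_avg, w_last = averaged_sgd(X, y)
print("excess risk of averaged iterate:", excess_risk(w_avg))
print("excess risk of last iterate:    ", excess_risk(w_last))
```

In this notation, the bias-variance decomposition of the abstract splits the excess risk of the SGD iterate into a term driven by the initialization error and a term driven by the label noise.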

The above results have the following additional implications:

(1) We compare the implicit regularization afforded by SGD with the explicit regularization of ridge regression. We find that, up to logarithmic factors, the generalization performance of SGD is always no worse than that of ridge regression for a wide range of overparameterized problems and, in fact, can be much better for some problem instances (a toy comparison is sketched below, after item (2)).

(2) We study the effects of transfer learning with pretraining and finetuning in the presence of covariate shift. We find that, for a large class of linear regression instances, transfer learning with O(N^2) source data (and scarce or no target data) is as effective as supervised learning with N target data (a covariate-shift setup is sketched in the second example below).
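
To make implication (1) concrete, the sketch below compares single-pass averaged SGD (implicit regularization) against ridge regression with an oracle-tuned penalty (explicit regularization) on a synthetic overparameterized instance. The covariance spectrum, stepsize, and penalty grid are illustrative assumptions rather than the problem instances analyzed in the papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overparameterized synthetic instance, same flavor as in the previous sketch.
n, d, noise_std = 200, 1000, 0.1
eigs = 1.0 / np.arange(1, d + 1) ** 2
w_star = rng.normal(size=d) * np.sqrt(eigs)
X = rng.normal(size=(n, d)) * np.sqrt(eigs)
y = X @ w_star + noise_std * rng.normal(size=n)

def risk(w):
    # Population excess risk under covariance diag(eigs).
    return np.sum(eigs * (w - w_star) ** 2)

# Implicit regularization: single-pass averaged SGD.
w, w_avg = np.zeros(d), np.zeros(d)
for t in range(n):
    w -= 0.25 * (X[t] @ w - y[t]) * X[t]
    w_avg += w / n

# Explicit regularization: ridge regression over a grid of penalties,
# reporting the best (oracle-tuned) value for comparison.
ridge_risks = []
for lam in [1e-4, 1e-3, 1e-2, 1e-1, 1.0]:
    w_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
    ridge_risks.append(risk(w_ridge))

print("averaged SGD excess risk:", risk(w_avg))
print("best ridge excess risk:  ", min(ridge_risks))
```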
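
For implication (2), the sketch below sets up one covariate-shift instance: a model is pretrained with averaged SGD on roughly N^2 source samples and evaluated on the target distribution, against a baseline trained directly on N target samples. The particular source and target covariances, the stepsizes, and the omission of a finetuning step are illustrative assumptions; the sketch only sets up the comparison and does not by itself reproduce the theoretical conclusion.

```python
import numpy as np

rng = np.random.default_rng(2)

d, N, noise_std = 500, 50, 0.1
src_eigs = 1.0 / np.arange(1, d + 1)           # source covariance spectrum
tgt_eigs = 1.0 / np.arange(1, d + 1) ** 2      # shifted target covariance spectrum
w_star = rng.normal(size=d) / np.arange(1, d + 1)

def sample(eigs, m):
    X = rng.normal(size=(m, d)) * np.sqrt(eigs)
    return X, X @ w_star + noise_std * rng.normal(size=m)

def averaged_sgd(X, y, stepsize):
    w, w_avg = np.zeros(d), np.zeros(d)
    for t in range(len(y)):
        w -= stepsize * (X[t] @ w - y[t]) * X[t]
        w_avg += w / len(y)
    return w_avg

def target_risk(w):
    # Excess risk measured under the *target* covariance.
    return np.sum(tgt_eigs * (w - w_star) ** 2)

# Pretraining on N^2 source samples (no target data used here).
Xs, ys = sample(src_eigs, N * N)
w_pretrained = averaged_sgd(Xs, ys, stepsize=0.1)

# Baseline: supervised learning on N target samples.
Xt, yt = sample(tgt_eigs, N)
w_supervised = averaged_sgd(Xt, yt, stepsize=0.25)

print("target risk, pretrained on N^2 source samples:", target_risk(w_pretrained))
print("target risk, trained on N target samples:      ", target_risk(w_supervised))
```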

References:
arxiv.org/abs/2103.12692
arxiv.org/abs/2108.04552
arxiv.org/abs/2110.06198


Seminar: Math Machine Learning seminar MPI MIS + UCLA
Venue: MPI for Mathematics in the Sciences, Live Stream
Contact: Katharina Matschke (MPI for Mathematics in the Sciences)