MiS Preprint Repository

We have decided to discontinue the publication of preprints on our preprint server as of 1 March 2024. The publication culture within mathematics has changed so much due to the rise of repositories such as arXiv (www.arxiv.org) that we are encouraging all institute members to make their preprints available there. The institute's repository in its previous form is therefore no longer necessary. The preprints published to date will remain available here, but no new preprints will be added.

MiS Preprint
63/2020

Implicit bias of gradient descent for mean squared error regression with wide neural networks

Hui Jin and Guido Montúfar

Abstract

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. Focusing on 1D regression, we show that the solution of training a width-n shallow ReLU network is within n^{−1/2} of the function which fits the training data and whose difference from the initialization has the smallest 2-norm of the second derivative weighted by the curvature penalty 1/ζ. The curvature penalty function 1/ζ is expressed in terms of the probability distribution that is used to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. The statement generalizes to the training trajectories, which in turn are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength.
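
As a rough numerical illustration of this result (a sketch, not code from the paper), the snippet below trains a wide shallow ReLU network by gradient descent on a handful of 1D points and compares the learned function with the natural cubic spline interpolant of the same data. It loosely models the abstract's uniform asymmetric initialization: input weights held fixed, biases placing the ReLU kinks uniformly on [0, 1], and a mirrored copy of the neurons so the network output at initialization is exactly zero. The width, learning rate, step count, and training points are arbitrary illustrative choices.

```python
# Illustrative sketch only, not code from the paper: train a wide shallow
# ReLU network by gradient descent and compare it with the natural cubic
# spline interpolant of the same data.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)

# A handful of 1D training points inside (0, 1).
x_train = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_train = np.array([0.0, 0.8, 0.2, 0.9, 0.3])
N = len(x_train)

m = 2000                                  # half-width; total width is 2m
w_half = np.ones(m)                       # input weights, held fixed
b_half = -rng.uniform(0.0, 1.0, size=m)   # ReLU kinks uniform on [0, 1]
a_half = rng.choice([-1.0, 1.0], size=m)  # output signs

# Asymmetric initialization: a mirrored copy with negated output weights,
# so the network output is exactly zero at initialization.
w = np.concatenate([w_half, w_half])
b = np.concatenate([b_half, b_half])
a = np.concatenate([a_half, -a_half])
scale = 1.0 / np.sqrt(2 * m)              # NTK-style output scaling

def net(x, a, b):
    # f(x) = scale * sum_i a_i * relu(w_i * x + b_i)
    return scale * (np.maximum(np.outer(x, w) + b, 0.0) @ a)

# Full-batch gradient descent on the mean squared error, training a and b.
lr = 1.0
for _ in range(50_000):
    pre = np.outer(x_train, w) + b            # pre-activations, shape (N, 2m)
    act = np.maximum(pre, 0.0)
    resid = scale * (act @ a) - y_train       # predictions minus targets
    grad_a = scale * (act.T @ resid) / N
    grad_b = scale * a * ((pre > 0.0).T @ resid) / N
    a -= lr * grad_a
    b -= lr * grad_b

# Natural cubic spline (zero second derivative at the end points) through
# the same data: per the abstract, the limiting solution for a constant
# curvature penalty.
spline = CubicSpline(x_train, y_train, bc_type="natural")
x_test = np.linspace(x_train[0], x_train[-1], 161)
gap = np.max(np.abs(net(x_test, a, b) - spline(x_test)))
print(f"max |network - spline| on the data range: {gap:.4f}")
```

With these settings the two curves typically agree closely over the data range; the residual gap reflects both the finite width (the n^{−1/2} term in the abstract) and the simplifications made here relative to the paper's exact initialization.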

Received: Jun 12, 2020
Published: Jun 12, 2020
Keywords: implicit bias, overparametrized neural network, cubic spline interpolation, spatially adaptive smoothing spline, effective capacity

Related publications

Journal article (2023, Open Access)
Hui Jin and Guido Montúfar

Implicit bias of gradient descent for mean squared error regression with wide neural networks

In: Journal of Machine Learning Research, 24 (2023), Article 137