
MiS Preprint Repository

We have decided to discontinue the publication of preprints on our preprint server at the end of 2024. The publication culture within mathematics has changed so much due to the rise of repositories such as arXiv (www.arxiv.org) that we are encouraging all institute members to make their preprints available there. An institute repository in its previous form is therefore unnecessary. The preprints published to date will remain available here, but no new preprints will be added.

MiS Preprint

63/2020

Implicit bias of gradient descent for mean squared error regression with wide neural networks

Hui Jin and Guido Montúfar

Abstract

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. Focusing on 1D regression, we show that the solution of training a width-n shallow ReLU network is within n^{−1/2} of the function which fits the training data and whose difference from initialization has smallest 2-norm of the second derivative weighted by 1/ζ. The curvature penalty function 1/ζ is expressed in terms of the probability distribution that is utilized to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. The statement generalizes to the training trajectories, which in turn are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength.
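As an illustration of the limiting object named in the abstract, the following is a minimal sketch of natural cubic spline interpolation in plain Python. It is a generic textbook construction (second derivatives at the knots obtained from a tridiagonal system with zero curvature at both ends), not the authors' code, and it does not train any network. The function name natural_cubic_spline and the sample training data are assumptions made for this example only.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through the points (xs[i], ys[i]).

    Hypothetical helper for illustration. xs must be strictly increasing
    and contain at least two points.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]  # knot spacings

    # m[i] is the second derivative of the spline at knot i.
    # Natural boundary conditions: m[0] = m[n-1] = 0.
    m = [0.0] * n
    if n > 2:
        # Tridiagonal system for the interior knots 1 .. n-2.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [
            6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
            for i in range(1, n - 1)
        ]
        # Thomas algorithm: forward elimination ...
        for k in range(1, len(diag)):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        # ... followed by back substitution.
        sol = [0.0] * len(diag)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(len(diag) - 2, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:-1] = sol

    def spline(x):
        # Pick the interval [xs[i], xs[i+1]] containing x. Values outside the
        # data range reuse the first or last cubic piece; this sketch is meant
        # for evaluation inside the data range.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        a, b = xs[i + 1] - x, x - xs[i]
        return (
            (m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return spline


# Made-up 1D training data, for illustration only.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 0.0, 1.0]
f = natural_cubic_spline(train_x, train_y)
```

Among all twice-differentiable interpolants of the data, the natural cubic spline has the smallest 2-norm of the second derivative, which corresponds to the constant curvature penalty case described in the abstract. A non-constant penalty 1/ζ would require a different, spatially weighted construction that this sketch does not cover.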


Received: 12.06.20

Published: 12.06.20

Keywords: Implicit bias, overparametrized neural network, cubic spline interpolation, spatially adaptive smoothing spline, effective capacity

Related publications


2023 Journal Open Access

Hui Jin and Guido Montúfar

Implicit bias of gradient descent for mean squared error regression with two-layer wide neural networks

In: Journal of Machine Learning Research, 24 (2023), p. 137

arXiv: 2006.07356; www.jmlr.org