
Implicit bias of gradient descent for mean squared error regression with wide neural networks

Plenarsaal, Center for Interdisciplinary Research (ZiF), Bielefeld University, Bielefeld

Abstract

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, the solution of training a width-n shallow ReLU network is within distance n^−1/2 of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty depending on the probability distribution used to initialize the network parameters. We compute this curvature penalty function explicitly for various common initialization procedures. For multivariate regression we show an analogous result, in which the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we obtain results for different activation functions and show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.

This is joint work with Hui Jin (arxiv.org/abs/2006.07356).
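To make the statements above concrete, the univariate result and the smoothing-spline description of the training trajectories can be summarized by variational problems of roughly the following form. This is a sketch paraphrasing the abstract; the precise definition of the curvature penalty ρ, the exact form of the weighting, and the error bounds are given in the paper.

\[
f_{\mathrm{trained}} \;\approx\; \operatorname*{arg\,min}_{f \,:\, f(x_i) = y_i \ \text{for all } i} \;\int \rho(x)\,\bigl(f''(x) - f_0''(x)\bigr)^2 \, dx ,
\]

where f_0 denotes the network function at initialization and the approximation holds up to an error of order n^−1/2 in the width n. Correspondingly, along the optimization path the network function at training time t is described by a smoothing-spline problem of the form

\[
\operatorname*{arg\,min}_{f} \;\sum_i \bigl(f(x_i) - y_i\bigr)^2 \;+\; \lambda(t) \int \rho(x)\,\bigl(f''(x) - f_0''(x)\bigr)^2 \, dx ,
\]

with a regularization strength λ(t) that decreases as training proceeds.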

Conference on Mathematics of Machine Learning
August 4–7, 2021

Center for Interdisciplinary Research (ZiF), Bielefeld University, Plenarsaal

Benjamin Gess

Max Planck Institute for Mathematics in the Sciences and Universität Bielefeld

Guido Montúfar

Max Planck Institute for Mathematics in the Sciences and UCLA

Nihat Ay

Hamburg University of Technology