Research Spotlights

Guido Montúfar — Implicit Bias in Wide Neural Networks

Published Jul 19, 2021

We investigate gradient descent training of overparametrized neural networks with rectified linear units and the corresponding implicit bias in function space. For 1D mean squared error regression, the solution found by gradient descent is a function that interpolates the training data and has a small spatially weighted L2-norm of the second derivative relative to the function at initialization. The curvature penalty function is expressed in terms of the probability distribution used to initialize the network parameters, and we compute it explicitly for various common initialization procedures. Based on these results, the training trajectories can be described in function space as trajectories of spatially adaptive smoothing splines with decreasing regularization strength. The results generalize to multivariate regression and to different activation functions. This is joint work with Hui Jin.
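In symbols, and purely as a schematic reading of the abstract (the notation below is ours, not lifted from the paper): write f_0 for the network function at initialization, (x_i, y_i) for the training data, and rho for the curvature penalty function mentioned above. The interpolant selected by gradient descent then approximately solves

\[
\min_{f}\ \int \rho(x)\,\bigl(f''(x)-f_0''(x)\bigr)^{2}\,dx
\quad\text{subject to}\quad f(x_i)=y_i \ \text{for all } i,
\]

and along the way the function at training time t roughly tracks a spatially adaptive smoothing spline,

\[
f_t \;\approx\; \operatorname*{arg\,min}_{f}\ \sum_{i}\bigl(f(x_i)-y_i\bigr)^{2}
\;+\; \lambda(t)\int \rho(x)\,\bigl(f''(x)-f_0''(x)\bigr)^{2}\,dx,
\]

with a regularization strength lambda(t) that decreases as training proceeds; the paper computes rho explicitly for common initialization schemes.

The setting is also simple enough to simulate directly. Below is a minimal sketch (our own illustration, not code from the paper) of full-batch gradient descent on a wide two-layer ReLU network for 1D mean squared error regression; the width, step size, data, and initialization scheme are assumptions chosen for the demo.

```python
import numpy as np

# Minimal sketch (our own illustration, not code from the paper): full-batch
# gradient descent on a wide two-layer ReLU network for 1D mean squared
# error regression. Width, step size, data, and the initialization scheme
# below are all assumptions chosen for the demo.

rng = np.random.default_rng(0)
m = 1000                                    # hidden width ("wide" regime)

x = np.array([-0.9, -0.4, 0.0, 0.3, 0.8])  # 1D training inputs
y = np.array([0.6, -0.3, 0.2, 0.5, -0.4])  # targets

# One common scheme: Gaussian input weights, uniform biases,
# output weights scaled by 1/sqrt(m).
w = rng.normal(0.0, 1.0, m)
b = rng.uniform(-1.0, 1.0, m)
v = rng.normal(0.0, 1.0, m) / np.sqrt(m)

def net(x_pts, w, b, v):
    """f(x) = sum_j v_j * relu(w_j * x + b_j)."""
    return np.maximum(np.outer(x_pts, w) + b, 0.0) @ v

grid = np.linspace(-1.0, 1.0, 201)
f_init = net(grid, w, b, v)                 # function at initialization

lr, steps = 0.1, 10000
for _ in range(steps):
    z = np.outer(x, w) + b                  # pre-activations, shape (n, m)
    h = np.maximum(z, 0.0)                  # ReLU activations
    r = h @ v - y                           # residuals of the squared loss
    act = (z > 0.0).astype(float)           # ReLU derivative
    # Gradients of 0.5 * mean(r**2) with respect to each parameter group.
    gv = h.T @ r / len(x)
    gw = (r[:, None] * act * v * x[:, None]).sum(axis=0) / len(x)
    gb = (r[:, None] * act * v).sum(axis=0) / len(x)
    v, w, b = v - lr * gv, w - lr * gw, b - lr * gb

f_final = net(grid, w, b, v)
# f_final (nearly) interpolates the data; per the abstract, the displacement
# f_final - f_init behaves like a spatially weighted smoothing spline.
print("residuals:", np.round(net(x, w, b, v) - y, 4))
```

Plotting f_final against f_init on the dense grid is one way to eyeball the spline-like displacement the abstract describes.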
