Implicit bias of gradient descent for mean squared error regression with wide neural networks
Hui Jin and Guido Montúfar
Submission date: 12 June 2020
Keywords and phrases: implicit bias, overparametrized neural network, cubic spline interpolation, spatially adaptive smoothing spline, effective capacity
We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. Focusing on 1D regression, we show that the solution of training a width-n shallow ReLU network is within n^{-1/2} of the function which fits the training data and whose difference from initialization has the smallest 2-norm of the second derivative weighted by 1/ζ. The curvature penalty function 1/ζ is expressed in terms of the probability distribution used to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolant of the training data. The statement generalizes to the training trajectories, which in turn are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength.
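As a concrete illustration of the constant-penalty case, the limiting function for asymmetric uniform initialization is the natural cubic spline interpolant of the training data. The following sketch constructs that limit directly with SciPy (the training data here are hypothetical, and the comparison against an actually trained wide ReLU network is omitted):

```python
# Sketch (under the abstract's claim): with a constant curvature penalty, the
# trained network is close to the natural cubic spline interpolant of the data,
# i.e. the interpolant whose second derivative vanishes at the boundary knots
# and which minimizes the L2 norm of the second derivative.
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical 1D training set (x_i, y_i)
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

# 'natural' boundary conditions: spline''(x_0) = spline''(x_n) = 0
spline = CubicSpline(x_train, y_train, bc_type='natural')

# the spline fits the training data exactly ...
assert np.allclose(spline(x_train), y_train)

# ... and satisfies the natural boundary conditions
assert abs(spline(x_train[0], 2)) < 1e-8
assert abs(spline(x_train[-1], 2)) < 1e-8
```

For non-constant penalties 1/ζ, the minimizer is instead a weighted-curvature analogue of this spline, which `CubicSpline` does not compute; this sketch covers only the uniform-initialization case singled out in the abstract.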