Spectral properties of wide neural networks and their implications to the convergence speed of different Gradient Descent algorithms

Maksim Velikanov (Skolkovo Institute of Science and Technology)

Live Stream

Abstract

Training of neural networks is hard to describe theoretically because of the complicated nonlinear dependence of network predictions on the parameters. However, the situation simplifies greatly in the limit of infinite network width, where the problem becomes quadratic with the matrix given by the Neural Tangent Kernel (NTK). Such problems are more amenable to theoretical analysis and are mostly described by the spectral properties of the linear operator and the target function.
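
As an illustration of why the spectrum and the target decomposition govern the dynamics, the following is a sketch of the standard linearized picture for squared loss under plain gradient descent. The notation is assumed here and is not taken from the abstract: K is the NTK Gram matrix on the training set with eigenpairs (\lambda_k, u_k), f_t is the vector of network outputs after t steps, f^* is the target, \alpha is the step size, and c_k is the initial error coefficient along u_k.

```latex
% Assumed notation, for illustration only (not from the abstract).
\begin{align*}
  L(f) &= \tfrac{1}{2}\,\lVert f - f^* \rVert^2, \\
  f_{t+1} - f^* &= (I - \alpha K)\,(f_t - f^*), \\
  L(f_t) &= \tfrac{1}{2} \sum_k (1 - \alpha \lambda_k)^{2t}\, c_k^2,
  \qquad c_k = \langle u_k,\; f_0 - f^* \rangle .
\end{align*}
```

In this form the loss curve depends only on the eigenvalues \lambda_k and the coefficients c_k, which is the sense in which the problem is described by spectral properties of the operator and the target.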

In the first part of the talk, we will show that in certain scenarios the spectrum of the NTK and the eigendecomposition of the target function are asymptotically described by power laws with simple explicit expressions for their exponents. In the second part of the talk, we will turn to general quadratic problems with power-law spectra and give tight bounds on the convergence speed of various Gradient Descent algorithms: vanilla Gradient Descent (GD), the Heavy Ball (HB) method, GD and HB with predefined schedules, Steepest Descent, and Conjugate Gradients.
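
To make the setting concrete, here is a minimal numerical sketch, not taken from the talk or the papers, that runs constant-step GD and HB on a diagonal quadratic loss L = 1/2 * sum_k lam_k * e_k^2 with a power-law spectrum. The exponents, problem size, step size, and momentum value are all hypothetical choices made for illustration, and the function name `run_quadratic` is an assumption.

```python
import numpy as np

def run_quadratic(lam, e0, steps, alpha, beta=0.0):
    """Iterate e_{t+1} = e_t - alpha*lam*e_t + beta*(e_t - e_{t-1}) in the eigenbasis.

    beta = 0 gives vanilla GD; beta > 0 gives the Heavy Ball method.
    Returns the loss 0.5 * sum(lam * e_t**2) recorded before each step.
    """
    e_prev = e0.copy()  # zero initial velocity: e_{-1} = e_0
    e = e0.copy()
    losses = []
    for _ in range(steps):
        losses.append(0.5 * np.sum(lam * e**2))
        e_next = e - alpha * lam * e + beta * (e - e_prev)
        e_prev, e = e, e_next
    return np.array(losses)

# Hypothetical power-law problem: eigenvalues k^-2, initial error k^-0.5.
k = np.arange(1, 5001, dtype=float)
lam = k ** -2.0
e0 = k ** -0.5

gd = run_quadratic(lam, e0, steps=2000, alpha=1.0)
hb = run_quadratic(lam, e0, steps=2000, alpha=1.0, beta=0.9)

print("final GD loss:", gd[-1])
print("final HB loss:", hb[-1])
```

Plotting `gd` and `hb` on log-log axes is one way to inspect the decay of the loss for each method under these assumed parameters; the bounds discussed in the talk concern how such rates depend on the power-law exponents.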

The talk is based on joint work with Dmitry Yarotsky (arXiv:2105.00507 and arXiv:2202.00992).

seminar

03.07.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Upcoming Events of this Seminar

Thursday, 03.07.25 On the Power of Context-Enhanced Learning in LLMs with Xingyu Zhu
Thursday, 10.07.25 The effect of low rank and stochasticity on Gradient Descent at the Edge of Stability with Avrajit Ghosh a.o.
Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 02.10.25 to be announced with Marcello Carioni