Talk

Spectral properties of wide neural networks and their implications for the convergence speed of different Gradient Descent algorithms

  • Maksim Velikanov (Skolkovo Institute of Science and Technology)

Abstract

Training of neural networks is hard to describe theoretically due to the complicated non-linear dependence of network predictions on the parameters. However, the situation greatly simplifies in the limit of infinite network width, where the problem becomes quadratic with the matrix given by the Neural Tangent Kernel (NTK). Such problems are more amenable to theoretical analysis and are largely described by the spectral properties of the linear operator and the target function.
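As a rough illustration of this reduction (the notation below is a sketch of the standard NTK picture, not necessarily the one used in the talk): writing the residual of the predictions on the training set against the NTK Gram matrix, gradient flow on the squared loss becomes a linear ODE whose solution decays along the eigendirections of the kernel.

```latex
% Illustrative sketch (assumed notation): f_t = predictions on the training set,
% y = targets, \Theta_{ij} = \langle \nabla_\theta f(x_i), \nabla_\theta f(x_j) \rangle
% is the NTK Gram matrix. Gradient flow on the squared loss acts linearly on the residual:
\[
  \frac{d}{dt}\,(f_t - y) \;=\; -\eta\,\Theta\,(f_t - y),
  \qquad
  f_t - y \;=\; e^{-\eta \Theta t}\,(f_0 - y),
\]
% so the decay along each eigendirection of \Theta is set by the corresponding eigenvalue,
% and the overall rate is governed by the spectrum of \Theta together with the expansion
% of the target y in its eigenbasis.
```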

In the first part of the talk, we will show that in certain scenarios the spectrum of the NTK and the eigendecomposition of the target function are asymptotically described by power laws, with simple explicit expressions for their exponents. In the second part of the talk, we will turn to general quadratic problems with a power-law spectrum and give tight bounds on the convergence speed of various gradient descent algorithms: vanilla Gradient Descent (GD), the Heavy Ball (HB) method, GD and HB with predefined schedules, Steepest Descent, and Conjugate Gradients.
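As a small numerical sketch of this setting (the dimension, exponents, and hyperparameters below are illustrative assumptions, not the values or bounds analyzed in the papers), one can write a quadratic problem in its eigenbasis with a power-law spectrum and compare the loss decay of vanilla GD and the Heavy Ball method:

```python
import numpy as np

# Minimal illustration (assumed setup, not the papers' experiments):
# quadratic problem L(w) = 1/2 (w - w*)^T H (w - w*), written in the eigenbasis of H,
# with power-law eigenvalues lambda_k ~ k^{-nu} and power-law target coefficients.
n = 2000                      # problem dimension (illustrative)
nu, kappa = 1.5, 1.0          # illustrative exponents, not those derived in the talk
k = np.arange(1, n + 1)
lam = k ** (-nu)              # eigenvalues of the Hessian
c0 = k ** (-(kappa + 1) / 2)  # initial residual coefficients along each eigendirection

def loss(delta):
    # delta_k = component of (w - w*) along the k-th eigendirection
    return 0.5 * np.sum(lam * delta ** 2)

def gradient_descent(steps, eta):
    delta, hist = c0.copy(), []
    for _ in range(steps):
        hist.append(loss(delta))
        delta = delta - eta * lam * delta   # GD step, diagonal in the eigenbasis
    return np.array(hist)

def heavy_ball(steps, eta, beta):
    delta, prev, hist = c0.copy(), c0.copy(), []
    for _ in range(steps):
        hist.append(loss(delta))
        # w_{t+1} = w_t - eta * grad L(w_t) + beta * (w_t - w_{t-1})
        delta, prev = delta - eta * lam * delta + beta * (delta - prev), delta
    return np.array(hist)

eta = 1.0 / lam.max()          # stable step size for plain GD
gd_hist = gradient_descent(5000, eta)
hb_hist = heavy_ball(5000, eta, beta=0.9)
print("GD loss after 5000 steps:", gd_hist[-1])
print("HB loss after 5000 steps:", hb_hist[-1])
```

On a log-log plot the two loss histories typically show power-law decay in the number of steps; the results presented in the talk give tight bounds on such rates for these and the other listed algorithms.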

The talk is based on joint work with Dmitry Yarotsky (arXiv:2105.00507 and arXiv:2202.00992).

Seminar, 19.12.24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences, Live Stream

Katharina Matschke

MPI for Mathematics in the Sciences (contact via email)
