Talk

Non-Singularity of the Gradient Descent Map for Neural Networks with Piecewise Analytic Activations

  • Alexandru Craciun (TU Munich)

Abstract

A key assumption underlying convergence guarantees for gradient descent is that the GD map is non-singular, i.e., that pre-images of measure-zero sets have measure zero. However, this property has not previously been rigorously verified for practical neural networks. We prove that it holds for all but finitely many step sizes in any architecture with piecewise analytic activations (e.g., ReLU, sigmoid, tanh), covering fully connected, convolutional, and attention layers. This validates saddle-point avoidance and stability results in realistic training settings.
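A minimal one-dimensional sketch of the phenomenon (an illustration only, not taken from the talk): for the quadratic f(x) = (λ/2)x², the GD map is G(x) = (1 − ηλ)x, which is singular at exactly one step size, η = 1/λ, where it collapses all of ℝ onto the measure-zero set {0}.

```python
import numpy as np

# Illustrative assumption: f(x) = (lam/2) * x**2, so grad f(x) = lam * x
# and the gradient descent map is G(x) = x - eta * lam * x = (1 - eta*lam) * x.
lam = 2.0

def gd_map(x, eta):
    """One gradient descent step for the quadratic f(x) = (lam/2) * x**2."""
    return (1.0 - eta * lam) * x

xs = np.linspace(-5.0, 5.0, 11)

# Singular step size eta = 1/lam: every point maps to 0, so the
# pre-image of the measure-zero set {0} is all of R.
print(gd_map(xs, 1.0 / lam))

# Any other step size gives an invertible linear map, which preserves
# measure-zero sets under pre-images.
print(gd_map(xs, 0.1))
```

This matches the talk's picture that singular step sizes form only a finite (here, one-element) exceptional set.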
