How degenerate is the parametrization of (ReLU) neural networks?
- Dennis Elbrächter (Universität Wien)
Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the loss function (up to regularization terms) generally depends only on the realization of the neural network, i.e., the function it computes. We discuss how studying the optimization problem over the space of realizations may open up new ways to understand neural network training, provided one manages to overcome the difficulties caused by the redundancies and degeneracies in how neural networks are parametrized.
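One elementary instance of the redundancy mentioned above (a sketch for illustration, not drawn from the talk itself) is the positive homogeneity of the ReLU: since relu(c·z) = c·relu(z) for any c > 0, scaling one layer's weights up and the next layer's weights down leaves the realized function unchanged. The NumPy snippet below, with arbitrarily chosen shapes and a random input, checks this for a two-layer network:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(W1, b1, W2, b2, x):
    # The realization of a two-layer ReLU network: the function it computes.
    return W2 @ relu(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
b1 = rng.standard_normal(4)
W2 = rng.standard_normal((2, 4))
b2 = rng.standard_normal(2)

c = 2.5  # any positive scalar works
x = rng.standard_normal(3)

y1 = realize(W1, b1, W2, b2, x)
# Rescaled parameters: first layer multiplied by c, second divided by c.
# Positive homogeneity of ReLU makes the two realizations identical.
y2 = realize(c * W1, c * b1, W2 / c, b2, x)

assert np.allclose(y1, y2)
```

A whole continuum of distinct parameter vectors thus maps to a single point in realization space, which is one reason optimizing over realizations rather than parameters is delicate.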