The geometry of neural networks
- Kathlén Kohn (KTH Royal Institute of Technology)
Abstract
A fundamental goal in the theory of deep learning is to explain why the optimization of the loss function of a neural network does not seem to be affected by the presence of non-global local minima. Even in the case of linear networks, most of the existing literature paints a purely analytical picture of the loss, and provides no explanation as to *why* such architectures exhibit no bad local minima. We explain the intrinsic geometric reasons for this behavior of linear networks.
For neural networks in general, we discuss the neuromanifold, i.e., the space of functions parameterized by a network with a fixed architecture. For instance, the neuromanifold of a linear network is a determinantal variety, a classical object of study in algebraic geometry. We introduce a natural distinction between pure critical points, which only depend on the neuromanifold, and spurious critical points, which arise from the parameterization.
This talk is based on joint work with Matthew Trager and Joan Bruna.