The Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetry-Induced Saddles and Global Minima Manifold
- Berfin Şimşek (École Polytechnique Fédérale de Lausanne (EPFL))
In this talk, I will present some implications of the permutation symmetry of neural networks for the shape of the loss landscape. I will first introduce a type of saddle point, the so-called symmetry-induced saddle, which emerges from a particular arrangement of neurons in deep neural networks. I will then describe the precise geometry of the global minima manifold of overparameterized networks in a teacher-student setting. By counting the possible arrangements of neuron groups inside a neural network, I will give the number of symmetry-induced saddle manifolds and the number of components of the global minima manifold in terms of the student and teacher widths.
Our analysis shows that overparameterization gradually smooths the landscape, because the number of components of the global minima manifold grows faster with width than the number of symmetry-induced saddle manifolds. The landscape nevertheless remains rough in mildly overparameterized networks: we show empirically that, in this regime, gradient-based training finds a zero-loss solution for only a fraction of initializations.
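As a rough illustration of the permutation symmetry and the neuron arrangements mentioned above, the sketch below checks two facts numerically on a toy two-layer network. The architecture (a tanh network without biases), the names `two_layer_net`, `W`, `a`, and the split ratio `mu` are assumptions made for this example and are not taken from the talk. The first check shows that relabeling hidden neurons leaves the function unchanged. The second shows that duplicating a neuron and splitting its outgoing weight embeds a narrower network into a wider one with the same output, which is one way that distinct parameter arrangements can share the same loss value.

```python
import math
import random


def two_layer_net(x, W, a):
    """Toy network f(x) = sum_j a[j] * tanh(<W[j], x>) (assumed form, no biases)."""
    return sum(
        a_j * math.tanh(sum(w * x_i for w, x_i in zip(w_j, x)))
        for w_j, a_j in zip(W, a)
    )


def same_function(W1, a1, W2, a2, inputs):
    """Compare two parameter settings on a batch of inputs, up to rounding."""
    return all(
        math.isclose(two_layer_net(x, W1, a1), two_layer_net(x, W2, a2), abs_tol=1e-12)
        for x in inputs
    )


random.seed(0)
d, m = 3, 4  # input dimension and hidden width (arbitrary choices)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
a = [random.gauss(0, 1) for _ in range(m)]
inputs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(10)]

# Permutation symmetry: reorder the hidden neurons, keeping each incoming
# weight vector paired with its outgoing weight.
perm = list(range(m))
random.shuffle(perm)
W_perm = [W[j] for j in perm]
a_perm = [a[j] for j in perm]
permutation_invariant = same_function(W, a, W_perm, a_perm, inputs)

# Neuron duplication: copy the last neuron's incoming weights and split its
# outgoing weight as mu * a and (1 - mu) * a. The wider network has m + 1
# neurons but computes the same function.
mu = 0.3
W_wide = W + [list(W[-1])]
a_wide = a[:-1] + [mu * a[-1], (1 - mu) * a[-1]]
duplication_preserves_function = same_function(W, a, W_wide, a_wide, inputs)

print(permutation_invariant, duplication_preserves_function)
```

The comparison uses a small tolerance because reordering the terms of a floating-point sum can change the result in the last few digits.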