Effect of Geometrical Singularity on Learning Dynamics of Neural Networks
- Hyeyoung Park (Kyungpook National University)
Owing to the rapid development of big data and computing technology, deep learning methods have shown successful results in various application fields. From a theoretical viewpoint, however, deep neural networks are still trained with the error backpropagation method, which was originally used to train conventional multilayer perceptrons, and its undesirable learning behaviors, such as plateaus, are inherently retained. Moreover, as the complexity of the model increases, such undesirable phenomena are likely to appear in various forms, making it difficult to fully understand their properties. In this talk, I introduce an approach to understanding the strange learning behavior of neural networks, focusing on the complex singular structure of the neuromanifold. To investigate the effect of singularities on learning dynamics, we define three types of learning scenarios according to the positional relationship between the optimal and singular points. In these scenarios, we trace the evolution of the generalization error during learning using statistical-mechanical methods, and show that the three learning scenarios have different dynamical properties. In particular, in the near-singular scenario with an over-parameterized model, we reveal quasi-plateau phenomena, a type of slow dynamics different from the well-known plateau. Through further analysis of the average learning equations around the singular points, we show that there are two different types of slow manifolds, associated with plateaus and quasi-plateaus, respectively. Additionally, we show that natural gradient learning, which was developed by taking the geometrical structure into account, can alleviate the slow convergence caused by plateaus and quasi-plateaus.
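To make the notion of a singular point concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the model analyzed in the talk: it assumes a toy regression network with two tanh hidden units, scalar input, and unit-variance Gaussian noise, so that the Fisher information matrix is G = E[∇f ∇fᵀ]. All names (`grad_f`, `fisher_quadratic_form`, the parameter values) are hypothetical. When the two hidden units share the same weight, the quadratic form uᵀGu vanishes along a direction that trades output weight between the units, which is the kind of degeneracy associated with singular regions of the parameter space.

```python
import math
import random

def grad_f(x, w1, w2, v1, v2):
    """Gradient of f(x) = v1*tanh(w1*x) + v2*tanh(w2*x) w.r.t. (w1, w2, v1, v2)."""
    h1, h2 = math.tanh(w1 * x), math.tanh(w2 * x)
    return (v1 * (1.0 - h1 * h1) * x,
            v2 * (1.0 - h2 * h2) * x,
            h1,
            h2)

def fisher_quadratic_form(u, params, xs):
    """Monte Carlo estimate of u^T G u with G = E_x[grad f grad f^T].

    Assumes unit-variance Gaussian output noise (an assumption of this sketch).
    """
    total = 0.0
    for x in xs:
        g = grad_f(x, *params)
        proj = sum(ui * gi for ui, gi in zip(u, g))
        total += proj * proj
    return total / len(xs)

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Direction that moves output weight from the second hidden unit to the first.
u = (0.0, 0.0, 1.0, -1.0)

singular = (0.7, 0.7, 0.5, 0.3)   # w1 == w2: the two hidden units overlap
regular = (0.7, -0.4, 0.5, 0.3)   # w1 != w2: the units are distinguishable

q_singular = fisher_quadratic_form(u, singular, xs)
q_regular = fisher_quadratic_form(u, regular, xs)
print("u^T G u at overlapping units:", q_singular)
print("u^T G u at distinct units:   ", q_regular)
```

At the overlapping point the output does not change along `u`, so ordinary gradient descent receives no signal in that direction. Natural gradient learning preconditions the gradient with the inverse of G, which is why the behavior of G near such points matters for the slow dynamics discussed above.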