Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape
We study the loss landscape of two-layer, mildly overparameterized ReLU neural networks on a generic finite input dataset under the squared error loss. Our approach bounds the dimension of the sets of local and global minima using the rank of the Jacobian of the parametrization map. Using results on random binary matrices, we show that most activation patterns correspond to parameter regions with no bad differentiable local minima. Furthermore, for one-dimensional input data, we show that most activation regions realizable by the network contain a high-dimensional set of global minima and no bad local minima. We confirm these results experimentally, observing a phase transition, as the amount of overparameterization varies, from most regions having full rank to many regions having deficient rank.
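To make the central object concrete, here is a minimal numpy sketch (not the authors' code; the bias-free network form, dimensions, and random data are illustrative assumptions). Within a fixed activation region, the output of a two-layer ReLU network f(x) = Σ_j v_j · relu(w_j · x) is smooth in the parameters, and the rank of the Jacobian of the parametrization map θ ↦ (f(x_1), …, f(x_n)) controls the dimension of its fibers, which is how the abstract's dimension bounds arise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 8, 2, 6                  # dataset size, input dim, hidden width
X = rng.standard_normal((n, d))    # a generic finite input dataset

def jacobian(W, v, X):
    """Exact Jacobian of theta -> (f(x_1), ..., f(x_n)) for
    f(x) = sum_j v[j] * relu(W[j] @ x), evaluated at theta = (W, v)."""
    A = (X @ W.T > 0).astype(float)  # activation pattern: n x m binary matrix
    blocks = []
    for j in range(W.shape[0]):
        # d columns: df(x_i)/dW[j] = v[j] * 1[unit j active at x_i] * x_i
        blocks.append(v[j] * A[:, [j]] * X)
        # 1 column:  df(x_i)/dv[j] = relu(W[j] @ x_i)
        blocks.append((A[:, j] * (X @ W[j]))[:, None])
    return np.hstack(blocks)         # shape n x m(d+1)

W = rng.standard_normal((m, d))
v = rng.standard_normal(m)
J = jacobian(W, v, X)
# Rank n (full row rank) is the favorable case: the region then contains
# no bad differentiable local minima in the sense of the abstract.
print(np.linalg.matrix_rank(J))
```

Since the activation pattern A is constant on each activation region, the rank depends only on the pattern and the data, which is where the results on random binary matrices enter.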
This is work with Kedar Karhadkar, Michael Murray, Hanna Tseran.
Bio: Guido Montúfar is an Associate Professor of Mathematics and Statistics & Data Science at UCLA, as well as the Leader of the Math Machine Learning Group at the Max Planck Institute for Mathematics in the Sciences. His research focuses on deep learning theory and, more generally, mathematical aspects of machine learning. He studied mathematics and physics at TU Berlin, obtained the Dr. rer. nat. in 2012 as an IMPRS fellow in Leipzig, and held postdoc positions at Penn State and MPI MiS. He is a 2022 Alfred P. Sloan Research Fellow.