Statistical efficiency and optimization of deep learning from the viewpoint of non-convexity
- Taiji Suzuki (The University of Tokyo, and Center for Advanced Intelligence Project, RIKEN, Tokyo)
In this talk, I discuss how deep learning can statistically outperform shallow methods such as kernel methods by exploiting the sparsity of the target function space, and I present a non-convex optimization framework together with generalization error and excess risk bounds. In the first half, I summarize our recent work on excess risk bounds for deep learning in the Besov space and its variants. I will show that the superiority of deep learning stems from the sparsity of the target function space and, more fundamentally, that the non-convex geometry of this space characterizes this property. In such settings, deep learning can achieve so-called adaptive estimation, which yields a smaller excess risk than shallow methods. In the latter half, I present an optimization framework for deep learning based on noisy gradient descent in an infinite-dimensional Hilbert space (gradient Langevin dynamics), and derive generalization error and excess risk bounds for the solution obtained by this procedure. Unlike existing frameworks such as the neural tangent kernel and mean field analysis, the proposed framework can handle finite- and infinite-width networks simultaneously.
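As a rough illustration of the gradient Langevin dynamics mentioned above, the following is a minimal finite-dimensional sketch, not the talk's method: the framework in the talk operates in an infinite-dimensional Hilbert space, whereas this sketch uses a plain vector of parameters. Each step adds Gaussian noise with scale sqrt(2 * step_size / beta) to an ordinary gradient step, where beta is the inverse temperature. The function name `gradient_langevin_dynamics`, the quadratic example loss, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def gradient_langevin_dynamics(
    grad: Callable[[Vector], Vector],
    theta0: Vector,
    step_size: float,
    beta: float,
    n_steps: int,
    seed: int = 0,
) -> Vector:
    """Finite-dimensional gradient Langevin dynamics (illustrative sketch).

    Update rule per step:
        theta <- theta - step_size * grad(theta) + sqrt(2 * step_size / beta) * xi
    where xi ~ N(0, I) and beta is the inverse temperature. For large beta the
    noise is small and the iterates behave like noisy gradient descent.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    noise_scale = math.sqrt(2.0 * step_size / beta)
    theta = list(theta0)
    for _ in range(n_steps):
        g = grad(theta)
        # Gradient step plus independent Gaussian noise in every coordinate.
        theta = [
            t - step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
            for t, gi in zip(theta, g)
        ]
    return theta


if __name__ == "__main__":
    # Hypothetical quadratic loss L(theta) = 0.5 * ||theta - c||^2, gradient theta - c.
    c = [1.0, -2.0]
    grad = lambda th: [t - ci for t, ci in zip(th, c)]
    result = gradient_langevin_dynamics(grad, [0.0, 0.0], step_size=0.1, beta=1e6, n_steps=500)
    print(result)  # close to c, since beta is large (low temperature)
```

With a small beta (high temperature), the iterates instead keep fluctuating around the minimizer, approximately sampling from a distribution proportional to exp(-beta * L(theta)); this injected noise is what lets Langevin-type methods escape poor regions of a non-convex landscape.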