From Natural Gradient to Riemannian Hessian: Second-order Optimization over Statistical Manifolds
- Luigi Malagò (Romanian Institute of Science and Technology - RIST, Romania)
The natural gradient was first introduced by Amari (1998), and it corresponds to the Riemannian gradient of a function defined over a statistical model, evaluated with respect to the Fisher-Rao information metric. The natural gradient can be employed efficiently for the optimization of likelihood functions, empirical risks, and the expected value of any function defined over the sample space of a statistical model. The natural gradient speeds up the convergence of gradient descent methods to the optimum; however, first-order methods often fail because of slow convergence rates on ill-conditioned functions. In this talk we define the alpha-Hessian, a family of Riemannian Hessians computed with respect to the dually-flat Amari-Chentsov alpha-connection, which is at the basis of the geometry of statistical models studied in Information Geometry. Using the Riemannian Hessian in optimization allows the design of more sophisticated methods with super-linear convergence rates, such as the well-known Newton method. After presenting the general framework for second-order optimization over statistical manifolds in the exponential family, we will discuss some examples in detail for both continuous and discrete models. The talk is based on joint work with Giovanni Pistone (Collegio Carlo Alberto, Italy).
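
As a concrete illustration of the first-order method mentioned above, the following minimal sketch applies natural-gradient ascent to the average log-likelihood of a univariate Gaussian. It is not taken from the talk: the model, the (mu, sigma) parametrization, the function name natural_gradient_fit, the step size, and the synthetic data are all assumptions chosen for illustration. The natural gradient is obtained by multiplying the ordinary gradient by the inverse Fisher information matrix, which for N(mu, sigma^2) in these coordinates is diag(1/sigma^2, 2/sigma^2).

```python
import math
import random


def natural_gradient_fit(data, mu=0.0, sigma=1.0, step=0.5, iters=100):
    """Fit N(mu, sigma^2) by natural-gradient ascent on the average log-likelihood.

    Illustrative sketch only; the name and defaults are hypothetical.
    """
    n = len(data)
    mean_x = sum(data) / n
    for _ in range(iters):
        # Mean squared deviation of the data from the current mu.
        s2 = sum((x - mu) ** 2 for x in data) / n

        # Ordinary (Euclidean) gradient of the average log-likelihood.
        g_mu = (mean_x - mu) / sigma**2
        g_sigma = -1.0 / sigma + s2 / sigma**3

        # Fisher information in (mu, sigma) is diag(1/sigma^2, 2/sigma^2),
        # so its inverse rescales each gradient component separately.
        nat_mu = sigma**2 * g_mu
        nat_sigma = (sigma**2 / 2.0) * g_sigma

        mu += step * nat_mu
        sigma += step * nat_sigma
    return mu, sigma


random.seed(0)
sample = [random.gauss(3.0, 2.0) for _ in range(1000)]
mu_hat, sigma_hat = natural_gradient_fit(sample)
print(mu_hat, sigma_hat)
```

In this sketch the natural-gradient direction for mu reduces to (sample mean - mu), independent of sigma, so the update does not slow down or blow up when sigma is far from 1. This is the kind of rescaling by the Fisher metric that the abstract refers to; a second-order method would additionally use curvature information from a Hessian.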