The Stochastic Reconfiguration (SR) algorithm has recently been proposed to efficiently train Neural-Network Quantum States, i.e., Restricted Boltzmann Machines (RBMs) with complex parameters built to simulate the ground state of a quantum many-body problem. SR is not only a convenient algorithm for training these RBMs; it is also theoretically justified once a non-Euclidean manifold structure, based on Quantum Information Geometry, is defined over the search space associated with an RBM. I will show that the gradient descent optimization algorithm used for the many-body problem is the quantum generalization of the Riemannian natural gradient introduced by Amari. Moreover, I will compare the geometry of Neural-Network Quantum States with that of the Quantum Boltzmann Machine.
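To make the structure of an SR step concrete, here is a minimal sketch in Python with NumPy. It assumes the common formulation in which the parameter update solves a linear system built from the covariance matrix S of the log-derivatives of the wave function and the force vector F. The function name `sr_update`, the diagonal `shift` regularizer, and the random arrays standing in for Monte Carlo samples are all illustrative assumptions, not details taken from the talk.

```python
import numpy as np

def sr_update(log_derivs, local_energies, learning_rate=0.05, shift=1e-3):
    """One Stochastic Reconfiguration step from sampled quantities (sketch).

    log_derivs:     (n_samples, n_params) complex array,
                    O_k(s) = d log psi(s) / d theta_k
    local_energies: (n_samples,) complex array,
                    E_loc(s) = <s|H|psi> / <s|psi>
    """
    O = np.asarray(log_derivs, dtype=complex)
    E = np.asarray(local_energies, dtype=complex)
    n_samples, n_params = O.shape

    # Center the samples so the products below are covariances.
    O_centered = O - O.mean(axis=0)
    E_centered = E - E.mean()

    # S_kl = <O_k^* O_l> - <O_k^*><O_l>  (plays the role of the metric)
    S = O_centered.conj().T @ O_centered / n_samples
    # F_k = <O_k^* E_loc> - <O_k^*><E_loc>  (energy gradient)
    F = O_centered.conj().T @ E_centered / n_samples

    # Hypothetical regularization: a small diagonal shift keeps S invertible.
    S_reg = S + shift * np.eye(n_params)
    delta = -learning_rate * np.linalg.solve(S_reg, F)
    return delta, S, F

# Synthetic stand-ins for Monte Carlo samples (not a real wave function).
rng = np.random.default_rng(0)
n_samples, n_params = 500, 4
O_samples = (rng.normal(size=(n_samples, n_params))
             + 1j * rng.normal(size=(n_samples, n_params)))
E_samples = rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)

delta, S, F = sr_update(O_samples, E_samples)
```

In this sketch S is Hermitian and positive semidefinite by construction, which is what lets it act as a metric on parameter space; replacing S with the identity would reduce the step to plain gradient descent.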
Information Geometry is the differential geometric study of the manifold of probability models, and promises to be a unifying geometric framework for investigating statistical inference, information theory, machine learning, etc. Instead of using a metric to measure distances on such manifolds, these applications often use “divergence functions” to measure the proximity of two points; unlike a metric, a divergence need not be symmetric or satisfy the triangle inequality. Examples include the Kullback-Leibler divergence, Bregman divergences, and f-divergences. Divergence functions are tied to generalized entropy functions (for instance, Tsallis entropy, Rényi entropy, phi-entropy) and to the cross-entropy functions widely used in machine learning and information sciences. It turns out that divergence functions enjoy pleasant geometric properties: they induce what is called a “statistical structure” on a manifold M, namely a Riemannian metric g together with a pair of torsion-free affine connections D, D*, such that D and D* are both Codazzi coupled to g while being conjugate to each other. Divergence functions also induce a natural symplectic structure on the product manifold M × M, for which M with its statistical structure is a Lagrangian submanifold. We recently characterized the holomorphicity of D, D* in the (para-)Hermitian setting, and showed that statistical structures (with torsion-free D, D*) can be enhanced to Kähler or para-Kähler manifolds. The surprisingly rich geometric structures and properties of a statistical manifold open up the intriguing possibility of geometrizing statistical inference, information, and machine learning in string-theoretic languages.
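As a small numerical illustration of two points made above, the following Python sketch checks that the Kullback-Leibler divergence is not symmetric, and that for nearby distributions it is approximated to second order by the Fisher-Rao quadratic form. The distributions `p` and `q`, the tangent vector `v`, and the step `eps` are arbitrary values chosen for illustration.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# A divergence need not be symmetric in its two arguments.
kl_pq = kl(p, q)
kl_qp = kl(q, p)

# Tangent vector to the simplex: its components sum to zero,
# so p + eps * v is still a probability distribution for small eps.
v = np.array([1.0, -2.0, 1.0])
eps = 1e-3

# Fisher-Rao quadratic form at p: sum_i v_i^2 / p_i
fisher_quadratic = float(np.sum(v**2 / p))

# Second-order behaviour of the divergence:
# KL(p || p + eps*v) is approximately (eps^2 / 2) * fisher_quadratic
kl_second_order = 2.0 * kl(p, p + eps * v) / eps**2
```

Comparing `kl_second_order` with `fisher_quadratic` shows the sense in which a divergence induces a Riemannian metric: the metric is the leading (quadratic) term of the divergence between infinitesimally close points.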
The natural gradient was first introduced by Amari (1998), and it corresponds to the Riemannian gradient of a function defined over a statistical model, evaluated with respect to the Fisher-Rao information metric. The natural gradient can be efficiently employed for the optimization of likelihood functions, empirical risks, and the expected value of any function defined over the sample space of a statistical model. The natural gradient speeds up the convergence of gradient descent methods to the optimum; however, first-order methods often fail on ill-conditioned functions because of slow convergence rates. In this talk we define the alpha-Hessian, a family of Riemannian Hessians computed with respect to the dually-flat Amari-Chentsov alpha-connection, which is at the basis of the geometry of statistical models studied in Information Geometry. Using the Riemannian Hessian in optimization allows the design of more sophisticated methods with super-linear convergence rates, such as the well-known Newton method. After presenting the general framework for second-order optimization over statistical manifolds in the exponential family, we will discuss some examples in detail for both continuous and discrete models. The talk is based on joint work with Giovanni Pistone from Collegio Carlo Alberto, IT.
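The first-order part of this picture can be sketched in a few lines of Python. The example below assumes a categorical distribution over three outcomes, written as an exponential family with two natural parameters (the last logit is fixed at zero), and minimizes the expected value of a function `f` on the sample space. The function names, the values of `f` and `theta0`, and the step size are illustrative assumptions and are not taken from the talk.

```python
import numpy as np

def probabilities(theta):
    """Categorical probabilities from natural parameters (last logit = 0)."""
    logits = np.append(theta, 0.0)
    weights = np.exp(logits - logits.max())  # shift for numerical stability
    return weights / weights.sum()

def euclidean_gradient(theta, f):
    """Gradient of E_theta[f] with respect to the natural parameters."""
    p = probabilities(theta)
    return p[:-1] * (f[:-1] - p @ f)

def fisher_matrix(theta):
    """Fisher information: covariance of the sufficient statistics."""
    p = probabilities(theta)[:-1]
    return np.diag(p) - np.outer(p, p)

def natural_gradient(theta, f):
    """Riemannian gradient with respect to the Fisher-Rao metric."""
    return np.linalg.solve(fisher_matrix(theta), euclidean_gradient(theta, f))

f = np.array([3.0, 1.0, 2.0])      # function on the sample space
theta0 = np.array([0.4, -1.2])     # arbitrary starting point

nat_grad = natural_gradient(theta0, f)

# A few natural gradient descent steps on E_theta[f].
theta = theta0.copy()
step_size = 0.5
for _ in range(5):
    theta = theta - step_size * natural_gradient(theta, f)

expected_before = probabilities(theta0) @ f
expected_after = probabilities(theta) @ f
```

For this particular model the natural gradient in the natural parameters coincides with the ordinary gradient in the expectation parameters, which here is `f[:-1] - f[-1]` at every point. A second-order method would additionally use a Riemannian Hessian, which this sketch does not attempt.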
The search for a potential function S that allows one to reconstruct a given metric tensor g and a given symmetric covariant tensor T on a manifold M is formulated as the Hamilton-Jacobi problem associated with a canonically defined Lagrangian on TM. The connection between this problem, the geometric structure of the space of pure states of quantum mechanics, and the theory of contrast functions of classical information geometry is outlined.
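For orientation, the reconstruction in question can be written out in the standard form used for contrast functions in classical information geometry. The formulas below assume that S is regarded as a two-point function S(x, y) on M × M that vanishes on the diagonal, and they hold up to the sign and normalization conventions chosen for T; the talk itself may use different conventions.

```latex
g_{jk}(x) = -\left.\frac{\partial^2 S(x,y)}{\partial x^j\,\partial y^k}\right|_{y=x},
\qquad
T_{jkl}(x) = \left.\left(
  \frac{\partial^3 S(x,y)}{\partial x^j\,\partial x^k\,\partial y^l}
  - \frac{\partial^3 S(x,y)}{\partial y^j\,\partial y^k\,\partial x^l}
\right)\right|_{y=x}.
```

With S taken to be the Kullback-Leibler divergence on a parametric family of probability distributions, the first formula returns the Fisher-Rao metric.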