In Riemannian geometry, geodesics are (up to re-parametrization) integral curves of the gradient of the Riemannian distance. We extended this classical result to the framework of Information Geometry. In particular, we obtained that the rays of the level sets defined by a pseudo-distance are generated by the sum of two tangent vectors. Relying on these vectors, we have recently proposed a new definition of a canonical divergence and its dual function. This new divergence allows us to recover a given dualistic structure of a smooth manifold. Additionally, we showed that this divergence reduces to the canonical divergence proposed by Ay and Amari in the cases of: (a) self-duality, (b) dual flatness, (c) a statistical-geometric analogue of the concept of symmetric spaces in Riemannian geometry. As a matter of fact, when computed on finite states, in both the classical and the quantum setting, the new divergence turns out to coincide with a well-known divergence.
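For concreteness, the classical statement referred to here can be recalled as follows (a standard Riemannian fact written in generic notation; the talk's precise formulation may differ). For a fixed point \(q\) and away from its cut locus,
\[
\operatorname{grad}\, d(q,\cdot)\big|_{p} \;=\; \frac{\dot\gamma_{q\to p}(1)}{\lVert \dot\gamma_{q\to p}(1)\rVert},
\]
where \(\gamma_{q\to p}\colon[0,1]\to M\) is the minimizing geodesic from \(q\) to \(p\); hence the integral curves of \(\operatorname{grad}\, d(q,\cdot)\) are, up to re-parametrization, exactly the geodesics emanating from \(q\).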
The mutual information is a fundamental quantity that measures the total amount of dependence between two random variables. Estimating the mutual information from finite data samples can be challenging in practice. In this introductory talk, I will briefly review some trainable neural estimators of the mutual information and discuss an application in relation to the analysis of learning in deep neural networks.
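For illustration, the following is a minimal sketch of one trainable estimator of this kind, based on the Donsker-Varadhan lower bound popularized by MINE; the network size, optimizer settings, and the toy correlated-Gaussian data are assumptions made here for the example, not the setup discussed in the talk.

# Minimal sketch of a neural mutual-information estimator based on the
# Donsker-Varadhan bound  I(X;Y) >= E_P[T(x,y)] - log E_{P_X x P_Y}[exp(T(x,y))].
# Network size, optimizer, and the toy correlated-Gaussian data are assumptions.
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    def __init__(self, dim_x, dim_y, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

def dv_lower_bound(T, x, y):
    """Donsker-Varadhan estimate on one batch; marginal samples by shuffling y."""
    joint = T(x, y).mean()
    y_perm = y[torch.randperm(y.shape[0])]
    marginal = torch.logsumexp(T(x, y_perm).squeeze(1), dim=0) - math.log(x.shape[0])
    return joint - marginal

rho, n = 0.8, 512                      # correlated Gaussians: true MI = -0.5*log(1 - rho^2)
T = StatisticsNetwork(1, 1)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.randn(n, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * torch.randn(n, 1)
    loss = -dv_lower_bound(T, x, y)    # gradient ascent on the lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()
print("estimated MI:", -loss.item(), " true MI:", -0.5 * math.log(1 - rho ** 2))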
We revisit the natural gradient method for learning. Here we consider the proximal formulation and obtain a closed-form approximation of the proximal term over an affine subspace of functions. We focus on two statistical metrics, the Wasserstein metric and the Fisher-Rao metric, and introduce numerical methods suited to high-dimensional parameter spaces.
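As a point of reference for the Fisher-Rao case, here is a minimal numpy sketch of natural gradient descent viewed through the proximal lens; the Gaussian family, the data, and the step size are illustrative assumptions and not the methods developed in the talk.

# Minimal numpy sketch of Fisher-Rao natural gradient descent for fitting a
# 1-D Gaussian N(mu, sigma^2) by maximum likelihood.  The proximal view
#   theta_{k+1} = argmin_theta  L(theta) + (1/(2*tau)) * d(theta, theta_k)^2
# linearizes, for small tau, to
#   theta_{k+1} = theta_k - tau * G(theta_k)^{-1} grad L(theta_k),
# where G is the Fisher information matrix.  Data, step size, and the choice of
# family are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=5000)

def neg_log_lik_grad(mu, sigma):
    """Gradient of the average negative log-likelihood w.r.t. (mu, sigma)."""
    d_mu = -(data - mu).mean() / sigma ** 2
    d_sigma = 1.0 / sigma - ((data - mu) ** 2).mean() / sigma ** 3
    return np.array([d_mu, d_sigma])

def fisher_matrix(sigma):
    """Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])

mu, sigma, tau = 0.0, 1.0, 0.5
for _ in range(100):
    grad = neg_log_lik_grad(mu, sigma)
    step = np.linalg.solve(fisher_matrix(sigma), grad)   # natural gradient direction
    mu, sigma = mu - tau * step[0], max(sigma - tau * step[1], 1e-6)

print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")   # should approach (2, 3)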
In this talk, I will present several projects that I have taken part in during my PhD studies. First, I will discuss the set of joint probability distributions that maximize multi-information over a collection of margins. This quantity, called the "Factorized Mutual Information" (FMI), has been used as a computationally efficient proxy for the global mutual information (MI) in the context of intrinsic rewards in embodied reinforcement learning. A comparison between the FMI maximizers and the MI maximizers will be discussed. Second, I will review recent improvements to the sufficiency bounds for deep stochastic feedforward networks to be universal approximators of Markov kernels. This work can be seen as extending earlier investigations of the representational power of stochastic networks, such as Deep Belief Networks and Restricted Boltzmann Machines. Last, I will preview some ongoing work on the approximation properties of Convolutional Neural Networks. These networks can be viewed as special families of parameterized piecewise-linear functions, and counting the maximum number of linear regions attainable by a ReLU network has previously been used to quantify the network's approximation flexibility.
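To illustrate the last point, here is a small, purely empirical sketch (our own illustration, not material from the talk) that counts the linear regions a random ReLU network induces along a one-dimensional segment of its input space by tracking changes in the hidden-unit activation pattern; the width, depth, and segment are arbitrary assumptions.

# Count the linear regions a small random ReLU network induces along a 1-D
# segment of input space, by detecting changes in the activation pattern on a
# fine grid.  Architecture and segment are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(1)
widths = [2, 16, 16, 1]            # input dim 2, two hidden ReLU layers, scalar output
weights = [rng.normal(size=(widths[i + 1], widths[i])) for i in range(len(widths) - 1)]
biases = [rng.normal(size=widths[i + 1]) for i in range(len(widths) - 1)]

def activation_pattern(x):
    """Concatenated 0/1 pattern of all hidden ReLU units at input x."""
    pattern, h = [], x
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(pattern)

# Walk along a line segment; each maximal run of a constant activation pattern
# corresponds to one linear region crossed by the segment.
direction = np.array([1.0, 0.5])
ts = np.linspace(-10.0, 10.0, 20001)
patterns = [activation_pattern(t * direction).tobytes() for t in ts]
regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
print("linear regions crossed along the segment:", regions)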