A new area where the application of Information Geometric techniques could contribute beneficially is discussed in this work. It concerns the optimization of environmental prediction systems whose added value, in today’s competitive scientific and operational environment, is constantly increasing.
The main approach towards accurate environmental forecasting products is today the use of numerical atmospheric and wave prediction systems. Such platforms simulate successfully the general environmental conditions on global or intermediate scale. However, systematic errors emerge when focusing on local characteristics, mainly as a result of the inability to capture sub‐scale phenomena.
In this framework the use of assimilation systems – aiming at the improvement of initial conditions – and statistical post processes ‐ for the local adaptation of the forecasts ‐ provide significant support. However, in the majority of such techniques, the available data are recognized as elements of Euclidean spaces and least square methods are employed for the estimation‐optimization of their distances.
In the present work the use of Information Geometry in this framework is discussed trying to estimate:
- The statistical manifold to which the environmental data better fit.
- The way that the corresponding geodesics could be estimated.
In particular, the results of a state‐of‐the art wave prediction model (WAM) have been utilized in an area of significant importance: The North Atlantic Ocean.
The obtained outcomes, apart from the clear view that they provide for the wave climate and for the performance of the numerical model, could be exploited for designing new optimization techniques for wave prediction systems that cannot be supported by classical statistics.
Many important constructions and definitions in Evolutionary Game Theory can be elegantly interpreted in terms of concepts from information geometry. Information divergences produce new families of replicator-like dynamics, yielding global Lyapunov functions in conjunction with evolutionary stability and analogs of important local results, such as Fisher's fundamental theorem. Using the generalized information divergences recently developed in Statistical Thermodynamics, we can formulate replicator dynamics with type-dependent intensities of selection and many other variations, including the orthogonal projection dynamic. If space permits, I will discuss connections to Bayesian inference and the concept of potential information.
Some of this work is discussed on my blog (<link http: www.marcallenharper.com blog external>www.marcallenharper.com/blog/) and in my papers on the ArXiv. My collaborator Chris Lee at UCLA and I hope to publish results relevant to inference in the near future. I am also applying this work in connection with the evolutionary ecology group at UCLA, using Fisher information dynamics to investigate population dynamics. I think this will yield an interesting poster concerning applications of information geometry in evolutionary dynamics and inference.
In a parametric statistical inference, the maximum likelihood method is typically used for parameter estimation, and it is nicely explained by the dual differential geometrical structure of a statistical model with Fisher metric and e, m-connections. A good property of this structure is that both of these affine connections are torsion-free, which leads to the important notion of dually flat space. However, this property does not necessarily hold, once we move to another estimation method and consider its associated geometry. In this poster presentation, we investigate a dual differential geometrical structure induced from general estimating functions by utilizing the theory of statistical manifolds admitting torsion, which has been recently developed by Kurose and Matsuzoe. In particular, we focus on the quasi-score (or quasi-likelihood) method and consider the role of the induced geometrical structure in the statistical inference.
Finite-dimensional quantum channel is a map defined on positive matrices which describes the quantum process from input quantum state (probability density matrix) to output state. It is a fundamental task to identify quantum channel statistically. For this purpose, quantum SLD Fisher information of a channel is important.
This notion is closely related with a fiber structure constructed by operator-sum representations of a given channel and its horizontal lifting.
We provide a sufficient condition that a derivative of an operator-sum representation becomes horizontal. As an example, estimation of unknown correlation between several unitary processes satisfying some condition is given.
This is a joint work with Chiara Macchiavello and is supported by CORNER project: <link http: qurope.eu projects corner external>qurope.eu/projects/corner
MPI for biological cybernetics, Tuebingen, Germany
(joint work with Povilas Daniusis, Joris Moiij, Bastian Steudel, Jakob Zscheischler, Kun Zhang, Bernhard Schölkopf)
Inferring causal directions via information geometry
Some recent methods in causal inference use the assumption that the marginal distribution of the cause and the conditional distribution of the effect, given the cause are independent objects ob nature. In other words, the causal hypothesis ``X causes Y'' needs to be rejected if P(X) and P(Y|X) satisfy a relation that ``generic'' pairs (P(X), P(Y|X)) would not satisfy. To formalize such a vague statement is challenging, but we propose a notion of independence that can be interpreted as orthogonality in information space in the sense of information geometry. We developed a method for inferring causal directions based on such an orthogonality assumption and obtained encouraging empirical results.
The problem of maximization of the information divergence from the exponential families symmetrical w.r.t. some permutation group is presented. This work is related to previous results of N. Ay in , F. Matúš in [2,3,4] and J. Rauh in .
Situations, in which there exist maximizers exchangeable w.r.t. the permutation group, are studied. Then, the dimensionality of the optimization problem is highly reduced.
For some special cases explicit solutions are found.
Maximization of information divergence from an exponential family has emerged in probabilistic models for evolution and learning in neural networks that are based on infomax principles. The maximizers admit interpretation as stochastic systems with high complexity w.r.t. exponential family .
A link between divergence maximization and secret sharing was established in .
Keywords. Kullback-Leibler divergence, relative entropy, exponential family, hierarchical models, information projection, symmetric group, permutation group.
AMS 2000 Math. Subject Classification. Primary 94A17. Secondary 62B10, 60A10, 52A20. Contact. <link mail>E-mail,
Charles University in Prague, Department of Probability and Mathematical Statistics, Sokolovská 83, Prague, 186 75, Czech Republic, EU
- Ay N., Knauf, A. (2006). Maximizing multi-information. Kybernetika, 42:517--538, 2006.
- Matúš, F. (2004). Maximization of information divergences from binary i.i.d. sequences. Proceedings IPMU 2004, 2:1303--1306, Perugia, Italy.
- Matúš, F. (2007). Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika, 43:731--746.
- Matúš, F. (2009). Divergence from factorizable distributions and matroid representations by partitions. IEEE Transactions on Information Theory, 55(12):5375--5381.
- Rauh, J. (2009). Finding the maximizers of the information divergence from an exponential family. arXiv:0912.4660.
Standard quasi-Newton methods such as BFGS or DFP formula are closely related to the geometrical structure over multivariate normal distributions. In this presentation, first we introduce the relation between the update formula of the Hessian matrix and the Kullback-Leibler divergence over multivariate normal distributions. Then an extension of Hessian update is derived from the Bregman divergence, which is an extension of Kullback-Leibler divergence. Especially, we exploit the Bregman divergence with V-potentials in order to obtain the tractable update formula. Based on our framework, we study convergence property, group-invariance and robustness of the Hessian update formula.
We study the asymptotic performance of a certain class of semi-supervised learning methods in view of information geometry for estimating functions. Semi-supervised learning has attracted many researcher's interests. Even though many complicated methods were proposed, only few studies discussed their theoretical performance.
Some semi-supervised learning methods can be formulated with estimating functions. We analyze such types of methods in this poster. Amari and Kawanabe (1997) analyzed the information geometric properties of estimating functions. Using their framework, we derive the class of all estimating functions for our problem. We further derive the optimal estimating function. In general, however, the optimal estimating function is not available since it depends on a unknown quantity. We provide a way of constructing a good estimate, which is always available, for the optimal estimating function. In addition, we also mention that a specific class of semi-supervised approach is deeply related to a statistical paradox, which was geometrically analyzed by Henmi and Eguchi (2004).
The second or higher order efficient estimators for curved exponential families have been studied in the context of information geometry. However, computation of the estimator and evaluation of the risk have not been discussed well other than sampling approximations as Monte Carlo methods. In this presentation, a class of ``algebraic'' second order efficient estimators for ``algebraic'' curved exponential families is proposed. For this class of estimators, differential forms such as the Fisher metric, affine connections and embedding curvatures can be computed by algebraic computational methods such as Gröbner bases. The efficient estimators, their bias and their risk are evaluated via these differential forms. We demonstrate the effectiveness of algebraic method using some simple example models.
Using the non-commutative flow of weights on von Neumann algebras, we propose the definitions of the quantum information geometric notions of -divergence, riemannian metric and affine -connections on the spaces of finite positive normal functionals on von Neumann algebras. This way we generalise the formulation that was provided earlier by Jencova on the base of Araki-Masuda theory of non-commutative spaces. Using our formulation, we define the constrained maximum quantum relative -entropy updating rule and discuss some properties of quantum bayesian inference in this setting.
The ergodic sequences consisting of vectors , , over a finite alphabet are colored with colors for and colors for . Generic behavior of probabilities of monochromatic rectangles intersected with typical sets is examined. When n increases a big majority of pairs of colorings produces rectangles whose probabilities are bounded uniformly from above. Bounds are worked out in all regimes of the rates a and b of colorings. As a consequence, generic behavior of Shannon entropies of the partitions into rectangles is described.
Pseudo-Boolean functions are real-valued functions defined over a vector of binary variables. They appear in many different fields and are well studied in integer programming and in combinatorial optimization. The optimization of this class of functions is of particular interest, since it is NP-hard in the general formulation and no exact polynomial-time algorithm is available. Often in the literature pseudo-Boolean function optimization is referred as 0/1 programming.
We analyze the problem of pseudo-Boolean functions optimization by introducing the notion of stochastic relaxation, i.e., we look for the minima of a pseudo-Boolean function by minimizing its expected value over a set of probability densities. By doing this, we move from a discrete optimization problem to a continuous one, where the parameters of the statistical model become the new variables of the optimization problem. We choose statistical models that belong to the exponential family, and we justify this choice with results about the characterization of its topological closure and of its tangent space. Indeed, we are looking for minimizing sequences of densities in the model that converge towards distributions with reduced support concentrated on the minima of the pseudo-Boolean function. Such limit distributions do not belong to the exponential model, so it becomes important, given an exponential model, to determine which densities are included in its closure. Similarly, we are interested in the characterization of the tangent space of an exponential family, since in each point we are looking for the direction of maximum decrement of the expected value of the original function. Under a proper choice of the sufficient statistics of the exponential family used in the relaxation, the curve of maximum decrement is an exponential family itself. We provide some results about the existence of critical points of the relaxed function, in terms of the relation between the expansion of the pseudo-Boolean function and the sufficient statistics of the exponential model. The presence of stationary points which correspond to saddle points may determine the existence of different local minima, to which a minimizing sequence of densities may converge.
The analysis developed leads to the proposal of a new algorithm for pseudo-Boolean functions optimization based on stochastic gradient descent, for which we provide preliminary experimental results. The algorithm is in principle similar to some other techniques that have been proposed recently in the literature, often referred as population based algorithm, since at each iteration a pool of feasible candidate solutions is generated by sampling from a statistical model. Such algorithms are known in the Evolutionary Computation literature as Estimation of Distribution Algorithms (EDAs), and a similar approach appears also in stochastic optimization under the name of Cross-Entropy method. By taking inspiration from the EDA meta-heuristic and leveraging on the properties of the exponential model, we can design an algorithm that updates explicitly the model parameters in the direction of the gradient of the expected value of a pseudo-Boolean function, instead of estimating the value of the parameters of the model from a subset of the current sample of feasible solutions, as in most of the EDAs described in the literature. The gradient of the expected value of a function defined over the sample space, with respect to an exponential family, can be evaluated in terms of covariances, but since these evaluations require a summation over the entire search space, we propose to replace them with empirical covariances, to be estimated from the current sample. We implemented a vanilla version of the algorithm to find the ground states of some instances of a 2D spin glass model. The sufficient statistics of the exponential family have been determined according to the lattice structure of the spin glass model, such that all the monomials in the energy function correspond to a sufficient statistics of the model. We compared the performance of our algorithm with the state of the art algorithms in the Evolutionary Computation literature, to solve 2D Ising spin glass problems. We run multiple instances of the algorithms, for different sizes of the lattice, 8x8, 10x10, and 20x20, respectively.
Preliminary experimental results are encouraging and compare favourably with other recent heuristics proposed in the literature. Since we deal with a sample size which is much small than the cardinality of the sample space, the estimation of the covariances is affected by large noise.
For this reason it seems convenient to replace empirical covariance estimation with other techniques which proved to be able to provide more accurate estimation, such as shrinkage approach to large-scale covariance matrix estimation. Such methods offer robust estimation techniques with computational complexity which is often no more that twice that required for empirical covariance estimation.
Moreover, the algorithm can also be applied in the black box optimization context, by incorporating in the estimation procedure some model building techniques able to learn from the sample a set of statistically significant correlations among the variables in the original function. Since often in real world problems we deal with sparse problems, i.e., each variable interact with a restricted number of variables, l1-regularized logistic regression methods for high-dimensional model selection techniques seem to provide valuable tools in this context. The algorithm we proposed is highly parallelizable, both in the estimation of covariances and in the sampling step. The final aim is to develop an efficient and effective approach to adaptively solve very large pseudo-Boolean problems also in the black-box context for which the interaction structure among the variable is unknown.
The aim of the talk is to characterize monotone `metric' in the space of markov map. Here, `metric' means the square of the norm defined on the tangent space, and not necessarily equals the inner product of the vector with itself, different from usual notion of metric used in differential geometry. (Hereafter, this property, that the norm is induced from an inner innerproduct, is called inner-product-assumption.) So far, there have been plenty of literatures on the metric in the space of probability distributions and quantum states. Cencov, sometime in 1970s, proved the monotone metric in probability distribution space is unique up to constant multiple, and identical to Fisher information metric Cencov. Amari and others independently worked on the same object, especially from differential geometrical view points, and applied to number of problems in information sciences. Quantum mechanical states are discussed by Nagaoka, Fujiwara, Matsumoto and Petz. Among them, Petz characterized all the monotone metrics in the quantum state space using operator mean theory.
As for channels, however, only a little is known about its geometrical structures. To my knowledge, there had been no study about axiomatic characterization of distance measures in the classical or quantum channel space.
First, we show the upper and the lower bound of monotone channel "metric", and it is proved that any monotone "metric" cannot satisfy the inner-product-assumption. We give counter examples in the space of binary channels. The proof utilizes "local" version of Blackwell's randomization criteria for equivalence of statistical models, which is well known in statistical decision theory.
The latter result has some impact on the axiomization of the monotone metric in the space of classical and quantum states, since both Cencov and Petz rely on the inner-product-assumption. Since classical and quantum states can be viewed as channels with the constant output, it is preferable to dispense with the inner-product-assumption. Recalling that the Fisher information is useful in asymptotic theory, it would be natural to introduce some assumptions on asymptotic behaviour. Hence, we introduced weak asymptotic additivity and lower asymptotic continuity. By these additional assumptions, we not only recovers uniqueness result of Cencov, but also proves uniqueness of the monotone `metric' in the channel space. It is known that this unique "metric" gives maximum information in estimating unknown channels. In this proof, again, we used the local and asymptotic version of randomization criteria.
In the end, there is an implication on quantum state metrics. A quantum state can be viewed as a classical channel which takes a measurement as an input, and outputs measurement a result. If we restrict the measurement to separable measurement, the asymptotic theory discussed in our paper can be applied to quantum states also, proving the uniqueness of the metric. On the other hand, the author's past manuscript had reestablished the upper and the lower bound of the monotone metric by Petz, without relying on the inner-product-assumption. This suggests the monotone `metric' in the quantum state space is not unique. Therefore, having collective measurement is essential to have a variety of monotone metrics.
This work is about subsets of a state-space with the following property (S): All probability distributions supported therein are elements of the closure of a given exponential family. There exists an optimal cardinality condition for sets of the state-space which ensures they have the property S. However, this is not a necessary condition, and the sets can be considerably larger. We present a characterization of S and use it to compute lower bounds on the maximal cardinality of sets with S. Furthermore we show that there are actions on the state-space which preserve the property S. These results are applied to find bounds on the minimal number of elements belonging to a certain exponential family forming a mixture representation of a probability distribution which belongs to another (larger) exponential family.
An important open question in mathematical neuroscience is how to evaluate the significance of high order spike correlations in the neural code through analytically solvable models. We investigate the thermodynamic limit of a widespread probability distribution of firing in a neuronal pool, within the information geometry framework, considering all possible contributions from high order correlations. This allows us to identify a deformation parameter accounting for the different regimes of firing within the probability distribution, and to investigate whether those regimes could saturate or increase information as the number of neurons goes to infinity.
S. Amari, H. Nakahara, S. Wu and Y. Sakai, Neural Comput. 15, 127 (2003)
B.B. Averbeck, P.E. Latham and A. Pouget, Nat. Rev. Neurosci. 7, 358 (2006)
F. Montani, A. Kohn, A. Smith and S.R. Schultz, J. Neurosci. 27, 2338 (2007)
F. Montani, R.A.A. Ince, R. Senatore, E. Arabzadeh, M.E. Diamond and S. Panzeri, Phil. Trans. R. Soc. 367, 3297 (2009)
My poster presents results from a joint work with Nihat Ay and Thomas Kahle (preprint available at arXiv:0906.5462). We study how results from algebraic statistics generalize to the case of non-algebraic exponential families on finite sets. Here an exponential family is called algebraic if it has an integer-valued matrix of sufficient statistics. In this case the exponential family is the intersection of an algebraic variety with the probability simplex, which makes available the powerful tools of computational commutative algebra. While most relevant examples of exponential families are algebraic, it turns out that ignoring the algebraic properties yields another viewpoint which better captures the continuity aspects of exponential families.
A lot of properties can be deduced from an oriented matroid naturally associated to the exponential family: The closure of a discrete exponential family is described by a finite set of equations corresponding to the circuits of this matroid. These equations are similar to the equations used in algebraic statistics, although they need not be polynomial in the general case. This description allows for a combinatorial study of the possible support sets in the closure of an exponential family. In particular, if two exponential families induce the same oriented matroid, then their closures have the same support sets.
Finally we find a surjective (but not injective) parametrization of the closure of the exponential family by adding more parameters. These parameters also have an interpretation in the matroid picture. The parametrization generalizes the monomial parametrization used in algebraic statistics in the case of algebraic exponential families.
The renormalization group procedure to calculate a partition function in statistical mechanics, is considered as the successive approximation to the canonical distribution on its state space. By embedding renormalized states into the original state space, this approximation is characterized as the m-projection from the canonical distribution onto a renormalization submanifold. We apply this method for a diamond fractal Ising spin model, whose renormalization flow has fixed points in the finite region.
Multi-class classification problem is one of the major topic in the fields of machine learning. There are many works on the topic, and one major approach considers a decomposition of the multi-class problem into multiple binary classification problems based on the framework of error correcting output coding (ECOC). Each decomposed binary problem is independently solved and results of binary classifiers are integrated (decoded) for a prediction of multi-class label.
In this research, we present a new integration method of binary classifiers for multi-class problem. Our integration method (decoder) is characterized by a minimization of sum of divergences, in which each divergence measures diversity between the decoder and a posterior distribution of the class label associated with a binary classifier.
We investigate performance of the proposed method using a synthetic dataset, datasets from the UCI repository.
S-formalism  in the generalized thermostatistics based on Tsallis entropy S  is a natural formalism in the sense that the associated Legendre structures are derived in a similar way as in the standard thermostatistics. From a q-exponential probability distribution function (pdf), which maximizes S under the constraint of linear average energy U, the so-called escort pdf is naturally appeared in this formalism. The generalized Massieu potential associated with S and U is related to the one associated with the normalized Tsallis entropy S and the normalized q-average energy U, which is the energy-average w.r.t. the escort pdf. The S formalism has also provided the connections among some different versions of Tsallis nonextensive thermostatistics, a non self-referential expression of Tsallis’ pdf , and the relation between the Boltzmann temperature and the Lagrange multiplier in nonextensive thermostatistics .
On the other hand, it is shown recently in Ref.  that a dually flat structure on the space of the escort probabilities is obtained by applying 1-conformal transformation to the -geometry, which is an information geometry with a constant curvature and related with Tsallis relative entropy .
We explore the relation between the information geometrical structures associated with this dually flat structure and the Legendre structures in the S formalism. We show that the Legendre dual potential functions in the information geometry with this dually flat structure are the generalized Massieu potential and . We further study the correspondences among the potential functions, dual affine coordinates, and relevant divergence functions between the information geometry of the dually flat structure and S-formalism.
 T. Wada, A.M. Scarfone, Connections between Tsallis' formalisms employing the standard linear average energy and ones employing the normalized q-average energy, Phys. Lett. A 335 (2005) 351-362.
 C. Tsallis, Introduction to Nonextensive Statistical Mechanics - Approaching a Complex World (Springer,. New York, 2009).
 T. Wada, A.M. Scarfone, A non self-referential expression of Tsallis' probability distribution function, Eur. Phys. J. B 47 (2005) 557-561.
 T. Wada, A.M. Scarfone, The Boltzmann Temperature and Lagrange Multiplier in Nonextensive Thermostatistics, Prog. Theor. Phys. Suppl. 162 (2006) 37-44.
 A. Ohara, H. Matsuzoe, S-I. Amari, A dually flat structure on the space of escort distributions, J. Phys.: Conf. Series 201 (2010) 012012.
 A. Ohara, Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation, Eur. Phys. J. B 70, (2009) 15-28.
The most serious obstacle against realizing quantum computers and networks is decoherence that acts as a noise and causes information loss. Decoherence occurs when a quantum system interacts with its environment, and it is unavoidable in almost all quantum systems. Therefore, one of the central problems in quantum information science concerns the optimal measurement to retrieve information about the original quantum state from the decohered one and the maximum information that can be obtained from the measurement.
We identify an optimal quantum measurement that retrieves the maximum Fisher information about the expectation value of an observable from the partially decohered state. And we also clarify the maximum Fisher information obtained by the optimal measurement.
 Y. Watanabe, T. Sagawa, and M. Ueda, Phys. Rev. Lett. 104, 020401 (2010).
In order to characterize statistical data lying on a Riemannian manifold, one often uses the barycenter of empirical data as the notion of centrality. But it is known to all that barycenter is not a robust estimator and is sensitive to outliers. An ideal substitute of the barycenter possessing robustness is the notion of geometric median. In this paper, we define geometric median for a probability measure on a Riemannian manifold, give its characterization and a natural condition to ensure its uniqueness. In order to compute geometric median in practical cases, we also propose a subgradient algorithm and prove its convergence as well as estimating the error of approximation and the rate of convergence. The convergence property of this subgradient algorithm, which is a generalization of the classical Weiszfeld algorithm in Euclidean spaces to the context of Riemannian manifolds, does not depend on the sign of curvatures of the manifold and hence improves a recent result of Fletcher and his colleagues.