Given a divergence function on a manifold, we can derive a Riemannian metric and a dual pair of affine connections, which are the essential constituents of Information Geometry. In the case of a family of probability distributions, its information-geometric structure is determined by the invariance property. It consists of the Fisher information metric and the alpha connections. Moreover, the manifold of discrete probability distributions, that is, the set of all probability distributions on a finite set, has a dually flat Riemannian structure. We study how these properties are related to the underlying divergence function. We define an invariant divergence in terms of information monotonicity, which leads us to the class of $f$-divergences. We then study a divergence function which gives a dually flat affine structure. This is given by the Bregman divergence in terms of a convex function. The invariant and flat divergence in the manifold of probability distributions is the Kullback-Leibler divergence, and this is unique, but more generally it is the class of alpha-divergences in the manifold of positive measures. We can further discuss divergence functions in the manifold of positive-definite matrices, that of vision pictures, and cones. A nonlinear transformation of a divergence function or a convex function causes a conformal change of the dual geometrical structure. In this context, we can discuss the dual geometry derived from the Tsallis or Renyi entropy. It again gives a dually flat structure to the family of discrete probability distributions or of positive measures. This can be extended to the family of positive-definite matrices. We can define the $q$-exponential family and the related $q$-structure, which is a generalization of the current invariant information geometry.
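The statement that the Kullback-Leibler divergence is both an $f$-divergence and a Bregman divergence can be checked numerically. The following is a minimal sketch, assuming the convex function is the negative entropy $\varphi(p) = \sum_i p_i \log p_i$ and the $f$-divergence generator is $f(t) = t \log t$; the function names are illustrative only and do not come from the talk.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(phi(p) - phi(q) - np.dot(grad_phi(q), p - q))

def neg_entropy(p):
    """Convex potential: negative Shannon entropy."""
    return float(np.sum(p * np.log(p)))

def grad_neg_entropy(p):
    """Gradient of the negative entropy."""
    return np.log(p) + 1.0

def f_divergence(f, p, q):
    """f-divergence sum_i q_i f(p_i / q_i) for a convex f with f(1) = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) on a finite set."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
print(kl(p, q))
print(bregman(neg_entropy, grad_neg_entropy, p, q))
print(f_divergence(lambda t: t * np.log(t), p, q))
```

For normalised $p$ and $q$ the three printed values agree up to rounding. For unnormalised positive measures the Bregman form picks up the extra term $\sum_i (q_i - p_i)$.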
The nonadditive entropy $S_q$ and its associated statistical mechanics pose various mathematically interesting challenges. Some selected ones will be presented. In addition to this, some recent predictions, verifications and applications in natural and artificial systems will be briefly described as well.
BIBLIOGRAPHY: (i) C. Tsallis, Entropy, in Encyclopedia of Complexity and Systems Science (Springer, Berlin, 2009); (ii) C. Tsallis, Introduction to Nonextensive Statistical Mechanics - Approaching a Complex World (Springer, New York, 2009); (iii) S. Umarov, C. Tsallis, M. Gell-Mann and S. Steinberg, J. Math. Phys. 51, 033502 (2010); (iv) M. Jauregui and C. Tsallis, J. Math. Phys. 51 (June 2010), in press; (v) CMS Collaboration, J. High Energy Phys. 02, 041 (2010); (vi) http://tsallis.cat.cbpf.br/biblio.htm
A scoring rule $S(x, Q)$ measures the quality of a quoted distribution $Q$ for an uncertain quantity $X$ in the light of the realised value $x$ of $X$. It is proper when it encourages honesty, i.e., when, if your uncertainty about $X$ is represented by a distribution $P$, the choice $Q = P$ minimises your expected loss. Traditionally, a scoring rule has been called local if it depends on $Q$ only through $q(x)$, the density of $Q$ at $x$. The only proper local scoring rule is then the $\log$-score, $-\log q(x)$. For the continuous case, we can weaken the definition of locality to allow dependence on a finite number $m$ of derivatives of $q$ at $x$. A full characterisation is given of such order-$m$ local proper scoring rules, and of their behaviour under transformations of the outcome space. In particular, any $m$-local scoring rule with $m > 0$ can be computed without knowledge of the normalising constant of the density. Parallel results for discrete spaces will be given.
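Two of the claims above lend themselves to a quick numerical illustration: that the log-score is proper, and that a score built from derivatives of $\log q$ does not need the normalising constant. The sketch below is an illustration under stated assumptions, not the characterisation given in the talk. It uses a three-point outcome space for the log-score, and for the order-2 case it uses the well-known Hyvärinen-type score $(\log q)''(x) + \tfrac12 ((\log q)'(x))^2$, evaluated by central differences on an unnormalised log-density. All function names are hypothetical.

```python
import numpy as np

def expected_log_score(p, q):
    """Expected loss E_P[-log q(X)] on a finite outcome space."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.sum(p * np.log(q)))

def hyvarinen_score(x, log_q_unnorm, h=1e-4):
    """Order-2 local score (log q)''(x) + 0.5 * ((log q)'(x))**2.

    Only derivatives of log q enter, so an additive constant in
    log_q_unnorm (i.e. the normalising constant of q) cancels.
    Derivatives are approximated by central differences of width h.
    """
    d1 = (log_q_unnorm(x + h) - log_q_unnorm(x - h)) / (2.0 * h)
    d2 = (log_q_unnorm(x + h) - 2.0 * log_q_unnorm(x) + log_q_unnorm(x - h)) / h**2
    return d2 + 0.5 * d1**2

# Propriety of the log-score: quoting Q = P gives the smallest expected loss.
p = np.array([0.5, 0.3, 0.2])
for q in ([0.4, 0.4, 0.2], [0.1, 0.1, 0.8]):
    print(expected_log_score(p, p) < expected_log_score(p, q))

# The same score value with and without an arbitrary additive constant.
print(hyvarinen_score(1.3, lambda x: -0.5 * x**2))
print(hyvarinen_score(1.3, lambda x: -0.5 * x**2 + 7.0))
```

The last two printed values coincide up to finite-difference rounding, which is the sense in which the score ignores the normalising constant.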
Boltzmann-Shannon entropy leads to an exponential model as the maximum entropy model when the space of pdfs is constrained so that the expectation of a given statistic $t(x)$ equals a common vector. The maximum likelihood estimator for the expectation parameter of $t(x)$ under the exponential model is characterized by specific properties such as the attainment of the Cramer-Rao bound. Any generator function $U$ defines a $U$-entropy and a $U$-divergence under the assumption that $U$ is convex. In this framework, the $U$-entropy leads to the $U$-model as the maximum entropy model, under which the minimum $U$-divergence estimator for the expectation parameter is characterized by a structure of orthogonal foliation. If $U(s) = \exp(s)$, then this reduces to the case of Boltzmann-Shannon entropy. Surprisingly, we observe that the minimum $U$-divergence estimator under the $U$-model has a unique form, namely the sample mean of $t(x)$. Alternatively, if the minimum $U$-divergence estimator is employed under another $U$-model, then the estimator has a different form, given by a weighted mean of $t(x)$ over the sample. This talk discusses information-geometric understandings of this aspect in terms of the Pythagorean identity, a minimax game and robustness.
The characterisation of interest-rate dynamics requires an understanding of the interest-rate term structure (the dependence of the bond price on different maturities). As such, interest-rate modelling amounts to modelling the random dynamics of a smooth curve, known as the "yield curve". A priori there is no structure on the space of yield curves that allows for an elegant mathematical characterisation, but it turns out that there is a remarkable correspondence between a yield curve and a probability density function. The dynamics of the interest-rate term structure can then be represented as a measure-valued process, and the latter allows the powerful method of information geometry to identify useful quantities such as the separation of two interest-rate markets. One can, furthermore, ask where the random dynamics of interest rates emerges from. The answer is given by the flow of information in financial markets concerning various economic factors such as the liquidity risk. The dynamics of the term-structure density function can then be seen as the result of an optimum filter, calculated by the market. We can therefore model the flow of information explicitly and derive the interest-rate dynamics, seen as an emergent phenomenon.
[1] Brody, D.C. & Hughston, L.P. (2001) Interest rates and information geometry. Proceedings of the Royal Society London A457, 1343-1364.
[2] Brody, D.C. & Friedman, R.L. (2009) Information of interest. Risk Magazine, December, 105-110 (reprinted in: Life & Pensions, February 2010, 35-40).
The Wigner-Yanase-Dyson information measures are prominent examples of the “metric adjusted skew information” constructed from monotone metrics. These measures of quantum information are related to special classes of operator monotone functions and their properties. Our growing knowledge of operator monotone functions has thus become a powerful tool in the study of measures of quantum information. For example, it has given a natural order relation rendering the set of quantum information measures into a lattice.
We give an overview of recent developments in this area with a special regard to the superadditivity conjecture of Wigner and Yanase which, although generally false, still holds true for important classes of states.
In the early 1930s, Schrödinger addressed and formally solved a statistical physics problem which is amazingly analogous to quantum mechanics. It is a large deviation problem which is similar to the Monge-Kantorovich optimal transport problem. This similarity is not incidental. Indeed, it will be shown that the optimal transport problem is the limit of a sequence of well-chosen Schrödinger problems. Analytically, this amounts to describing the optimal transport problem as a Gamma-limit of relative entropy minimization problems under prescribed marginal constraints. The minimizers of these problems might be interpreted as some kind of geodesics in the space of probability measures.
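On a finite state space, the relative entropy minimization under prescribed marginals described above can be solved by alternately rescaling rows and columns of a reference kernel (the Sinkhorn, or iterative proportional fitting, scheme). The sketch below is a discrete toy illustration and not the construction of the talk: the reference measure is assumed to be proportional to $\exp(-c/\varepsilon)$ for a quadratic cost $c$, and the names `sinkhorn_plan`, `mu`, `nu` and `eps` are hypothetical.

```python
import numpy as np

def sinkhorn_plan(mu, nu, cost, eps, n_iter=2000):
    """Minimise KL(pi | R) over couplings pi of (mu, nu), with R ~ exp(-cost / eps).

    The minimiser has the form diag(u) K diag(v); u and v are found by
    alternately matching the column and row marginals.
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)   # match the second marginal
        u = mu / (K @ v)     # match the first marginal
    return u[:, None] * K * v[None, :]

x = np.array([0.0, 1.0, 2.0])
cost = (x[:, None] - x[None, :]) ** 2          # quadratic transport cost
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.4, 0.4])

for eps in (1.0, 0.5, 0.2):
    plan = sinkhorn_plan(mu, nu, cost, eps)
    print(eps, float((plan * cost).sum()))
```

As `eps` decreases, the transport cost of the entropic minimiser approaches the optimal transport cost of this small example (0.5 for the monotone coupling), which is the finite-dimensional shadow of the limit discussed in the abstract.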
George Galanis Hellenic Naval Academy, Greece
Applications of Information Geometry to environmental numerical prediction systems
A new area where the application of Information Geometric techniques could contribute beneficially is discussed in this work. It concerns the optimization of environmental prediction systems, whose added value in today's competitive scientific and operational environment is constantly increasing.
The main approach towards accurate environmental forecasting products today is the use of numerical atmospheric and wave prediction systems. Such platforms successfully simulate the general environmental conditions on a global or intermediate scale. However, systematic errors emerge when focusing on local characteristics, mainly as a result of the inability to capture sub-scale phenomena.
In this framework, assimilation systems (aiming at the improvement of initial conditions) and statistical post-processes (for the local adaptation of the forecasts) provide significant support. However, in the majority of such techniques, the available data are treated as elements of Euclidean spaces, and least-squares methods are employed for the estimation and optimization of their distances.
In the present work, the use of Information Geometry in this framework is discussed, with the aim of estimating: (i) the statistical manifold to which the environmental data better fit; (ii) the way that the corresponding geodesics could be estimated. In particular, the results of a state-of-the-art wave prediction model (WAM) have been utilized in an area of significant importance: the North Atlantic Ocean.
The obtained outcomes, apart from the clear view that they provide of the wave climate and of the performance of the numerical model, could be exploited for designing new optimization techniques for wave prediction systems that cannot be supported by classical statistics.
Marc Harper UCLA, USA
Applications of information geometry in evolutionary dynamics
Many important constructions and definitions in Evolutionary Game Theory can be elegantly interpreted in terms of concepts from information geometry. Information divergences produce new families of replicator-like dynamics, yielding global Lyapunov functions in conjunction with evolutionary stability and analogs of important local results, such as Fisher's fundamental theorem. Using the generalized information divergences recently developed in Statistical Thermodynamics, we can formulate replicator dynamics with type-dependent intensities of selection and many other variations, including the orthogonal projection dynamic. If space permits, I will discuss connections to Bayesian inference and the concept of potential information.
Some of this work is discussed on my blog (www.marcallenharper.com/blog/) and in my papers on the arXiv. My collaborator Chris Lee at UCLA and I hope to publish results relevant to inference in the near future. I am also applying this work in connection with the evolutionary ecology group at UCLA, using Fisher information dynamics to investigate population dynamics. I think this will yield an interesting poster concerning applications of information geometry in evolutionary dynamics and inference.
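The role of an information divergence as a Lyapunov function can be seen in the simplest setting: the classical replicator equation with a Kullback-Leibler divergence measured from an evolutionarily stable state. The sketch below is a minimal illustration under assumptions that are not taken from the poster: a Hawk-Dove payoff matrix with value 2 and cost 4 (so the mixed stable state is $(1/2, 1/2)$), and a plain Euler discretisation of the dynamics.

```python
import numpy as np

def replicator_step(x, A, dt):
    """One Euler step of dx_i/dt = x_i * (f_i(x) - mean fitness), with f = A x."""
    f = A @ x
    return x + dt * x * (f - x @ f)

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q)."""
    return float(np.sum(p * np.log(p / q)))

def simulate(x0, A, dt=0.01, steps=5000):
    """Trajectory of the discretised replicator dynamics started at x0."""
    x = np.asarray(x0, float)
    path = [x]
    for _ in range(steps):
        x = replicator_step(x, A, dt)
        path.append(x)
    return np.array(path)

A = np.array([[-1.0, 2.0],     # Hawk vs (Hawk, Dove), assumed payoffs
              [0.0, 1.0]])     # Dove vs (Hawk, Dove)
x_star = np.array([0.5, 0.5])  # interior evolutionarily stable state

path = simulate([0.1, 0.9], A)
divergences = [kl(x_star, x) for x in path]
print(divergences[0], divergences[-1])
print(path[-1])
```

Along this trajectory the divergence from the stable state decreases monotonically while the population converges to it, which is the Lyapunov behaviour referred to above in its most elementary form.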
Masayuki Henmi The Institute of Statistical Mathematics, Japan
A dual differential geometrical structure induced from estimating functions
In parametric statistical inference, the maximum likelihood method is typically used for parameter estimation, and it is nicely explained by the dual differential geometrical structure of a statistical model with the Fisher metric and the e- and m-connections. A good property of this structure is that both of these affine connections are torsion-free, which leads to the important notion of a dually flat space. However, this property does not necessarily hold once we move to another estimation method and consider its associated geometry. In this poster presentation, we investigate a dual differential geometrical structure induced from general estimating functions by utilizing the theory of statistical manifolds admitting torsion, which has recently been developed by Kurose and Matsuzoe. In particular, we focus on the quasi-score (or quasi-likelihood) method and consider the role of the induced geometrical structure in statistical inference.
Hiroshi Imai Dipartimento di Fisica A. Volta, Italy
A sufficient condition that an operator-sum representation gives horizontal lifting
A finite-dimensional quantum channel is a map defined on positive matrices which describes the quantum process taking an input quantum state (density matrix) to an output state. It is a fundamental task to identify a quantum channel statistically. For this purpose, the quantum SLD Fisher information of a channel is important. This notion is closely related to a fiber structure constructed from the operator-sum representations of a given channel and its horizontal lifting. We provide a sufficient condition for a derivative of an operator-sum representation to be horizontal. As an example, we give the estimation of an unknown correlation between several unitary processes satisfying a certain condition.
This is joint work with Chiara Macchiavello and is supported by the CORNER project: http://qurope.eu/projects/corner
Dominik Janzing MPI for biological cybernetics, Tuebingen, Germany (joint work with Povilas Daniusis, Joris Moiij, Bastian Steudel, Jakob Zscheischler, Kun Zhang, Bernhard Schölkopf)
Inferring causal directions via information geometry
Some recent methods in causal inference use the assumption that the marginal distribution of the cause and the conditional distribution of the effect, given the cause, are independent objects of nature. In other words, the causal hypothesis ``X causes Y'' needs to be rejected if P(X) and P(Y|X) satisfy a relation that ``generic'' pairs (P(X), P(Y|X)) would not satisfy. Formalizing such a vague statement is challenging, but we propose a notion of independence that can be interpreted as orthogonality in information space in the sense of information geometry. We developed a method for inferring causal directions based on such an orthogonality assumption and obtained encouraging empirical results.
Jozef Juricek Charles University in Prague, Czech Republic
Maximization of information divergence from the symmetrical exponential families
The problem of maximization of the information divergence from the exponential families symmetrical w.r.t. some permutation group is presented. This work is related to previous results of N. Ay in [1], F. Matúš in [2,3,4] and J. Rauh in [5]. Situations in which there exist maximizers exchangeable w.r.t. the permutation group are studied. In these situations, the dimensionality of the optimization problem is highly reduced. For some special cases explicit solutions are found. Maximization of information divergence from an exponential family has emerged in probabilistic models for evolution and learning in neural networks that are based on infomax principles. The maximizers admit an interpretation as stochastic systems with high complexity w.r.t. the exponential family [1]. A link between divergence maximization and secret sharing was established in [4].
Keywords. Kullback-Leibler divergence, relative entropy, exponential family, hierarchical models, information projection, symmetric group, permutation group.
AMS 2000 Math. Subject Classification. Primary 94A17. Secondary 62B10, 60A10, 52A20.
Contact. E-mail, Charles University in Prague, Department of Probability and Mathematical Statistics, Sokolovská 83, Prague, 186 75, Czech Republic, EU
[1] Ay, N., Knauf, A. (2006). Maximizing multi-information. Kybernetika, 42:517--538.
[2] Matúš, F. (2004). Maximization of information divergences from binary i.i.d. sequences. Proceedings IPMU 2004, 2:1303--1306, Perugia, Italy.
[3] Matúš, F. (2007). Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika, 43:731--746.
[4] Matúš, F. (2009). Divergence from factorizable distributions and matroid representations by partitions. IEEE Transactions on Information Theory, 55(12):5375--5381.
[5] Rauh, J. (2009). Finding the maximizers of the information divergence from an exponential family. arXiv:0912.4660.
Takafumi Kanamori Nagoya University, Japan (joint work with Atsumi Ohar)
A Bregman extension of quasi-Newton updates
Standard quasi-Newton methods such as the BFGS or DFP formulas are closely related to the geometrical structure over multivariate normal distributions. In this presentation, we first introduce the relation between the update formula of the Hessian matrix and the Kullback-Leibler divergence over multivariate normal distributions. Then an extension of the Hessian update is derived from the Bregman divergence, which is an extension of the Kullback-Leibler divergence. In particular, we exploit the Bregman divergence with V-potentials in order to obtain a tractable update formula. Based on our framework, we study the convergence property, group invariance and robustness of the Hessian update formula.
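To make the objects in this abstract concrete, the sketch below writes down the textbook BFGS and DFP updates of a Hessian approximation, together with the Kullback-Leibler divergence between zero-mean multivariate normal distributions whose covariance matrices are the old and new approximations. It is only an illustration of the ingredients, under the assumption that matrices are identified with Gaussian covariances. It checks the secant condition $B_{+} s = y$ and evaluates the divergence, but it does not verify the variational characterisation discussed in the talk, and it does not implement the V-potential extension.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Textbook BFGS update of a Hessian approximation B (requires y @ s > 0)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def dfp_update(B, s, y):
    """Textbook DFP update of a Hessian approximation B (requires y @ s > 0)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(y, s)
    return V @ B @ V.T + rho * np.outer(y, y)

def gaussian_kl(P, Q):
    """KL( N(0, P) || N(0, Q) ) for symmetric positive-definite P and Q."""
    n = P.shape[0]
    M = np.linalg.solve(Q, P)              # Q^{-1} P
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * (np.trace(M) - n - logdet)

B = np.eye(3)
s = np.array([1.0, 0.5, -0.2])   # step (illustrative values)
y = np.array([2.0, 0.3, 0.1])    # gradient difference, with y @ s > 0

for update in (bfgs_update, dfp_update):
    B_new = update(B, s, y)
    print(update.__name__, np.allclose(B_new @ s, y), gaussian_kl(B_new, B))
```

Both updates satisfy the secant condition and stay positive definite when $y^\top s > 0$; the printed divergence measures how far each update moves from the previous approximation in the Gaussian picture.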
Masanori Kawakita Kyushu university, Japan (joint work with Jun'ichi Takeuchi)
Semi-supervised learning in view of geometry of estimating function
We study the asymptotic performance of a certain class of semi-supervised learning methods in view of information geometry for estimating functions. Semi-supervised learning has attracted the interest of many researchers. Even though many complicated methods have been proposed, only a few studies have discussed their theoretical performance. Some semi-supervised learning methods can be formulated with estimating functions. We analyze such types of methods in this poster. Amari and Kawanabe (1997) analyzed the information-geometric properties of estimating functions. Using their framework, we derive the class of all estimating functions for our problem. We further derive the optimal estimating function. In general, however, the optimal estimating function is not available since it depends on an unknown quantity. We provide a way of constructing a good estimate of the optimal estimating function, which is always available. In addition, we also mention that a specific class of semi-supervised approaches is deeply related to a statistical paradox, which was geometrically analyzed by Henmi and Eguchi (2004).
Kei Kobayashi The Institute of Statistical Mathematics, Japan
Using algebraic method in information geometry
Second- or higher-order efficient estimators for curved exponential families have been studied in the context of information geometry. However, the computation of the estimator and the evaluation of the risk have not been discussed much beyond sampling approximations such as Monte Carlo methods. In this presentation, a class of ``algebraic'' second-order efficient estimators for ``algebraic'' curved exponential families is proposed. For this class of estimators, differential forms such as the Fisher metric, affine connections and embedding curvatures can be computed by algebraic computational methods such as Gröbner bases. The efficient estimators, their bias and their risk are evaluated via these differential forms. We demonstrate the effectiveness of the algebraic method using some simple example models.
Ryszard Kostecki University of Warsaw, Poland
Quantum information geometry and non-commutative flow of weights
Using the non-commutative flow of weights on von Neumann algebras, we propose the definitions of the quantum information geometric notions of -divergence, Riemannian metric and affine -connections on the spaces of finite positive normal functionals on von Neumann algebras. In this way we generalise the formulation that was provided earlier by Jencova on the basis of the Araki-Masuda theory of non-commutative spaces. Using our formulation, we define the constrained maximum quantum relative -entropy updating rule and discuss some properties of quantum Bayesian inference in this setting.
Michal Kupsa Academy of Sciences of the Czech Republic, Czech Republic (joint work with František Matúš)
On colorings of bivariate random sequences
The ergodic sequences consisting of vectors , , over a finite alphabet are colored with colors for and colors for . The generic behavior of the probabilities of monochromatic rectangles intersected with typical sets is examined. When n increases, a large majority of pairs of colorings produces rectangles whose probabilities are bounded uniformly from above. Bounds are worked out in all regimes of the rates a and b of the colorings. As a consequence, the generic behavior of the Shannon entropies of the partitions into rectangles is described.
Luigi Malagò Politecnico di Milano, Italy (joint work with Matteo Matteucci, Giovanni Pistone)
Optimization of pseudo-Boolean functions based on the exponential family relaxation
Pseudo-Boolean functions are real-valued functions defined over a vector of binary variables. They appear in many different fields and are well studied in integer programming and in combinatorial optimization. The optimization of this class of functions is of particular interest, since it is NP-hard in the general formulation and no exact polynomial-time algorithm is available. In the literature, pseudo-Boolean function optimization is often referred to as 0/1 programming. We analyze the problem of pseudo-Boolean function optimization by introducing the notion of stochastic relaxation, i.e., we look for the minima of a pseudo-Boolean function by minimizing its expected value over a set of probability densities. By doing this, we move from a discrete optimization problem to a continuous one, where the parameters of the statistical model become the new variables of the optimization problem. We choose statistical models that belong to the exponential family, and we justify this choice with results about the characterization of its topological closure and of its tangent space. Indeed, we are looking for minimizing sequences of densities in the model that converge towards distributions with reduced support concentrated on the minima of the pseudo-Boolean function. Such limit distributions do not belong to the exponential model, so it becomes important, given an exponential model, to determine which densities are included in its closure. Similarly, we are interested in the characterization of the tangent space of an exponential family, since at each point we are looking for the direction of maximum decrement of the expected value of the original function. Under a proper choice of the sufficient statistics of the exponential family used in the relaxation, the curve of maximum decrement is an exponential family itself.
We provide some results about the existence of critical points of the relaxed function, in terms of the relation between the expansion of the pseudo-Boolean function and the sufficient statistics of the exponential model. The presence of stationary points which correspond to saddle points may determine the existence of different local minima, to which a minimizing sequence of densities may converge. The analysis developed leads to the proposal of a new algorithm for pseudo-Boolean function optimization based on stochastic gradient descent, for which we provide preliminary experimental results. The algorithm is in principle similar to some other techniques that have been proposed recently in the literature, often referred to as population-based algorithms, since at each iteration a pool of feasible candidate solutions is generated by sampling from a statistical model. Such algorithms are known in the Evolutionary Computation literature as Estimation of Distribution Algorithms (EDAs), and a similar approach also appears in stochastic optimization under the name of the Cross-Entropy method. By taking inspiration from the EDA meta-heuristic and leveraging the properties of the exponential model, we can design an algorithm that explicitly updates the model parameters in the direction of the gradient of the expected value of a pseudo-Boolean function, instead of estimating the value of the parameters of the model from a subset of the current sample of feasible solutions, as in most of the EDAs described in the literature. The gradient of the expected value of a function defined over the sample space, with respect to an exponential family, can be evaluated in terms of covariances, but since these evaluations require a summation over the entire search space, we propose to replace them with empirical covariances, to be estimated from the current sample. We implemented a vanilla version of the algorithm to find the ground states of some instances of a 2D spin glass model.
The sufficient statistics of the exponential family have been determined according to the lattice structure of the spin glass model, such that every monomial in the energy function corresponds to a sufficient statistic of the model. We compared the performance of our algorithm with state-of-the-art algorithms from the Evolutionary Computation literature for solving 2D Ising spin glass problems. We ran multiple instances of the algorithms for different sizes of the lattice, 8x8, 10x10, and 20x20, respectively. Preliminary experimental results are encouraging and compare favourably with other recent heuristics proposed in the literature. Since we deal with a sample size which is much smaller than the cardinality of the sample space, the estimation of the covariances is affected by large noise. For this reason it seems convenient to replace empirical covariance estimation with other techniques which have proved able to provide more accurate estimates, such as the shrinkage approach to large-scale covariance matrix estimation. Such methods offer robust estimation techniques with a computational complexity which is often no more than twice that required for empirical covariance estimation. Moreover, the algorithm can also be applied in the black-box optimization context, by incorporating in the estimation procedure some model-building techniques able to learn from the sample a set of statistically significant correlations among the variables in the original function. Since real-world problems are often sparse, i.e., each variable interacts with a restricted number of variables, l1-regularized logistic regression methods for high-dimensional model selection seem to provide valuable tools in this context. The algorithm we proposed is highly parallelizable, both in the estimation of covariances and in the sampling step.
The final aim is to develop an efficient and effective approach to adaptively solve very large pseudo-Boolean problems, also in the black-box context, for which the interaction structure among the variables is unknown.
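The covariance form of the gradient used in this abstract can be illustrated on a toy model. The sketch below is not the authors' algorithm or their spin glass experiments. It assumes the simplest exponential family on $\{0,1\}^n$, with sufficient statistics $x_1, \dots, x_n$ (independent Bernoulli variables with natural parameters `theta`), a hypothetical three-variable objective `f`, and a plain fixed-step update. It computes the exact gradient $\partial_{\theta_i} E_\theta[f] = \mathrm{Cov}_\theta(f, x_i)$ by enumeration and replaces it by an empirical covariance inside the descent loop.

```python
import itertools
import numpy as np

def all_states(n):
    """All binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def probs(theta, X):
    """Exponential family p_theta(x) proportional to exp(<theta, x>)."""
    w = np.exp(X @ theta)
    return w / w.sum()

def expected_value(theta, f_vals, X):
    """E_theta[f], computed by summing over the whole search space."""
    return float(probs(theta, X) @ f_vals)

def exact_gradient(theta, f_vals, X):
    """Gradient of E_theta[f]: the covariances Cov_theta(f, x_i)."""
    p = probs(theta, X)
    return (p * (f_vals - p @ f_vals)) @ (X - p @ X)

def empirical_gradient(theta, f, n_samples, rng):
    """Same covariances, estimated from a sample drawn from p_theta."""
    p1 = 1.0 / (1.0 + np.exp(-theta))      # P(x_i = 1) in the independent model
    S = (rng.random((n_samples, len(theta))) < p1).astype(float)
    fv = np.array([f(x) for x in S])
    return ((fv - fv.mean())[:, None] * (S - S.mean(axis=0))).mean(axis=0)

def stochastic_relaxation(f, n, steps=200, lr=0.5, n_samples=200, seed=0):
    """Stochastic gradient descent on theta for the relaxed objective E_theta[f]."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)
    for _ in range(steps):
        theta -= lr * empirical_gradient(theta, f, n_samples, rng)
    return theta

# Hypothetical objective: 3*x2 plus the XOR of x0 and x1.
f = lambda x: x[0] + x[1] - 2 * x[0] * x[1] + 3 * x[2]
X = all_states(3)
f_vals = np.array([f(x) for x in X])
theta = stochastic_relaxation(f, 3)
print(expected_value(np.zeros(3), f_vals, X), expected_value(theta, f_vals, X))
```

In this toy case enumeration is cheap, so the exact and empirical gradients can be compared directly; for large search spaces only the sampled version is available, which is the situation described in the abstract.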
Keiji Matsumoto National Institute of Informatics, Japan
Monotone "metric" in the channel space: decision theoretic approach
The aim of the talk is to characterize the monotone `metric' in the space of Markov maps. Here, `metric' means the square of a norm defined on the tangent space, which does not necessarily equal the inner product of a vector with itself, unlike the usual notion of metric used in differential geometry. (Hereafter, the property that the norm is induced from an inner product is called the inner-product-assumption.) So far, there has been a large literature on metrics in the space of probability distributions and quantum states. Cencov, sometime in the 1970s, proved that the monotone metric in the space of probability distributions is unique up to a constant multiple and identical to the Fisher information metric. Amari and others independently worked on the same object, especially from a differential geometrical viewpoint, and applied it to a number of problems in information sciences. Quantum mechanical states are discussed by Nagaoka, Fujiwara, Matsumoto and Petz. Among them, Petz characterized all the monotone metrics in the quantum state space using operator mean theory. As for channels, however, only a little is known about their geometrical structures. To my knowledge, there has been no study of the axiomatic characterization of distance measures in the classical or quantum channel space. First, we show the upper and the lower bound of the monotone channel "metric", and it is proved that no monotone "metric" can satisfy the inner-product-assumption. We give counterexamples in the space of binary channels. The proof utilizes a "local" version of Blackwell's randomization criteria for equivalence of statistical models, which are well known in statistical decision theory. The latter result has some impact on the axiomatization of the monotone metric in the space of classical and quantum states, since both Cencov and Petz rely on the inner-product-assumption.
Since classical and quantum states can be viewed as channels with constant output, it is preferable to dispense with the inner-product-assumption. Recalling that the Fisher information is useful in asymptotic theory, it would be natural to introduce some assumptions on asymptotic behaviour. Hence, we introduce weak asymptotic additivity and lower asymptotic continuity. With these additional assumptions, we not only recover the uniqueness result of Cencov, but also prove the uniqueness of the monotone `metric' in the channel space. It is known that this unique "metric" gives the maximum information in estimating unknown channels. In this proof, again, we use the local and asymptotic version of the randomization criteria. Finally, there is an implication for quantum state metrics. A quantum state can be viewed as a classical channel which takes a measurement as an input and outputs a measurement result. If we restrict the measurement to separable measurements, the asymptotic theory discussed in our paper can be applied to quantum states also, proving the uniqueness of the metric. On the other hand, the author's past manuscript had reestablished the upper and the lower bound of the monotone metric by Petz, without relying on the inner-product-assumption. This suggests that the monotone `metric' in the quantum state space is not unique. Therefore, having collective measurements is essential to having a variety of monotone metrics.
Guido Montùfar Max Planck Institute for Mathematics in the Sciences, Germany
Faces of the probability simplex contained in the closure of an exponential family and minimal mixture representations
This work is about subsets of a state-space with the following property (S): all probability distributions supported therein are elements of the closure of a given exponential family. There exists an optimal cardinality condition for sets of the state-space which ensures that they have the property S. However, this is not a necessary condition, and the sets can be considerably larger. We present a characterization of S and use it to compute lower bounds on the maximal cardinality of sets with S. Furthermore, we show that there are actions on the state-space which preserve the property S. These results are applied to find bounds on the minimal number of elements belonging to a certain exponential family forming a mixture representation of a probability distribution which belongs to another (larger) exponential family.
Mariela Portesi CONICET & Universidad Nacional de La Plata, Argentina (joint work with Fernando Montani)
Statistical modeling of neuronal activity for an infinitely large number of neurons
An important open question in mathematical neuroscience is how to evaluate the significance of high order spike correlations in the neural code through analytically solvable models. We investigate the thermodynamic limit of a widespread probability distribution of firing in a neuronal pool, within the information geometry framework, considering all possible contributions from high order correlations. This allows us to identify a deformation parameter accounting for the different regimes of firing within the probability distribution, and to investigate whether those regimes could saturate or increase information as the number of neurons goes to infinity.
S. Amari, H. Nakahara, S. Wu and Y. Sakai, Neural Comput. 15, 127 (2003)
B.B. Averbeck, P.E. Latham and A. Pouget, Nat. Rev. Neurosci. 7, 358 (2006)
F. Montani, A. Kohn, A. Smith and S.R. Schultz, J. Neurosci. 27, 2338 (2007)
F. Montani, R.A.A. Ince, R. Senatore, E. Arabzadeh, M.E. Diamond and S. Panzeri, Phil. Trans. R. Soc. 367, 3297 (2009)
Johannes Rauh Max Planck Institute for Mathematics in the Sciences, Germany
Support sets in exponential families and oriented matroid theory
My poster presents results from joint work with Nihat Ay and Thomas Kahle (preprint available at arXiv:0906.5462). We study how results from algebraic statistics generalize to the case of non-algebraic exponential families on finite sets. Here an exponential family is called algebraic if it has an integer-valued matrix of sufficient statistics. In this case the exponential family is the intersection of an algebraic variety with the probability simplex, which makes available the powerful tools of computational commutative algebra. While most relevant examples of exponential families are algebraic, it turns out that ignoring the algebraic properties yields another viewpoint which better captures the continuity aspects of exponential families.
Many properties can be deduced from an oriented matroid naturally associated to the exponential family: the closure of a discrete exponential family is described by a finite set of equations corresponding to the circuits of this matroid. These equations are similar to the equations used in algebraic statistics, although they need not be polynomial in the general case. This description allows for a combinatorial study of the possible support sets in the closure of an exponential family. In particular, if two exponential families induce the same oriented matroid, then their closures have the same support sets.
Finally, we find a surjective (but not injective) parametrization of the closure of the exponential family by adding more parameters. These parameters also have an interpretation in the matroid picture. The parametrization generalizes the monomial parametrization used in algebraic statistics in the case of algebraic exponential families.
Shigeru Shuto Osaka University, Japan
Information geometry of renormalization on diamond fractal Ising spins
The renormalization group procedure for calculating a partition function in statistical mechanics is considered as a successive approximation to the canonical distribution on its state space. By embedding renormalized states into the original state space, this approximation is characterized as the m-projection from the canonical distribution onto a renormalization submanifold. We apply this method to a diamond fractal Ising spin model, whose renormalization flow has fixed points in the finite region.
Takashi Takenouchi Nara Institute of Science and Technology, Japan
Bayesian decoder for multi-class classification by mixture of divergence
The multi-class classification problem is one of the major topics in the field of machine learning. There are many works on the topic, and one major approach decomposes the multi-class problem into multiple binary classification problems based on the framework of error correcting output coding (ECOC). Each decomposed binary problem is solved independently, and the results of the binary classifiers are integrated (decoded) to predict the multi-class label.
In this research, we present a new method for integrating binary classifiers for the multi-class problem. Our integration method (decoder) is characterized by the minimization of a sum of divergences, in which each divergence measures the discrepancy between the decoder and a posterior distribution of the class label associated with a binary classifier.
We investigate the performance of the proposed method using a synthetic dataset and datasets from the UCI repository.
Tatsuaki Wada Ibaraki University, Japan (joint work with Atsumi Ohara)
Legendre duality and dually-flat structure in nonextensive thermostatistics developed by $S_{2-q}$ formalism
The $S_{2-q}$-formalism [1] in the generalized thermostatistics based on Tsallis entropy $S_q$ [2] is a natural formalism in the sense that the associated Legendre structures are derived in a similar way as in the standard thermostatistics. From a q-exponential probability distribution function (pdf), which maximizes S under the constraint of linear average energy U, the so-called escort pdf appears naturally in this formalism. The generalized Massieu potential associated with S and U is related to the one associated with the normalized Tsallis entropy S and the normalized q-average energy U, which is the energy average w.r.t. the escort pdf. The $S_{2-q}$-formalism has also provided the connections among some different versions of Tsallis nonextensive thermostatistics, a non self-referential expression of Tsallis' pdf [3], and the relation between the Boltzmann temperature and the Lagrange multiplier in nonextensive thermostatistics [4].
On the other hand, it was shown recently in Ref. [5] that a dually flat structure on the space of the escort probabilities is obtained by applying 1-conformal transformation to the $\alpha$-geometry, which is an information geometry with a constant curvature and is related to Tsallis relative entropy [6].
We explore the relation between the information geometrical structures associated with this dually flat structure and the Legendre structures in the $S_{2-q}$-formalism. We show that the Legendre dual potential functions in the information geometry with this dually flat structure are the generalized Massieu potential and . We further study the correspondences among the potential functions, dual affine coordinates, and relevant divergence functions between the information geometry of the dually flat structure and the $S_{2-q}$-formalism.
[1] T. Wada, A.M. Scarfone, Connections between Tsallis' formalisms employing the standard linear average energy and ones employing the normalized q-average energy, Phys. Lett. A 335 (2005) 351-362.
[2] C. Tsallis, Introduction to Nonextensive Statistical Mechanics - Approaching a Complex World (Springer, New York, 2009).
[3] T. Wada, A.M. Scarfone, A non self-referential expression of Tsallis' probability distribution function, Eur. Phys. J. B 47 (2005) 557-561.
[4] T. Wada, A.M. Scarfone, The Boltzmann Temperature and Lagrange Multiplier in Nonextensive Thermostatistics, Prog. Theor. Phys. Suppl. 162 (2006) 37-44.
[5] A. Ohara, H. Matsuzoe, S-I. Amari, A dually flat structure on the space of escort distributions, J. Phys.: Conf. Series 201 (2010) 012012.
[6] A. Ohara, Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation, Eur. Phys. J. B 70 (2009) 15-28.
Yu Watanabe The University of Tokyo, Japan (joint work with Takahiro Sagawa, Masahito Ueda)
Optimal measurement and maximum Fisher information on noisy quantum systems
The most serious obstacle to realizing quantum computers and networks is decoherence, which acts as noise and causes information loss. Decoherence occurs when a quantum system interacts with its environment, and it is unavoidable in almost all quantum systems. Therefore, one of the central problems in quantum information science concerns the optimal measurement to retrieve information about the original quantum state from the decohered one and the maximum information that can be obtained from the measurement.
We identify an optimal quantum measurement that retrieves the maximum Fisher information about the expectation value of an observable from the partially decohered state. We also clarify the maximum Fisher information obtained by the optimal measurement.
[1] Y. Watanabe, T. Sagawa, and M. Ueda, Phys. Rev. Lett. 104, 020401 (2010).
Le Yang CNRS, France
Riemannian median and its estimation
In order to characterize statistical data lying on a Riemannian manifold, one often uses the barycenter of empirical data as the notion of centrality. However, it is well known that the barycenter is not a robust estimator and is sensitive to outliers. A robust substitute for the barycenter is the geometric median. In this paper, we define the geometric median for a probability measure on a Riemannian manifold, and give its characterization and a natural condition ensuring its uniqueness. In order to compute the geometric median in practical cases, we also propose a subgradient algorithm, prove its convergence, and estimate the error of approximation and the rate of convergence. The convergence property of this subgradient algorithm, which is a generalization of the classical Weiszfeld algorithm in Euclidean spaces to the context of Riemannian manifolds, does not depend on the sign of the curvature of the manifold and hence improves a recent result of Fletcher and his colleagues.
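As a point of comparison for the robustness claim, the following is a minimal sketch of the classical Weiszfeld iteration in the Euclidean plane only. It is not the Riemannian subgradient algorithm of the abstract; the function name `weiszfeld`, the iteration count and the sample points are assumptions made for this illustration.

```python
import math

def weiszfeld(points, iters=500, eps=1e-12):
    """Approximate the Euclidean geometric median of a list of points (tuples)."""
    dim = len(points[0])
    # Start from the barycenter (arithmetic mean) of the data.
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, y)
            if d < eps:
                # The iterate sits on a data point; skip it to avoid dividing by zero.
                continue
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        if den == 0.0:
            break
        # New iterate: average of the data weighted by inverse distance.
        y = [v / den for v in num]
    return y

# Four corners of the unit square plus one distant outlier (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
median = weiszfeld(points)
mean = [sum(p[k] for p in points) / len(points) for k in range(2)]
print("median:", median, "mean:", mean)
```

In this toy data set the outlier drags the barycenter far outside the unit square, while the median stays inside it, which is the sensitivity to outliers the abstract refers to.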
The closely related concepts Bures metric, Bures distance, and fidelity (transition probability) can be defined equally well as a ``functor'' within the hierarchy of quantum systems or intrinsically within any quantum system. Some of these possibilities will be discussed in connection with inequalities, the use of ``amplitudes'', and symmetries. We further point to a parallel transport and its natural gauge theory which is related to the Bures geometry.
Basic structural elements on the way to information involve truth, belief and knowledge as well as ideas about how truth is perceived by an observer, be it a physicist or a statistician. Also, one should ask questions such as ``what {\em can\/} we know?'' When formalizing these philosophical considerations, simple two-person zero-sum games provide important theoretical support. Well known results going back to Chentsov and Csisz\'ar on information projections and Pythagorean inequalities may be illuminated this way. Whether a tie to a truly geometric theory as initiated by the Japanese school led by Amari can also be established is another matter.
Jaynes suggested that given random variables $(X_1, \dots , X_n)$ of unknown distribution, and measurements of them, then the best estimate for their joint distribution is given by maximising the entropy of all states, under the condition that they predict that the mean of each, in a distribution, be its exact mean. We show that this can be proved to be the best estimate, in that the scores, $X_i - \langle X_i \rangle$ have the least joint variance under this condition. The result is extended to quantum mechanics: the estimation of n self-adjoint quadratic forms that are Kato-small relative to a given positive self-adjoint operator.
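For a finite sample space, the constrained maximisation described above has a standard closed form, sketched here as an illustration. The symbols $p$, $m_i$, $\lambda_i$ and $Z$ are notation assumed for this sketch and do not appear in the abstract; the quantum extension is not covered.

```latex
\begin{aligned}
&\max_{p}\; -\sum_{x} p(x)\log p(x)
\quad\text{subject to}\quad \sum_{x} p(x) = 1,\qquad
\sum_{x} p(x)\,X_i(x) = m_i \quad (i = 1,\dots,n),\\
&\Longrightarrow\quad
p_\lambda(x) = \frac{1}{Z(\lambda)}\exp\Big(\sum_{i=1}^{n}\lambda_i X_i(x)\Big),
\qquad
Z(\lambda) = \sum_{x}\exp\Big(\sum_{i=1}^{n}\lambda_i X_i(x)\Big),\\
&\frac{\partial \log p_\lambda(x)}{\partial \lambda_i} = X_i(x) - \langle X_i\rangle_\lambda .
\end{aligned}
```

The last line shows why the scores $X_i - \langle X_i \rangle$ arise naturally: they are the derivatives of the log-density of the maximum-entropy family with respect to its Lagrange multipliers.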
The duality between the channel capacity problem and the minimax problem is well known. In this presentation, we study a similar minimax problem for prediction. We consider a one-parameter probability distribution, employ a Bayesian framework, and study the properties of the optimal prior. A dual problem is derived naturally, and the optimal prior is proved to be a discrete distribution.
The classical relative entropy plays a key role in information geometry as a bridge between the differential-geometrical structure of the space of probability distributions and the large deviation problems in probability theory. Unfortunately, this happy situation breaks down in the quantum setting except for some very restricted aspects. We discuss the problem to be addressed mainly in view of statistical inference of quantum states and show some new directions.
A statistical model is essentially an information channel from a parameter space to a data space so that each parameter gives a distribution over possible data. We are interested in characterizing the statistical models that have finite capacity. According to the Gallager-Ryabko Theorem the capacity equals the minimax redundancy. In the minimum description length (MDL) approach to statistics one is interested in the minimax regret rather than the minimax redundancy of the statistical model. The minimax redundancy lower bounds minimax regret so if capacity is infinite the minimax regret is infinite and the MDL approach to statistics fails. In this talk we shall restrict our attention to exponential families. It has been conjectured that finite capacity implies finite minimax regret. We demonstrate that the conjecture holds in 1 dimension but is violated in 3 dimensions. This is joint work with Peter Grünwald.
In this talk, I shall explore the connections of information geometry with various mathematical fields. The Fisher metric is seen as the natural metric on an infinite dimensional projective space. This also yields a geometric interpretation of Green functions (propagators) of quantum field theory. In finite dimensional situations, the Fisher metric induces a pair of dual affine structures. Such a geometry is called K\"ahler affine or Hessian. I shall define a natural differential operator associated to such a structure, the affine Laplacian, and discuss an existence theorem for affine harmonic mappings.
George Galanis Hellenic Naval Academy, Greece
Applications of Information Geometry to environmental numerical prediction systems
A new area where the application of Information Geometric techniques could contribute beneficially is discussed in this work. It concerns the optimization of environmental prediction systems whose added value, in today's competitive scientific and operational environment, is constantly increasing.
The main approach towards accurate environmental forecasting products is today the use of numerical atmospheric and wave prediction systems. Such platforms successfully simulate the general environmental conditions on a global or intermediate scale. However, systematic errors emerge when focusing on local characteristics, mainly as a result of the inability to capture sub-scale phenomena.
In this framework, assimilation systems (aiming at the improvement of initial conditions) and statistical post-processing (for the local adaptation of the forecasts) provide significant support. However, in the majority of such techniques, the available data are treated as elements of Euclidean spaces and least squares methods are employed for the estimation and optimization of their distances.
In the present work the use of Information Geometry in this framework is discussed, trying to estimate: (i) the statistical manifold to which the environmental data better fit; (ii) the way that the corresponding geodesics could be estimated.
In particular, the results of a state-of-the-art wave prediction model (WAM) have been utilized in an area of significant importance: the North Atlantic Ocean.
The obtained outcomes, apart from the clear view that they provide of the wave climate and of the performance of the numerical model, could be exploited for designing new optimization techniques for wave prediction systems that cannot be supported by classical statistics.
Marc Harper UCLA, USA
Applications of information geometry in evolutionary dynamics
Many important constructions and definitions in Evolutionary Game Theory can be elegantly interpreted in terms of concepts from information geometry. Information divergences produce new families of replicator-like dynamics, yielding global Lyapunov functions in conjunction with evolutionary stability and analogs of important local results, such as Fisher's fundamental theorem. Using the generalized information divergences recently developed in Statistical Thermodynamics, we can formulate replicator dynamics with type-dependent intensities of selection and many other variations, including the orthogonal projection dynamic. If space permits, I will discuss connections to Bayesian inference and the concept of potential information.
Some of this work is discussed on my blog (www.marcallenharper.com/blog/) and in my papers on the ArXiv. My collaborator Chris Lee at UCLA and I hope to publish results relevant to inference in the near future. I am also applying this work in connection with the evolutionary ecology group at UCLA, using Fisher information dynamics to investigate population dynamics. I think this will yield an interesting poster concerning applications of information geometry in evolutionary dynamics and inference.
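The role of an information divergence as a Lyapunov function can be illustrated with a minimal sketch of the standard replicator equation only, not the generalized dynamics of the poster. The Hawk-Dove payoff matrix, the Euler step size and the function names are assumptions chosen for this example; the interior evolutionarily stable state of this game is the uniform mixture (0.5, 0.5).

```python
import math

def replicator_step(x, A, dt):
    """One explicit Euler step of dx_i/dt = x_i * (f_i(x) - mean fitness)."""
    n = len(x)
    f = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_f = sum(x[i] * f[i] for i in range(n))
    new = [x[i] + dt * x[i] * (f[i] - mean_f) for i in range(n)]
    s = sum(new)  # renormalize to guard against floating-point drift
    return [v / s for v in new]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hawk-Dove game with value 2 and cost 4 (hypothetical example).
A = [[-1.0, 2.0],
     [0.0, 1.0]]
ess = [0.5, 0.5]

x = [0.9, 0.1]
divergences = [kl(ess, x)]
for _ in range(2000):
    x = replicator_step(x, A, dt=0.01)
    divergences.append(kl(ess, x))

print("final state:", x)
print("initial and final divergence:", divergences[0], divergences[-1])
```

Along this trajectory the divergence from the stable state to the current population state does not increase, which is the Lyapunov behaviour the abstract describes.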
Masayuki Henmi The Institute of Statistical Mathematics, Japan
A dual differential geometrical structure induced from estimating functions
In parametric statistical inference, the maximum likelihood method is typically used for parameter estimation, and it is nicely explained by the dual differential geometrical structure of a statistical model with the Fisher metric and the e- and m-connections. A good property of this structure is that both of these affine connections are torsion-free, which leads to the important notion of a dually flat space. However, this property does not necessarily hold once we move to another estimation method and consider its associated geometry. In this poster presentation, we investigate a dual differential geometrical structure induced from general estimating functions by utilizing the theory of statistical manifolds admitting torsion, which has been recently developed by Kurose and Matsuzoe. In particular, we focus on the quasi-score (or quasi-likelihood) method and consider the role of the induced geometrical structure in statistical inference.
Hiroshi Imai Dipartimento di Fisica A. Volta, Italy
A sufficient condition that an operator-sum representation gives horizontal lifting
A finite-dimensional quantum channel is a map defined on positive matrices which describes the quantum process from an input quantum state (probability density matrix) to an output state. It is a fundamental task to identify a quantum channel statistically. For this purpose, the quantum SLD Fisher information of a channel is important.
This notion is closely related to a fiber structure constructed from operator-sum representations of a given channel and its horizontal lifting.
We provide a sufficient condition for a derivative of an operator-sum representation to be horizontal. As an example, the estimation of an unknown correlation between several unitary processes satisfying some condition is given.
This is joint work with Chiara Macchiavello and is supported by the CORNER project: http://qurope.eu/projects/corner
Dominik Janzing MPI for biological cybernetics, Tuebingen, Germany (joint work with Povilas Daniusis, Joris Mooij, Bastian Steudel, Jakob Zscheischler, Kun Zhang, Bernhard Schölkopf)
Inferring causal directions via information geometry
Some recent methods in causal inference use the assumption that the marginal distribution of the cause and the conditional distribution of the effect, given the cause, are independent objects of nature. In other words, the causal hypothesis ``X causes Y'' needs to be rejected if P(X) and P(Y|X) satisfy a relation that ``generic'' pairs (P(X), P(Y|X)) would not satisfy. To formalize such a vague statement is challenging, but we propose a notion of independence that can be interpreted as orthogonality in information space in the sense of information geometry. We developed a method for inferring causal directions based on such an orthogonality assumption and obtained encouraging empirical results.
Jozef Juricek Charles University in Prague, Czech Republic
Maximization of information divergence from the symmetrical exponential families
The problem of maximization of the information divergence from the exponential families symmetrical w.r.t. some permutation group is presented. This work is related to previous results of N. Ay in [1], F. Matúš in [2,3,4] and J. Rauh in [5]. Situations in which there exist maximizers exchangeable w.r.t. the permutation group are studied. Then, the dimensionality of the optimization problem is highly reduced. For some special cases explicit solutions are found. Maximization of information divergence from an exponential family has emerged in probabilistic models for evolution and learning in neural networks that are based on infomax principles. The maximizers admit interpretation as stochastic systems with high complexity w.r.t. exponential family [1].
A link between divergence maximization and secret sharing was established in [4].
Keywords. Kullback-Leibler divergence, relative entropy, exponential family, hierarchical models, information projection, symmetric group, permutation group.
AMS 2000 Math. Subject Classification. Primary 94A17. Secondary 62B10, 60A10, 52A20.
Contact. E-mail, Charles University in Prague, Department of Probability and Mathematical Statistics, Sokolovská 83, Prague, 186 75, Czech Republic, EU
[1] Ay, N., Knauf, A. (2006). Maximizing multi-information. Kybernetika, 42:517--538.
[2] Matúš, F. (2004). Maximization of information divergences from binary i.i.d. sequences. Proceedings IPMU 2004, 2:1303--1306, Perugia, Italy.
[3] Matúš, F. (2007). Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika, 43:731--746.
[4] Matúš, F. (2009). Divergence from factorizable distributions and matroid representations by partitions. IEEE Transactions on Information Theory, 55(12):5375--5381.
[5] Rauh, J. (2009). Finding the maximizers of the information divergence from an exponential family. arXiv:0912.4660.
Takafumi Kanamori Nagoya University, Japan (joint work with Atsumi Ohara)
A Bregman extension of quasi-Newton updates
Standard quasi-Newton methods such as the BFGS or DFP formulae are closely related to the geometrical structure over multivariate normal distributions. In this presentation, we first introduce the relation between the update formula of the Hessian matrix and the Kullback-Leibler divergence over multivariate normal distributions. Then an extension of the Hessian update is derived from the Bregman divergence, which is an extension of the Kullback-Leibler divergence. In particular, we exploit the Bregman divergence with V-potentials in order to obtain a tractable update formula. Based on our framework, we study the convergence property, group invariance and robustness of the Hessian update formula.
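For orientation, the Kullback-Leibler divergence between two zero-mean multivariate normal distributions on $\mathbb{R}^n$ has the closed form below. The symbols $P$, $Q$, $B_{k+1}$, $s_k$ and $y_k$ are notation assumed for this sketch and are not taken from the abstract.

```latex
D_{\mathrm{KL}}\big(N(0,P)\,\|\,N(0,Q)\big)
  = \frac{1}{2}\Big[\operatorname{tr}\big(Q^{-1}P\big)
  - \log\det\big(Q^{-1}P\big) - n\Big],
\qquad P,\,Q \succ 0 .
```

A quasi-Newton update can then be read as choosing the new positive-definite matrix closest to the current one in this divergence, subject to the secant condition $B_{k+1}s_k = y_k$. Which argument order corresponds to BFGS and which to DFP is not stated in the abstract and is left unspecified here.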
Masanori Kawakita Kyushu University, Japan (joint work with Jun'ichi Takeuchi)
Semi-supervised learning in view of geometry of estimating function
We study the asymptotic performance of a certain class of semi-supervised learning methods in view of information geometry for estimating functions. Semi-supervised learning has attracted the interest of many researchers. Even though many complicated methods have been proposed, only a few studies have discussed their theoretical performance.
Some semi-supervised learning methods can be formulated with estimating functions. We analyze such methods in this poster. Amari and Kawanabe (1997) analyzed the information geometric properties of estimating functions. Using their framework, we derive the class of all estimating functions for our problem. We further derive the optimal estimating function. In general, however, the optimal estimating function is not available since it depends on an unknown quantity. We provide a way of constructing a good estimate, which is always available, of the optimal estimating function. In addition, we mention that a specific class of semi-supervised approaches is deeply related to a statistical paradox, which was geometrically analyzed by Henmi and Eguchi (2004).
Kei Kobayashi The Institute of Statistical Mathematics, Japan
Using algebraic method in information geometry
Second or higher order efficient estimators for curved exponential families have been studied in the context of information geometry. However, computation of the estimator and evaluation of the risk have not been discussed well other than by sampling approximations such as Monte Carlo methods. In this presentation, a class of ``algebraic'' second order efficient estimators for ``algebraic'' curved exponential families is proposed. For this class of estimators, differential forms such as the Fisher metric, affine connections and embedding curvatures can be computed by algebraic computational methods such as Gröbner bases. The efficient estimators, their bias and their risk are evaluated via these differential forms. We demonstrate the effectiveness of the algebraic method using some simple example models.
Ryszard Kostecki University of Warsaw, Poland
Quantum information geometry and non-commutative flow of weights
Using the non-commutative flow of weights on von Neumann algebras, we propose the definitions of the quantum information geometric notions of -divergence, Riemannian metric and affine -connections on the spaces of finite positive normal functionals on von Neumann algebras. This way we generalise the formulation that was provided earlier by Jencova on the basis of the Araki-Masuda theory of non-commutative spaces. Using our formulation, we define the constrained maximum quantum relative -entropy updating rule and discuss some properties of quantum Bayesian inference in this setting.
Michal Kupsa Academy of Sciences of the Czech Republic, Czech Republic (joint work with František Matúš)
On colorings of bivariate random sequences
The ergodic sequences consisting of vectors , , over a finite alphabet are colored with colors for and colors for . Generic behavior of probabilities of monochromatic rectangles intersected with typical sets is examined. When n increases a big majority of pairs of colorings produces rectangles whose probabilities are bounded uniformly from above. Bounds are worked out in all regimes of the rates a and b of colorings. As a consequence, generic behavior of Shannon entropies of the partitions into rectangles is described.
Luigi Malagò Politecnico di Milano, Italy (joint work with Matteo Matteucci, Giovanni Pistone)
Optimization of pseudo-Boolean functions based on the exponential family relaxation
Pseudo-Boolean functions are real-valued functions defined over a vector of binary variables. They appear in many different fields and are well studied in integer programming and in combinatorial optimization. The optimization of this class of functions is of particular interest, since it is NP-hard in the general formulation and no exact polynomial-time algorithm is available. In the literature, pseudo-Boolean function optimization is often referred to as 0/1 programming.
We analyze the problem of pseudo-Boolean function optimization by introducing the notion of stochastic relaxation, i.e., we look for the minima of a pseudo-Boolean function by minimizing its expected value over a set of probability densities. By doing this, we move from a discrete optimization problem to a continuous one, where the parameters of the statistical model become the new variables of the optimization problem. We choose statistical models that belong to the exponential family, and we justify this choice with results about the characterization of its topological closure and of its tangent space. Indeed, we are looking for minimizing sequences of densities in the model that converge towards distributions with reduced support concentrated on the minima of the pseudo-Boolean function. Such limit distributions do not belong to the exponential model, so it becomes important, given an exponential model, to determine which densities are included in its closure. Similarly, we are interested in the characterization of the tangent space of an exponential family, since at each point we are looking for the direction of maximum decrement of the expected value of the original function. Under a proper choice of the sufficient statistics of the exponential family used in the relaxation, the curve of maximum decrement is an exponential family itself.
We provide some results about the existence of critical points of the relaxed function, in terms of the relation between the expansion of the pseudo-Boolean function and the sufficient statistics of the exponential model. The presence of stationary points which correspond to saddle points may determine the existence of different local minima, to which a minimizing sequence of densities may converge.
The analysis developed leads to the proposal of a new algorithm for pseudo-Boolean function optimization based on stochastic gradient descent, for which we provide preliminary experimental results. The algorithm is in principle similar to some other techniques that have been proposed recently in the literature, often referred to as population-based algorithms, since at each iteration a pool of feasible candidate solutions is generated by sampling from a statistical model. Such algorithms are known in the Evolutionary Computation literature as Estimation of Distribution Algorithms (EDAs), and a similar approach also appears in stochastic optimization under the name of the Cross-Entropy method. By taking inspiration from the EDA meta-heuristic and leveraging the properties of the exponential model, we can design an algorithm that explicitly updates the model parameters in the direction of the gradient of the expected value of a pseudo-Boolean function, instead of estimating the value of the parameters of the model from a subset of the current sample of feasible solutions, as in most of the EDAs described in the literature. The gradient of the expected value of a function defined over the sample space, with respect to an exponential family, can be evaluated in terms of covariances, but since these evaluations require a summation over the entire search space, we propose to replace them with empirical covariances, to be estimated from the current sample. We implemented a vanilla version of the algorithm to find the ground states of some instances of a 2D spin glass model.
The sufficient statistics of the exponential family have been determined according to the lattice structure of the spin glass model, such that each monomial in the energy function corresponds to a sufficient statistic of the model. We compared the performance of our algorithm with state-of-the-art algorithms in the Evolutionary Computation literature for solving 2D Ising spin glass problems. We ran multiple instances of the algorithms for different sizes of the lattice: 8x8, 10x10, and 20x20.
Preliminary experimental results are encouraging and compare favourably with other recent heuristics proposed in the literature. Since we deal with a sample size which is much smaller than the cardinality of the sample space, the estimation of the covariances is affected by large noise.
For this reason it seems convenient to replace empirical covariance estimation with other techniques which have proved able to provide more accurate estimates, such as the shrinkage approach to large-scale covariance matrix estimation. Such methods offer robust estimation techniques with a computational complexity which is often no more than twice that required for empirical covariance estimation.
Moreover, the algorithm can also be applied in the black-box optimization context, by incorporating in the estimation procedure some model building techniques able to learn from the sample a set of statistically significant correlations among the variables in the original function. Since in real-world problems we often deal with sparse problems, i.e., each variable interacts with a restricted number of variables, l1-regularized logistic regression methods for high-dimensional model selection seem to provide valuable tools in this context. The algorithm we proposed is highly parallelizable, both in the estimation of covariances and in the sampling step.
The final aim is to develop an efficient and effective approach to adaptively solve very large pseudo-Boolean problems, also in the black-box context, in which the interaction structure among the variables is unknown.
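The covariance-based gradient step described above can be sketched on a toy problem. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: the model is an independent Bernoulli exponential family whose sufficient statistics are the single variables $x_i$, the objective is a hypothetical Hamming distance to a fixed target string, and the names `minimize`, `empirical_cov_gradient`, the learning rate and the sample size are all invented for the example.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample(theta, rng):
    """Draw x in {0,1}^n from p_theta(x) proportional to exp(sum_i theta_i * x_i)."""
    return [1 if rng.random() < sigmoid(t) else 0 for t in theta]

def empirical_cov_gradient(f, theta, n_samples, rng):
    """Estimate d E_theta[f] / d theta_i = Cov_theta(f, x_i) from a sample."""
    xs = [sample(theta, rng) for _ in range(n_samples)]
    fs = [f(x) for x in xs]
    f_bar = sum(fs) / n_samples
    grad = []
    for i in range(len(theta)):
        xi_bar = sum(x[i] for x in xs) / n_samples
        cov = sum((fv - f_bar) * (x[i] - xi_bar) for fv, x in zip(fs, xs)) / n_samples
        grad.append(cov)
    return grad

def minimize(f, n, iters=200, n_samples=100, lr=1.0, seed=0):
    """Stochastic gradient descent on the natural parameters theta."""
    rng = random.Random(seed)
    theta = [0.0] * n  # start from the uniform distribution
    for _ in range(iters):
        g = empirical_cov_gradient(f, theta, n_samples, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Hypothetical pseudo-Boolean objective: Hamming distance to a target string.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def hamming_to_target(x):
    return sum(a != b for a, b in zip(x, target))

theta = minimize(hamming_to_target, len(target))
mode = [1 if t > 0 else 0 for t in theta]
# Expected objective under the final model (sum of per-bit mismatch probabilities).
expected_f = sum((1.0 - sigmoid(t)) if b == 1 else sigmoid(t)
                 for t, b in zip(theta, target))
print("mode of final model:", mode, "expected objective:", expected_f)
```

With single-variable sufficient statistics the model cannot represent interactions, so this sketch only suits separable objectives; the abstract's spin glass experiments use statistics matched to the monomials of the energy function.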
Keiji Matsumoto National Institute of Informatics, Japan
Monotone "metric" in the channel space: decision theoretic approach
The aim of the talk is to characterize the monotone `metric' in the space of Markov maps. Here, `metric' means the square of the norm defined on the tangent space, which does not necessarily equal the inner product of the vector with itself, unlike the usual notion of metric used in differential geometry. (Hereafter, the property that the norm is induced from an inner product is called the inner-product-assumption.) So far, there has been plenty of literature on the metric in the space of probability distributions and quantum states. Cencov, sometime in the 1970s, proved that the monotone metric in the probability distribution space is unique up to a constant multiple, and identical to the Fisher information metric. Amari and others independently worked on the same object, especially from differential geometrical viewpoints, and applied it to a number of problems in information sciences. Quantum mechanical states are discussed by Nagaoka, Fujiwara, Matsumoto and Petz. Among them, Petz characterized all the monotone metrics in the quantum state space using operator mean theory.
As for channels, however, only a little is known about their geometrical structures. To my knowledge, there has been no study of the axiomatic characterization of distance measures in the classical or quantum channel space. First, we show the upper and the lower bound of the monotone channel "metric", and it is proved that any monotone "metric" cannot satisfy the inner-product-assumption. We give counterexamples in the space of binary channels. The proof utilizes a "local" version of Blackwell's randomization criteria for equivalence of statistical models, which is well known in statistical decision theory.
The latter result has some impact on the axiomatization of the monotone metric in the space of classical and quantum states, since both Cencov and Petz rely on the inner-product-assumption.
Since classical and quantum states can be viewed as channels with constant output, it is preferable to dispense with the inner-product-assumption. Recalling that the Fisher information is useful in asymptotic theory, it is natural to introduce assumptions on asymptotic behaviour. Hence, we introduce weak asymptotic additivity and lower asymptotic continuity. With these additional assumptions, we not only recover the uniqueness result of Cencov, but also prove the uniqueness of the monotone `metric' in the channel space. It is known that this unique "metric" gives the maximum information in estimating unknown channels. In this proof, we again use the local and asymptotic version of the randomization criterion. Finally, there is an implication for quantum state metrics. A quantum state can be viewed as a classical channel which takes a measurement as input and outputs a measurement result. If we restrict the measurements to separable measurements, the asymptotic theory discussed in our paper also applies to quantum states, proving the uniqueness of the metric. On the other hand, the author's earlier manuscript reestablished the upper and lower bounds of Petz on the monotone metric without relying on the inner-product-assumption. This suggests that the monotone `metric' in the quantum state space is not unique. Therefore, collective measurements are essential for having a variety of monotone metrics.
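As a small numerical illustration of the monotonicity property in the classical case (not of the channel-space construction of the talk), the following sketch checks that the Fisher information of a one-parameter family on a two-point set does not increase under a Markov map. The family, the parameter value and the binary symmetric channel are assumptions chosen only for this example.

```python
import numpy as np

def fisher_information(p, dp):
    """Fisher information sum_x dp(x)^2 / p(x) of a finite distribution p,
    where dp is the derivative of p along a one-parameter family."""
    p = np.asarray(p, dtype=float)
    dp = np.asarray(dp, dtype=float)
    return float(np.sum(dp ** 2 / p))

# Hypothetical family p_theta = (theta, 1 - theta) on a two-point set.
theta = 0.3
p = np.array([theta, 1.0 - theta])
dp = np.array([1.0, -1.0])  # derivative of p_theta with respect to theta

# A Markov map written as a column-stochastic matrix
# (a binary symmetric channel with flip probability eps).
eps = 0.1
T = np.array([[1.0 - eps, eps],
              [eps, 1.0 - eps]])

info_before = fisher_information(p, dp)
info_after = fisher_information(T @ p, T @ dp)
print(info_before, info_after)  # the second value should not exceed the first
```

Because the map is linear, the tangent vector is pushed forward by the same matrix, which is why `T @ dp` appears next to `T @ p`.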
Guido Montùfar Max Planck Institute for Mathematics in the Sciences, Germany
Faces of the probability simplex contained in the closure of an exponential family and minimal mixture representations
This work is about subsets of a state-space with the following property (S): All probability distributions supported therein are elements of the closure of a given exponential family. There exists an optimal cardinality condition for sets of the state-space which ensures they have the property S. However, this is not a necessary condition, and the sets can be considerably larger. We present a characterization of S and use it to compute lower bounds on the maximal cardinality of sets with S. Furthermore we show that there are actions on the state-space which preserve the property S. These results are applied to find bounds on the minimal number of elements belonging to a certain exponential family forming a mixture representation of a probability distribution which belongs to another (larger) exponential family.
Mariela Portesi CONICET & Universidad Nacional de La Plata, Argentina (joint work with Fernando Montani)
Statistical modeling of neuronal activity for an infinitely large number of neurons
An important open question in mathematical neuroscience is how to evaluate the significance of high order spike correlations in the neural code through analytically solvable models. We investigate the thermodynamic limit of a widespread probability distribution of firing in a neuronal pool, within the information geometry framework, considering all possible contributions from high order correlations. This allows us to identify a deformation parameter accounting for the different regimes of firing within the probability distribution, and to investigate whether those regimes could saturate or increase information as the number of neurons goes to infinity.
S. Amari, H. Nakahara, S. Wu and Y. Sakai, Neural Comput. 15, 127 (2003)
B.B. Averbeck, P.E. Latham and A. Pouget, Nat. Rev. Neurosci. 7, 358 (2006)
F. Montani, A. Kohn, A. Smith and S.R. Schultz, J. Neurosci. 27, 2338 (2007)
F. Montani, R.A.A. Ince, R. Senatore, E. Arabzadeh, M.E. Diamond and S. Panzeri, Phil. Trans. R. Soc. 367, 3297 (2009)
Johannes Rauh Max Planck Institute for Mathematics in the Sciences, Germany
Support sets in exponential families and oriented matroid theory
My poster presents results from a joint work with Nihat Ay and Thomas Kahle (preprint available at arXiv:0906.5462). We study how results from algebraic statistics generalize to the case of non-algebraic exponential families on finite sets. Here an exponential family is called algebraic if it has an integer-valued matrix of sufficient statistics. In this case the exponential family is the intersection of an algebraic variety with the probability simplex, which makes available the powerful tools of computational commutative algebra. While most relevant examples of exponential families are algebraic, it turns out that ignoring the algebraic properties yields another viewpoint which better captures the continuity aspects of exponential families.
A lot of properties can be deduced from an oriented matroid naturally associated to the exponential family: the closure of a discrete exponential family is described by a finite set of equations corresponding to the circuits of this matroid. These equations are similar to the equations used in algebraic statistics, although they need not be polynomial in the general case. This description allows for a combinatorial study of the possible support sets in the closure of an exponential family. In particular, if two exponential families induce the same oriented matroid, then their closures have the same support sets.
Finally we find a surjective (but not injective) parametrization of the closure of the exponential family by adding more parameters. These parameters also have an interpretation in the matroid picture. The parametrization generalizes the monomial parametrization used in algebraic statistics in the case of algebraic exponential families.
Shigeru Shuto Osaka University, Japan
Information geometry of renormalization on diamond fractal Ising spins
The renormalization group procedure for calculating a partition function in statistical mechanics is considered as a successive approximation to the canonical distribution on its state space. By embedding renormalized states into the original state space, this approximation is characterized as the m-projection from the canonical distribution onto a renormalization submanifold. We apply this method to a diamond fractal Ising spin model, whose renormalization flow has fixed points in the finite region.
Takashi Takenouchi Nara Institute of Science and Technology, Japan
Bayesian decoder for multi-class classification by mixture of divergence
The multi-class classification problem is one of the major topics in the field of machine learning. There are many works on the topic, and one major approach decomposes the multi-class problem into multiple binary classification problems based on the framework of error correcting output coding (ECOC). Each decomposed binary problem is solved independently, and the results of the binary classifiers are integrated (decoded) to predict the multi-class label.
In this research, we present a new method for integrating binary classifiers for the multi-class problem. Our integration method (decoder) is characterized by the minimization of a sum of divergences, in which each divergence measures the discrepancy between the decoder and a posterior distribution of the class label associated with a binary classifier.
We investigate the performance of the proposed method using a synthetic dataset and datasets from the UCI repository.
Tatsuaki Wada Ibaraki University, Japan (joint work with Atsumi Ohara)
Legendre duality and dually-flat structure in nonextensive thermostatistics developed by $S_{2-q}$ formalism
$S_{2-q}$-formalism [1] in the generalized thermostatistics based on Tsallis entropy S [2] is a natural formalism in the sense that the associated Legendre structures are derived in a similar way as in the standard thermostatistics. From a q-exponential probability distribution function (pdf), which maximizes S under the constraint of linear average energy U, the so-called escort pdf naturally appears in this formalism. The generalized Massieu potential associated with S and U is related to the one associated with the normalized Tsallis entropy S and the normalized q-average energy U, which is the energy average w.r.t. the escort pdf. The $S_{2-q}$-formalism has also provided the connections among some different versions of Tsallis nonextensive thermostatistics, a non self-referential expression of Tsallis' pdf [3], and the relation between the Boltzmann temperature and the Lagrange multiplier in nonextensive thermostatistics [4].
On the other hand, it was recently shown in Ref. [5] that a dually flat structure on the space of the escort probabilities is obtained by applying 1-conformal transformation to the $\alpha$-geometry, which is an information geometry with a constant curvature and is related to the Tsallis relative entropy [6].
We explore the relation between the information geometrical structures associated with this dually flat structure and the Legendre structures in the $S_{2-q}$-formalism. We show that the Legendre dual potential functions in the information geometry with this dually flat structure are the generalized Massieu potential and . We further study the correspondences among the potential functions, dual affine coordinates, and relevant divergence functions between the information geometry of the dually flat structure and the $S_{2-q}$-formalism.
[1] T. Wada, A.M. Scarfone, Connections between Tsallis' formalisms employing the standard linear average energy and ones employing the normalized q-average energy, Phys. Lett. A 335 (2005) 351-362.
[2] C. Tsallis, Introduction to Nonextensive Statistical Mechanics - Approaching a Complex World (Springer, New York, 2009).
[3] T. Wada, A.M. Scarfone, A non self-referential expression of Tsallis' probability distribution function, Eur. Phys. J. B 47 (2005) 557-561.
[4] T. Wada, A.M. Scarfone, The Boltzmann Temperature and Lagrange Multiplier in Nonextensive Thermostatistics, Prog. Theor. Phys. Suppl. 162 (2006) 37-44.
[5] A. Ohara, H. Matsuzoe, S-I. Amari, A dually flat structure on the space of escort distributions, J. Phys.: Conf. Series 201 (2010) 012012.
[6] A. Ohara, Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation, Eur. Phys. J. B 70 (2009) 15-28.
Yu Watanabe The University of Tokyo, Japan (joint work with Takahiro Sagawa, Masahito Ueda)
Optimal measurement and maximum Fisher information on noisy quantum systems
The most serious obstacle to realizing quantum computers and networks is decoherence, which acts as noise and causes information loss. Decoherence occurs when a quantum system interacts with its environment, and it is unavoidable in almost all quantum systems. Therefore, one of the central problems in quantum information science concerns the optimal measurement for retrieving information about the original quantum state from the decohered one, and the maximum information that can be obtained from the measurement.
We identify an optimal quantum measurement that retrieves the maximum Fisher information about the expectation value of an observable from the partially decohered state. We also clarify the maximum Fisher information obtained by the optimal measurement.
[1] Y. Watanabe, T. Sagawa, and M. Ueda, Phys. Rev. Lett. 104, 020401 (2010).
Le Yang CNRS, France
Riemannian median and its estimation
To characterize statistical data lying on a Riemannian manifold, one often uses the barycenter of the empirical data as the notion of centrality. However, the barycenter is well known not to be a robust estimator and to be sensitive to outliers. A natural robust substitute for the barycenter is the geometric median. In this paper, we define the geometric median of a probability measure on a Riemannian manifold and give its characterization together with a natural condition ensuring its uniqueness. To compute the geometric median in practice, we also propose a subgradient algorithm, prove its convergence, and estimate the approximation error and the rate of convergence. The convergence of this subgradient algorithm, which generalizes the classical Weiszfeld algorithm in Euclidean spaces to Riemannian manifolds, does not depend on the sign of the curvature of the manifold and hence improves a recent result of Fletcher and his colleagues.
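For orientation, the classical Euclidean Weiszfeld iteration that the subgradient algorithm generalizes can be sketched as follows. This is only the flat-space special case, not the Riemannian algorithm of the paper; the function name, the stopping rule and the sample data are assumptions made for the illustration.

```python
import numpy as np

def weiszfeld_median(points, iters=500, tol=1e-12):
    """Approximate the geometric median of points in R^d with the
    classical Weiszfeld fixed-point iteration (Euclidean case only)."""
    points = np.asarray(points, dtype=float)
    x = points.mean(axis=0)  # start from the barycenter
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        if np.any(d < 1e-14):
            # The iterate coincides with a data point; this simple sketch
            # stops here instead of handling that case properly.
            return x
        w = 1.0 / d
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Four corners of the unit square plus one distant outlier.
data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
barycenter = data.mean(axis=0)
median = weiszfeld_median(data)
print(barycenter, median)
```

On this data the barycenter is dragged far outside the unit square by the single outlier, while the median stays inside it, which is the robustness property the abstract refers to.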
The organizing and scientific committees have to announce with much regret that Dr. Igor Vajda, an invited speaker of IGAIA III, passed away suddenly after a short illness on May 2, 2010. We will miss him as a great scientist, dear colleague and friend. I. Csiszár agreed to deliver a lecture in his memory.
Van Trees's inequality is a Bayesian version of the Cramer-Rao inequality for quadratic loss of estimators with values in vector spaces. The first part of the talk presents a generalisation of this inequality to the setting of smooth loss functions and estimators with values in manifolds. Various geometric objects (connections, metrics, tensors) play a role.
Stam's inequality compares the (inverse) Fisher information of the sum of two independent (real-valued) random variables with the (inverse) Fisher informations of these variables. The second part of the talk describes a generalisation of this inequality to the setting of random variables on Lie groups.
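For reference, the classical scalar forms of the two inequalities that the talk generalizes can be written as follows. The notation is an assumption introduced here: $\lambda$ is a smooth prior density on the parameter $\theta$, $I(\theta)$ is the Fisher information of the model, $\hat\theta$ is an estimator, and $I(X)=\int f'(x)^2/f(x)\,dx$ is the Fisher information of a real random variable $X$ with density $f$.

$$ \mathbb{E}\big[(\hat\theta(X)-\theta)^2\big]\ \ge\ \frac{1}{\mathbb{E}_\lambda[I(\theta)]+I(\lambda)},\qquad I(\lambda):=\int\frac{\lambda'(\theta)^2}{\lambda(\theta)}\,d\theta, $$

$$ \frac{1}{I(X+Y)}\ \ge\ \frac{1}{I(X)}+\frac{1}{I(Y)}\qquad (X,\ Y\ \text{independent}). $$

In the first inequality the expectation on the left is taken over the joint distribution of $(X,\theta)$.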
This is a joint work with D. Petz and a continuation of the paper in LAA 430 (2009) with the same title. The $n\times n$ Hermitian matrices form the $n^2$-dimensional Euclidean space $\mathbb{H}_n$ with respect to the Hilbert-Schmidt inner product. The set $\mathbb{P}_n$ of all $n\times n$ positive definite matrices, being an open subset of $\mathbb{H}_n$, is naturally equipped with a $C^\infty$ manifold structure. A smooth kernel function $\phi:(0,\infty)\times(0,\infty)\to(0,\infty)$ induces a Riemannian metric $K^\phi$ on $\mathbb{P}_n$ defined by $$ K_D^\phi(H,K):=\sum_{i,j}\phi(\lambda_i,\lambda_j)^{-1}\mathrm{Tr}\, P_iHP_jK, \qquad D\in\mathbb{P}_n,\ H,K\in\mathbb{H}_n, $$ where $D=\sum_i\lambda_iP_i$ is the spectral decomposition. Certain important quantities in quantum information geometry, such as statistical metric, quantum Fisher informations, and quantum variances, are Riemannian metrics arising from kernel functions $\phi$ of the form $M(x,y)^\theta$, a $\theta$ ($\in\mathbb{R}$)-power of a symmetric homogeneous mean $M(x,y)$ of $x,y>0$. We discuss the following topics concerning geodesic shortest curves and geodesic distance of Riemannian metrics on $\mathbb{P}_n$ of this type.
(1) Since the Riemannian manifold $(\mathbb{P}_n,K^\phi)$ with $\phi=M(x,y)^\theta$ is complete if and only if $\theta=2$, the existence of geodesic shortest curves in the case $\theta\ne2$ does not seem obvious. When $A,B$ commute, we present an explicit formula for a geodesic shortest curve between $A,B$ that depends on $\theta$ but is independent of the choice of $M$.
Moreover, we show the existence of a geodesic shortest curve joining $A,B\in\mathbb{P}_n$ for the metric $K^\phi$ with $\phi=M(x,y)^\theta$ if $\theta$ is sufficiently near $2$.
(2) We present a necessary and sufficient condition for Riemannian metrics $K^\phi$ and $K^\psi$ induced by $\phi=M(x,y)^\theta$ and $\psi=N(x,y)^\kappa$ to be isometric under the transformation $D\in\mathbb{P}_n\mapsto F(D)\in\mathbb{P}_n$ given by a smooth function $F:(0,\infty)\to(0,\infty)$. The condition is explicitly given in terms of $M,N,\theta$, and $\kappa$.
(3) From (2) above we can construct a one-parameter isometric family of Riemannian metrics starting from any $K^\psi$ inside the set of Riemannian metrics we are treating. Those isometric families have different features in the cases $\kappa\ne2$ and $\kappa=2$. We see that each of those families converges to the metric $K^{M_{\mathrm{L}}^2}$ induced by the square of the logarithmic mean $M_{\mathrm{L}}(x,y):=(x-y)/(\log x-\log y)$. Thus $K^{M_{\mathrm{L}}^2}$ may be regarded as a (unique) attractor in the set of all Riemannian metrics of our discussion. Since the geodesic shortest curve for the metric $K^{M_{\mathrm{L}}^2}$ is $\gamma(t)=\exp((1-t)\log A+t\log B)$ ($0\le t\le1$), this gives a Riemannian geometric interpretation of limit formulas such as \begin{align*} \lim_{\alpha\to0}((1-t)A^\alpha+tB^\alpha)^{1/\alpha}&=\exp((1-t)\log A+t\log B), \\ \lim_{\alpha\to0}(A^\alpha\,\#_t\,B^\alpha)^{1/\alpha}&=\exp((1-t)\log A+t\log B), \end{align*} where $\#_t$ is the $t$-power mean ($0\le t\le1$).
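The first limit formula can be checked numerically. The sketch below, restricted to real symmetric positive definite matrices, compares $((1-t)A^\alpha+tB^\alpha)^{1/\alpha}$ with $\exp((1-t)\log A+t\log B)$ for decreasing $\alpha$. The helper names and the two test matrices are assumptions chosen for the illustration.

```python
import numpy as np

def sym_func(S, f):
    """Apply a scalar function f to a real symmetric matrix S
    through its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def power_mean(A, B, t, alpha):
    """((1 - t) A^alpha + t B^alpha)^(1/alpha) for positive definite A, B."""
    M = (1 - t) * sym_func(A, lambda w: w ** alpha) + t * sym_func(B, lambda w: w ** alpha)
    return sym_func(M, lambda w: w ** (1.0 / alpha))

def log_euclidean(A, B, t):
    """exp((1 - t) log A + t log B) for positive definite A, B."""
    L = (1 - t) * sym_func(A, np.log) + t * sym_func(B, np.log)
    return sym_func(L, np.exp)

# Two non-commuting positive definite matrices (illustrative choice).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
t = 0.3

target = log_euclidean(A, B, t)
errors = [np.linalg.norm(power_mean(A, B, t, a) - target) for a in (1.0, 0.1, 0.01, 0.001)]
print(errors)  # the gap should shrink as alpha decreases
```

The same helpers also confirm the endpoint behaviour of the curve: `log_euclidean(A, B, 0.0)` returns `A` and `log_euclidean(A, B, 1.0)` returns `B` up to rounding.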
The subject is a mathematical transition from the Fisher information of classical statistics to the matrix formalism of quantum theory. If monotonicity is the main requirement, then there are several quantum versions, parametrized by a function. In physical applications the minimal one is the most popular. There is a one-to-one correspondence between Fisher informations and abstract covariances. The skew information is treated here as a particular case.
Joint work with Karim Anaya-Izquierdo and Frank Critchley (Open University) and Paul Vos (East Carolina).
Our project uses computational information geometry to develop diagnostic tools to help understand sensitivity to model choice by building an appropriate perturbation space for a given inference problem. Focused on generalised linear models, the long-term aim is to engage with statistical practice via appropriate R software encapsulating the underlying geometry, whose details the user may ignore.
This talk exploits the universality of the extended multinomial model to examine the interaction between prior considerations (scientific and otherwise), model choice and final inference. A range of examples are used for illustration. There are strong points of contact with the work of Copas and Eguchi.
EPSRC support under grant EP/E017878/1 is gratefully acknowledged.
Applications of information geometry to black hole physics are discussed. We focus mainly on the outcomes of this research program. The type of information geometry we utilize in this approach is the Ruppeiner geometry defined on the state space of a given thermodynamic system in equilibrium. The Ruppeiner geometry can be used to analyze stability and critical phenomena in black hole physics with results consistent with those obtained by the Poincare stability analysis for black holes and black rings. Furthermore other physical phenomena are well encoded in the Ruppeiner metric such as the sign of specific heat and the extremality of the solutions. The information geometric approach has opened up new perspectives on the statistical mechanics of black holes - an unsettled subject necessary for the emerging theory of quantum gravity. We discuss in detail the use of information geometry for addressing ultraspinning phases of the (higher-dimensional) Myers-Perry (MP) black holes. We conjecture that the membrane phase of ultraspinning MP black holes is reached at the minimum temperature in the case of $2n < d - 3$ (with $n$ the number of angular momenta and $d$ the number of dimensions), which corresponds to the singularity of the Ruppeiner metric.
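For readers unfamiliar with it, the Ruppeiner metric is the negative Hessian of the entropy with respect to the extensive variables of the system. In the notation assumed here, $S$ is the entropy and $x^i$ are the extensive variables (for a black hole, for example, the mass and the angular momenta):

$$ g^{R}_{ij}=-\frac{\partial^2 S}{\partial x^i\,\partial x^j}. $$

Points where the components of this metric become singular are the ones associated with the critical behaviour discussed above.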
Joint work with Karim Anaya-Izquierdo (Open University), Paul Marriott (University of Waterloo) and Paul Vos (East Carolina).
Our project uses computational information geometry to develop diagnostic tools to help understand sensitivity to model choice by building an appropriate perturbation space for a given inference problem. Focused on generalised linear models, the long-term aim is to engage with statistical practice via appropriate R software encapsulating the underlying geometry, whose details the user may ignore.
This talk focuses on the role played by the concept of an approximate cut. Amongst other features, the perturbation space thereby constructed allows us to expose the differences in inferences resulting from different models, albeit that they may be equally well supported by the data. A running example illustrates the development.
EPSRC support under grant EP/E017878/1 is gratefully acknowledged.
The purpose of this talk is two-fold. First, we present a concise review on correlations (both classical and quantum) in bipartite quantum states, with emphasis on their classification, characterization, comparison and quantification. Second, we report some recent results on no-broadcasting for quantum correlations, and provide a unified picture for several celebrated quantum no-broadcasting theorems.
Crudely stated, a central issue in computational neuroscience is to gain insight into how interactions of neural activities carry and process, or encode and decode, information of the outside world. Once neural activities are considered samples of statistical variables, it becomes obvious that research into this neural coding has various connections with the issues investigated by information geometry, and it thus benefits from the perspectives and tools of information geometry. In my talk, I first provide a general overview, to introduce the audience to this subject, and then present some of our relevant work. For example, several previous studies have suggested that a pairwise interaction model (equivalent to so-called Ising models) is sufficient for describing neural activity patterns. In contrast, we recently demonstrated that a hierarchical model of the pairwise models on different scales is more accurate for describing neural data, despite its relative parsimony compared with ordinary pairwise models. The hierarchical model embeds higher-than-pairwise interactions as constraints and has an interesting relation to the generalized Pythagorean theorem or decompositions of different order interactions of discrete variables.
My talk consists of the following topics.
Sec.1: Schrödinger type uncertainty relation for mixed states (based on [1]). We give the Schrödinger type uncertainty relation for a quantity representing a quantum uncertainty, introduced by S. Luo in [2]. Our result improves the Heisenberg type uncertainty relation shown in [2] for a mixed state. We also discuss the relation between our result and the original Schrödinger uncertainty relation.
Sec.2: A matrix trace inequality and its application to entropy theory (based on [3]). We give a complete and affirmative answer to a conjecture [4] on matrix trace inequalities for the sum of positive semidefinite matrices. We also apply the obtained inequality to derive a kind of generalized Golden-Thompson inequality for positive semidefinite matrices. Finally, we give a lower bound of the generalized relative entropy (Tsallis relative entropy [5, 6]) by applying a slightly different variational expression [7, 8] and the generalized Golden-Thompson inequality.
Sec.3: Trace inequalities related to skew informations (based on [9]). (This section will be presented if time permits.) We study some trace inequalities for products of matrices and powers of matrices, which are natural generalized forms related to the quantities constituting skew informations. See [10] for similar problems and their answers.
[1] S. Furuichi, Schrödinger uncertainty relation for mixed states, arXiv:1005.2655v1.
[2] S. Luo, Heisenberg uncertainty relation for mixed states, Phys. Rev. A, Vol. 72 (2005), 042110.
[3] S. Furuichi and M. Lin, A matrix trace inequality and its application, to appear in Linear Alg. Appl.
[4] S. Furuichi, A mathematical review of the generalized entropies and their matrix trace inequalities, in: Proceedings of WEC2007, 2007, pp. 840-845.
[5] C. Tsallis et al., in: S. Abe and Y. Okamoto (Eds.), Nonextensive Statistical Mechanics and its Applications, Springer-Verlag, Heidelberg (2001). See also the comprehensive list of references at http://tsallis.cat.cbpf.br/biblio.htm.
[6] C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World, Springer, 2009.
[7] F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Alg. Appl., Vol. 181 (1993), pp. 153-185.
[8] S. Furuichi, Trace inequalities in nonextensive statistical mechanics, Linear Alg. Appl., Vol. 418 (2006), pp. 821-827.
[9] S. Furuichi, K. Kuriyama and K. Yanagi, Trace inequalities for products of matrices, Linear Alg. Appl., Vol. 430 (2009), pp. 2271-2276.
[10] T. Ando, F. Hiai and K. Okubo, Trace inequalities for multiple products of two matrices, Math. Inequal. Appl., Vol. 3 (2000), pp. 307-318.
The last two decades have seen the birth and boost of a new category of supervised learning algorithms, known as boosting algorithms. This family has gradually turned out to be much more pervasive than initially expected, with applications to the induction of virtually any kind of classifier. While the first lenses used to understand the algorithms were essentially grounded in convex optimization, they have more recently been complemented by results in information geometry, going beyond the traditional Riemannian framework, that help to give a more complete picture of these fascinating algorithmic machineries. The aim of the talk is to present the central geometric part of this picture, which we believe may make it easier to design new and more efficient boosting algorithms.
Given a manifold $M$, a divergence function $D$ is a non-negative function on the product manifold $M \times M$ that achieves its global minimum of zero (with semi-positive definite Hessian) at those points that form its diagonal submanifold $M_x$. It is well known that the statistical structure on $M$ (a Riemannian metric with a pair of conjugate affine connections) can be constructed from the second and third derivatives of $D$ evaluated at $M_x$. In Zhang (2004) and subsequent work, a framework based on convex analysis is proposed to unify familiar families of divergence functions. The resulting geometry, which displays what is called ``reference-representation biduality'', completely captures the alpha-structure (i.e., statistical structure with a parametric family of conjugate connections) of the classical information geometry. This is the alpha-Hessian geometry with equi-affine structure. Here, we continue this investigation on two parallel fronts, namely, how $D$ on $M \times M$ a) is related to various Minkowski metrics on $M$; and b) generates a symplectic structure of $M$. On point a, a set of inequalities is developed that uniformly bound $D$ by Minkowski distances on $M$. On point b, convex-induced divergence functions will be shown to generate a Kähler structure under which the statistical structure of $M$ can be modeled.
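The construction from second and third derivatives mentioned above is usually written as follows (the standard relations due to Eguchi; the index notation is an assumption introduced here, with $\partial_i$ acting on the first argument $x$ and $\partial'_j$ on the second argument $x'$ of $D(x,x')$):

$$ g_{ij}(x)=-\,\partial_i\partial'_j D(x,x')\big|_{x'=x},\qquad \Gamma_{ij,k}(x)=-\,\partial_i\partial_j\partial'_k D(x,x')\big|_{x'=x},\qquad \Gamma^{*}_{ij,k}(x)=-\,\partial'_i\partial'_j\partial_k D(x,x')\big|_{x'=x}. $$

The two connections obtained in this way are conjugate to each other with respect to $g$.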
I will discuss recent work on how Fisher information should be defined on the integers, and on the set $\{0, 1, \dots, n\}$, motivated by the maximum entropy property of the Poisson and binomial distributions. I will show that the resulting information measures have useful properties which parallel those of the Fisher information with respect to a real location parameter. This suggests that they too can form a basis for Information Geometry, with a form for geodesics related to transportation problems.
(This talk is based on joint work with Harremoës, Hillion, Kontoyiannis, Madiman and Yu.)
A Hesse manifold can be regarded as a flat statistical manifold. In this talk we show that a level surface of the non-degenerate function on a Hesse manifold is a 1-conformally flat statistical submanifold. In addition, we discuss conditions under which a foliation of 1-conformally flat statistical manifolds generates a Hesse manifold.
I discuss a family of distances on probability measures based on positive definite kernels and the associated reproducing kernel Hilbert spaces (RKHS). It is shown that, with an appropriate choice of RKHS, the mean of the kernel in the RKHS with respect to a probability measure uniquely determines that measure. For an RKHS with this characteristic property, a distance between probability measures can be defined as the distance between their means. Distances of this type admit straightforward finite-sample estimators, unlike some other well-known distances on probabilities. Some statistical asymptotic results on the estimators are discussed. It is also easy to derive a dependence measure from the distance by considering the distance between the joint probability and the product of the marginals. I will discuss a normalized dependence measure with positive definite kernels, and show an interesting relation with the conventional chi-square divergence.
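A minimal sketch of such a finite-sample estimator is given below: the squared distance between the empirical kernel means of two samples, computed with a Gaussian kernel (a standard example of a kernel with the characteristic property). The function names, the bandwidth and the synthetic samples are assumptions made for the illustration, and the plain biased estimator is used rather than any refinement discussed in the talk.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix k(x, y) = exp(-|x - y|^2 / (2 sigma^2)) for rows of X and Y."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kernel_mean_distance_sq(X, Y, sigma=1.0):
    """Biased estimate of the squared RKHS distance between the kernel
    means of the distributions that generated the samples X and Y."""
    return float(gaussian_kernel(X, X, sigma).mean()
                 + gaussian_kernel(Y, Y, sigma).mean()
                 - 2.0 * gaussian_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y = rng.normal(0.0, 1.0, size=(200, 1))   # same distribution as X
Z = rng.normal(3.0, 1.0, size=(200, 1))   # shifted distribution

d_same = kernel_mean_distance_sq(X, Y)
d_shifted = kernel_mean_distance_sq(X, Z)
print(d_same, d_shifted)  # the shifted pair should give the larger value
```

Applying the same function to a joint sample and a sample from the product of its marginals would give a dependence measure of the kind mentioned in the abstract.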
In this talk, starting from results by Carinena et al. [Ann. Phys. 322, 434, 2007] about the quantum harmonic oscillator on constant (positive or negative) curvature surfaces, I will show some properties of the orthogonal polynomials associated with the corresponding wavefunctions. These polynomials have a strong connection with the hyperspherical polynomials, from which they inherit some properties. Moreover, a geometric transformation between the cases of a positive and a negative curvature surface can be made explicit: this transformation can be given an algebraic interpretation in terms of these orthogonal polynomials. Finally, a link can be exhibited with the canonical probability measures involved in the theory of nonextensive statistics.
A statistical manifold is a differential geometric structure that is induced naturally from geometry of probability distributions. A statistical manifold admitting torsion is a quantum version of the notion of statistical manifolds. In this talk, we discuss geometry of statistical manifolds admitting torsion. We also discuss relations between geometry of statistical manifolds admitting torsion and geometry of affine distributions.
George Galanis Hellenic Naval Academy, Greece
Applications of Information Geometry to environmental numerical prediction systems
This work discusses a new area in which Information Geometric techniques could contribute beneficially: the optimization of environmental prediction systems, whose added value, in today's competitive scientific and operational environment, is constantly increasing.
The main approach towards accurate environmental forecasting products today is the use of numerical atmospheric and wave prediction systems. Such platforms successfully simulate the general environmental conditions on a global or intermediate scale. However, systematic errors emerge when focusing on local characteristics, mainly as a result of the inability to capture sub-scale phenomena.
In this framework, assimilation systems (aiming at the improvement of initial conditions) and statistical post-processing (for the local adaptation of the forecasts) provide significant support. However, in the majority of such techniques the available data are treated as elements of Euclidean spaces, and least-squares methods are employed for the estimation and optimization of their distances.
In the present work the use of Information Geometry in this framework is discussed, with the aim of estimating (i) the statistical manifold that the environmental data best fit and (ii) the way in which the corresponding geodesics could be estimated.
In particular, the results of a state-of-the-art wave prediction model (WAM) have been utilized in an area of significant importance: the North Atlantic Ocean.
The obtained outcomes, apart from the clear view that they provide of the wave climate and of the performance of the numerical model, could be exploited for designing new optimization techniques for wave prediction systems that cannot be supported by classical statistics.
Marc Harper UCLA, USA
Applications of information geometry in evolutionary dynamics
Many important constructions and definitions in Evolutionary Game Theory can be elegantly interpreted in terms of concepts from information geometry. Information divergences produce new families of replicator-like dynamics, yielding global Lyapunov functions in conjunction with evolutionary stability and analogs of important local results, such as Fisher's fundamental theorem. Using the generalized information divergences recently developed in Statistical Thermodynamics, we can formulate replicator dynamics with type-dependent intensities of selection and many other variations, including the orthogonal projection dynamic. If space permits, I will discuss connections to Bayesian inference and the concept of potential information.
Some of this work is discussed on my blog (www.marcallenharper.com/blog/) and in my papers on the ArXiv. My collaborator Chris Lee at UCLA and I hope to publish results relevant to inference in the near future. I am also applying this work in connection with the evolutionary ecology group at UCLA, using Fisher information dynamics to investigate population dynamics. I think this will yield an interesting poster concerning applications of information geometry in evolutionary dynamics and inference.
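The best-known instance of the Lyapunov property mentioned above is the ordinary replicator dynamic, for which the Kullback-Leibler divergence from an interior evolutionarily stable state to the current population state decreases along trajectories. The sketch below illustrates this with an explicit Euler discretization of a two-strategy Hawk-Dove game; the payoff values, step size and initial state are assumptions chosen for the example, and the generalized divergences of the poster are not implemented.

```python
import numpy as np

def replicator_step(x, A, h):
    """One explicit Euler step of the replicator equation
    dx_i/dt = x_i * ((A x)_i - x . A x)."""
    f = A @ x
    return x + h * x * (f - x @ f)

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i) for positive p, q."""
    return float(np.sum(p * np.log(p / q)))

# Hawk-Dove payoffs with resource value V = 2 and fight cost C = 4,
# for which the interior evolutionarily stable state is (V/C, 1 - V/C).
A = np.array([[-1.0, 2.0],
              [0.0, 1.0]])
x_star = np.array([0.5, 0.5])

x = np.array([0.9, 0.1])
kl_trace = [kl_divergence(x_star, x)]
for _ in range(2000):
    x = replicator_step(x, A, h=0.01)
    kl_trace.append(kl_divergence(x_star, x))

print(x, kl_trace[0], kl_trace[-1])
```

The small step size matters: the monotone decrease is a property of the continuous-time flow, and a coarse discretization could overshoot the stable state.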
Masayuki Henmi The Institute of Statistical Mathematics, Japan
A dual differential geometrical structure induced from estimating functions
In parametric statistical inference, the maximum likelihood method is typically used for parameter estimation, and it is nicely explained by the dual differential geometrical structure of a statistical model with the Fisher metric and the e- and m-connections. A good property of this structure is that both of these affine connections are torsion-free, which leads to the important notion of a dually flat space. However, this property does not necessarily hold once we move to another estimation method and consider its associated geometry. In this poster presentation, we investigate a dual differential geometrical structure induced from general estimating functions by utilizing the theory of statistical manifolds admitting torsion, which has recently been developed by Kurose and Matsuzoe. In particular, we focus on the quasi-score (or quasi-likelihood) method and consider the role of the induced geometrical structure in statistical inference.
Hiroshi Imai Dipartimento di Fisica A. Volta, Italy
A sufficient condition that an operator-sum representation gives horizontal lifting
A finite-dimensional quantum channel is a map defined on positive matrices which describes the quantum process from an input quantum state (density matrix) to an output state. It is a fundamental task to identify a quantum channel statistically. For this purpose, the quantum SLD Fisher information of a channel is important. This notion is closely related to a fiber structure constructed from the operator-sum representations of a given channel and its horizontal lifting. We provide a sufficient condition for a derivative of an operator-sum representation to be horizontal. As an example, we give the estimation of an unknown correlation between several unitary processes satisfying some condition. This is joint work with Chiara Macchiavello and is supported by the CORNER project: http://qurope.eu/projects/corner
Dominik Janzing MPI for Biological Cybernetics, Tuebingen, Germany (joint work with Povilas Daniusis, Joris Mooij, Bastian Steudel, Jakob Zscheischler, Kun Zhang, Bernhard Schölkopf)
Inferring causal directions via information geometry
Some recent methods in causal inference use the assumption that the marginal distribution of the cause and the conditional distribution of the effect, given the cause, are independent objects of nature. In other words, the causal hypothesis ``X causes Y'' needs to be rejected if P(X) and P(Y|X) satisfy a relation that ``generic'' pairs (P(X), P(Y|X)) would not satisfy. Formalizing such a vague statement is challenging, but we propose a notion of independence that can be interpreted as orthogonality in information space in the sense of information geometry. We developed a method for inferring causal directions based on such an orthogonality assumption and obtained encouraging empirical results.
Jozef Juricek Charles University in Prague, Czech Republic
Maximization of information divergence from the symmetrical exponential families
The problem of maximization of the information divergence from the exponential families symmetrical w.r.t. some permutation group is presented. This work is related to previous results of N. Ay in [1], F. Matúš in [2,3,4] and J. Rauh in [5]. Situations in which there exist maximizers exchangeable w.r.t. the permutation group are studied. In these situations, the dimensionality of the optimization problem is highly reduced. For some special cases explicit solutions are found. Maximization of information divergence from an exponential family has emerged in probabilistic models for evolution and learning in neural networks that are based on infomax principles. The maximizers admit interpretation as stochastic systems with high complexity w.r.t. exponential family [1]. A link between divergence maximization and secret sharing was established in [4].
Keywords. Kullback-Leibler divergence, relative entropy, exponential family, hierarchical models, information projection, symmetric group, permutation group.
AMS 2000 Math. Subject Classification. Primary 94A17. Secondary 62B10, 60A10, 52A20.
Contact. E-mail, Charles University in Prague, Department of Probability and Mathematical Statistics, Sokolovská 83, Prague, 186 75, Czech Republic, EU
[1] Ay, N., Knauf, A. (2006). Maximizing multi-information. Kybernetika, 42:517--538.
[2] Matúš, F. (2004). Maximization of information divergences from binary i.i.d. sequences. Proceedings IPMU 2004, 2:1303--1306, Perugia, Italy.
[3] Matúš, F. (2007). Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika, 43:731--746.
[4] Matúš, F. (2009). Divergence from factorizable distributions and matroid representations by partitions. IEEE Transactions on Information Theory, 55(12):5375--5381.
[5] Rauh, J. (2009). Finding the maximizers of the information divergence from an exponential family. arXiv:0912.4660.
Takafumi Kanamori Nagoya University, Japan (joint work with Atsumi Ohara)
A Bregman extension of quasi-Newton updates
Standard quasi-Newton methods such as the BFGS and DFP formulas are closely related to the geometrical structure over multivariate normal distributions. In this presentation, we first introduce the relation between the update formula of the Hessian matrix and the Kullback-Leibler divergence over multivariate normal distributions. Then an extension of the Hessian update is derived from the Bregman divergence, which is an extension of the Kullback-Leibler divergence. In particular, we exploit the Bregman divergence with V-potentials in order to obtain a tractable update formula. Based on our framework, we study the convergence property, group invariance and robustness of the Hessian update formula.
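The relation between the Hessian update and the Kullback-Leibler divergence can be checked numerically in a small case. The sketch below assumes the standard BFGS formula and the known variational characterization in which the updated matrix minimizes KL(N(0, B) || N(0, B_k)) over symmetric matrices B satisfying the secant condition B s = y, with Hessian approximations read as covariance matrices. Whether this is the exact convention used in the talk is an assumption, and the 2x2 helper functions and numerical values are hypothetical.

```python
import math

def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def gaussian_kl(cov_p, cov_q):
    """KL( N(0, cov_p) || N(0, cov_q) ) for 2x2 covariance matrices."""
    q_inv = inv2(cov_q)
    trace = sum(q_inv[i][k] * cov_p[k][i] for i in range(2) for k in range(2))
    return 0.5 * (trace - 2.0 - math.log(det2(cov_p) / det2(cov_q)))

def bfgs_update(b, s, y):
    """Standard BFGS update of a Hessian approximation b (requires y.s > 0)."""
    bs = mat_vec(b, s)
    sbs, ys = dot(s, bs), dot(y, s)
    return [[b[i][j] - bs[i] * bs[j] / sbs + y[i] * y[j] / ys for j in range(2)]
            for i in range(2)]

def feasible_perturbation(b, s, c):
    """Add c * n n^T with n orthogonal to s, so that (b + c n n^T) s = b s."""
    n = [-s[1], s[0]]
    return [[b[i][j] + c * n[i] * n[j] for j in range(2)] for i in range(2)]

# Illustrative data: current approximation, step s and gradient change y.
B_OLD = [[2.0, 0.5], [0.5, 1.0]]
S = [1.0, -0.5]
Y = [0.8, 0.3]

B_NEW = bfgs_update(B_OLD, S, Y)
print("secant residual:", [a - b for a, b in zip(mat_vec(B_NEW, S), Y)])
for c in (-0.2, -0.05, 0.0, 0.05, 0.2):
    candidate = feasible_perturbation(B_NEW, S, c)
    print(c, gaussian_kl(candidate, B_OLD))
```

In two dimensions the matrices satisfying the secant condition form a one-parameter family, so scanning the parameter `c` is enough to see that the divergence from the old approximation is smallest at the BFGS update (`c = 0`).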
Masanori Kawakita Kyushu University, Japan (joint work with Jun'ichi Takeuchi)
Semi-supervised learning in view of geometry of estimating function
We study the asymptotic performance of a certain class of semi-supervised learning methods in view of information geometry for estimating functions. Semi-supervised learning has attracted many researchers' interest. Even though many complicated methods have been proposed, only a few studies have discussed their theoretical performance. Some semi-supervised learning methods can be formulated with estimating functions. We analyze such types of methods in this poster. Amari and Kawanabe (1997) analyzed the information geometric properties of estimating functions. Using their framework, we derive the class of all estimating functions for our problem. We further derive the optimal estimating function. In general, however, the optimal estimating function is not available since it depends on an unknown quantity. We provide a way of constructing a good estimate, which is always available, for the optimal estimating function. In addition, we also mention that a specific class of semi-supervised approaches is deeply related to a statistical paradox, which was geometrically analyzed by Henmi and Eguchi (2004).
Kei Kobayashi The Institute of Statistical Mathematics, Japan
Using algebraic method in information geometry
Second- and higher-order efficient estimators for curved exponential families have been studied in the context of information geometry. However, the computation of the estimators and the evaluation of their risk have not been discussed well, other than by sampling approximations such as Monte Carlo methods. In this presentation, a class of ``algebraic'' second-order efficient estimators for ``algebraic'' curved exponential families is proposed. For this class of estimators, differential forms such as the Fisher metric, affine connections and embedding curvatures can be computed by algebraic computational methods such as Gröbner bases. The efficient estimators, their bias and their risk are evaluated via these differential forms. We demonstrate the effectiveness of the algebraic method using some simple example models.
Ryszard Kostecki University of Warsaw, Poland
Quantum information geometry and non-commutative flow of weights
Using the non-commutative flow of weights on von Neumann algebras, we propose the definitions of the quantum information geometric notions of -divergence, Riemannian metric and affine -connections on the spaces of finite positive normal functionals on von Neumann algebras. In this way we generalise the formulation that was provided earlier by Jencova on the basis of the Araki-Masuda theory of non-commutative spaces. Using our formulation, we define the constrained maximum quantum relative -entropy updating rule and discuss some properties of quantum Bayesian inference in this setting.
Michal Kupsa Academy of Sciences of the Czech Republic, Czech Republic (joint work with František Matúš)
On colorings of bivariate random sequences
The ergodic sequences consisting of vectors , , over a finite alphabet are colored with colors for and colors for . Generic behavior of probabilities of monochromatic rectangles intersected with typical sets is examined. When n increases a big majority of pairs of colorings produces rectangles whose probabilities are bounded uniformly from above. Bounds are worked out in all regimes of the rates a and b of colorings. As a consequence, generic behavior of Shannon entropies of the partitions into rectangles is described.
Luigi Malagò Politecnico di Milano, Italy (joint work with Matteo Matteucci, Giovanni Pistone)
Optimization of pseudo-Boolean functions based on the exponential family relaxation
Pseudo-Boolean functions are real-valued functions defined over a vector of binary variables. They appear in many different fields and are well studied in integer programming and in combinatorial optimization. The optimization of this class of functions is of particular interest, since it is NP-hard in the general formulation and no exact polynomial-time algorithm is available. In the literature, pseudo-Boolean function optimization is often referred to as 0/1 programming. We analyze the problem of pseudo-Boolean function optimization by introducing the notion of stochastic relaxation, i.e., we look for the minima of a pseudo-Boolean function by minimizing its expected value over a set of probability densities. By doing this, we move from a discrete optimization problem to a continuous one, where the parameters of the statistical model become the new variables of the optimization problem. We choose statistical models that belong to the exponential family, and we justify this choice with results about the characterization of its topological closure and of its tangent space. Indeed, we are looking for minimizing sequences of densities in the model that converge towards distributions with reduced support concentrated on the minima of the pseudo-Boolean function. Such limit distributions do not belong to the exponential model, so it becomes important, given an exponential model, to determine which densities are included in its closure. Similarly, we are interested in the characterization of the tangent space of an exponential family, since at each point we are looking for the direction of maximum decrement of the expected value of the original function. Under a proper choice of the sufficient statistics of the exponential family used in the relaxation, the curve of maximum decrement is an exponential family itself.
We provide some results about the existence of critical points of the relaxed function, in terms of the relation between the expansion of the pseudo-Boolean function and the sufficient statistics of the exponential model. The presence of stationary points which correspond to saddle points may determine the existence of different local minima, to which a minimizing sequence of densities may converge. The analysis developed leads to the proposal of a new algorithm for pseudo-Boolean function optimization based on stochastic gradient descent, for which we provide preliminary experimental results. The algorithm is in principle similar to some other techniques that have been proposed recently in the literature, often referred to as population-based algorithms, since at each iteration a pool of feasible candidate solutions is generated by sampling from a statistical model. Such algorithms are known in the Evolutionary Computation literature as Estimation of Distribution Algorithms (EDAs), and a similar approach also appears in stochastic optimization under the name of Cross-Entropy method. By taking inspiration from the EDA meta-heuristic and leveraging the properties of the exponential model, we can design an algorithm that explicitly updates the model parameters in the direction of the gradient of the expected value of a pseudo-Boolean function, instead of estimating the value of the parameters of the model from a subset of the current sample of feasible solutions, as in most of the EDAs described in the literature. The gradient of the expected value of a function defined over the sample space, with respect to an exponential family, can be evaluated in terms of covariances, but since these evaluations require a summation over the entire search space, we propose to replace them with empirical covariances, to be estimated from the current sample. We implemented a vanilla version of the algorithm to find the ground states of some instances of a 2D spin glass model.
The sufficient statistics of the exponential family have been determined according to the lattice structure of the spin glass model, such that each monomial in the energy function corresponds to a sufficient statistic of the model. We compared the performance of our algorithm with state-of-the-art algorithms from the Evolutionary Computation literature on 2D Ising spin glass problems. We ran multiple instances of the algorithms for different sizes of the lattice: 8x8, 10x10, and 20x20. Preliminary experimental results are encouraging and compare favourably with other recent heuristics proposed in the literature. Since we deal with a sample size which is much smaller than the cardinality of the sample space, the estimation of the covariances is affected by large noise. For this reason it seems convenient to replace empirical covariance estimation with other techniques which have proved able to provide more accurate estimates, such as the shrinkage approach to large-scale covariance matrix estimation. Such methods offer robust estimation techniques with a computational complexity which is often no more than twice that required for empirical covariance estimation. Moreover, the algorithm can also be applied in the black-box optimization context, by incorporating in the estimation procedure some model building techniques able to learn from the sample a set of statistically significant correlations among the variables in the original function. Since real-world problems are often sparse, i.e., each variable interacts with a restricted number of variables, l1-regularized logistic regression methods for high-dimensional model selection seem to provide valuable tools in this context. The algorithm we propose is highly parallelizable, both in the estimation of covariances and in the sampling step.
The final aim is to develop an efficient and effective approach to adaptively solve very large pseudo-Boolean problems, also in the black-box context in which the interaction structure among the variables is unknown.
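The covariance form of the gradient described in this abstract can be sketched on the simplest exponential family over binary vectors, the independence model with natural parameters theta and sufficient statistics x_i. This choice of model, the function names, the step size and the sample size are assumptions made for illustration; the authors' algorithm uses sufficient statistics matched to the monomials of the function and is not reproduced here.

```python
import math
import random
from itertools import product

def probabilities(theta):
    """Enumerate p(x) proportional to exp(sum_i theta_i * x_i) on {0,1}^n."""
    states = list(product([0, 1], repeat=len(theta)))
    weights = [math.exp(sum(t * xi for t, xi in zip(theta, x))) for x in states]
    z = sum(weights)
    return [(x, w / z) for x, w in zip(states, weights)]

def expected_value(f, theta):
    return sum(p * f(x) for x, p in probabilities(theta))

def exact_gradient(f, theta):
    """d E[f] / d theta_i = Cov(f, x_i), computed by full enumeration."""
    dist = probabilities(theta)
    mean_f = sum(p * f(x) for x, p in dist)
    grad = []
    for i in range(len(theta)):
        mean_xi = sum(p * x[i] for x, p in dist)
        grad.append(sum(p * (f(x) - mean_f) * (x[i] - mean_xi) for x, p in dist))
    return grad

def empirical_gradient(f, theta, n_samples, rng):
    """Replace the exact covariances by covariances estimated from a sample."""
    probs = [1.0 / (1.0 + math.exp(-t)) for t in theta]
    xs = [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_samples)]
    fs = [f(x) for x in xs]
    mean_f = sum(fs) / n_samples
    grad = []
    for i in range(len(theta)):
        mean_xi = sum(x[i] for x in xs) / n_samples
        cov = sum((fv - mean_f) * (x[i] - mean_xi) for fv, x in zip(fs, xs))
        grad.append(cov / n_samples)
    return grad

def stochastic_relaxation(f, n_vars, steps=200, n_samples=100, rate=0.5, seed=0):
    """Plain stochastic gradient descent on the natural parameters."""
    rng = random.Random(seed)
    theta = [0.0] * n_vars
    for _ in range(steps):
        grad = empirical_gradient(f, theta, n_samples, rng)
        theta = [t - rate * g for t, g in zip(theta, grad)]
    return theta

def example_function(x):
    """A small pseudo-Boolean function used only for demonstration."""
    return x[0] + 2 * x[1] * x[2] - 3 * x[0] * x[1]

print(exact_gradient(example_function, [0.3, -0.7, 1.1]))
print(stochastic_relaxation(example_function, 3))
```

For small problems `exact_gradient` can be compared with finite differences of `expected_value`; for larger ones only `empirical_gradient` is feasible, which is where the noise issues mentioned in the abstract arise.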
Keiji Matsumoto National Institute of Informatics, Japan
Monotone "metric" in the channel space: decision theoretic approach
The aim of the talk is to characterize monotone "metrics" in the space of Markov maps. Here, "metric" means the square of a norm defined on the tangent space, which, unlike the usual notion of metric in differential geometry, is not necessarily induced from an inner product. (Hereafter, the property that the norm is induced from an inner product is called the inner-product-assumption.) There is an extensive literature on metrics in the space of probability distributions and of quantum states. Cencov, in the 1970s, proved that the monotone metric on the space of probability distributions is unique up to a constant multiple and identical to the Fisher information metric. Amari and others independently worked on the same object, especially from the differential geometrical viewpoint, and applied it to a number of problems in the information sciences. Quantum mechanical states were discussed by Nagaoka, Fujiwara, Matsumoto and Petz. Among them, Petz characterized all the monotone metrics in the quantum state space using operator mean theory. As for channels, however, little is known about their geometrical structure. To my knowledge, there has been no study of an axiomatic characterization of distance measures in the classical or quantum channel space. First, we show upper and lower bounds on monotone channel "metrics", and prove that no monotone "metric" can satisfy the inner-product-assumption. We give counterexamples in the space of binary channels. The proof utilizes a "local" version of Blackwell's randomization criteria for the equivalence of statistical models, which is well known in statistical decision theory. The latter result has some impact on the axiomatization of the monotone metric in the space of classical and quantum states, since both Cencov and Petz rely on the inner-product-assumption.
Since classical and quantum states can be viewed as channels with constant output, it is preferable to dispense with the inner-product-assumption. Recalling that the Fisher information is useful in asymptotic theory, it is natural to introduce some assumptions on asymptotic behaviour. Hence, we introduce weak asymptotic additivity and lower asymptotic continuity. With these additional assumptions, we not only recover the uniqueness result of Cencov, but also prove the uniqueness of the monotone "metric" in the channel space. It is known that this unique "metric" gives the maximum information in estimating unknown channels. In this proof, again, we use the local and asymptotic version of the randomization criteria. Finally, there is an implication for quantum state metrics. A quantum state can be viewed as a classical channel which takes a measurement as input and outputs a measurement result. If we restrict the measurements to separable measurements, the asymptotic theory discussed in our paper can also be applied to quantum states, proving the uniqueness of the metric. On the other hand, the author's past manuscript re-established the upper and lower bounds of the monotone metric by Petz without relying on the inner-product-assumption. This suggests that the monotone "metric" in the quantum state space is not unique. Therefore, collective measurements are essential for having a variety of monotone metrics.
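The monotonicity property that underlies Cencov's uniqueness result can be seen in a one-parameter example: the Fisher information of a Bernoulli family cannot increase when the observation is passed through a Markov map. The sketch below uses a binary symmetric channel as the Markov map; this channel and the function names are illustrative assumptions, and the example concerns the classical metric on distributions, not the channel "metric" studied in the talk.

```python
def fisher_information_bernoulli(theta):
    """Fisher information of Bernoulli(theta) with respect to theta."""
    return 1.0 / (theta * (1.0 - theta))

def fisher_information_after_bsc(theta, flip):
    """Fisher information about theta after a binary symmetric channel.

    The output is Bernoulli(q) with q = theta * (1 - flip) + (1 - theta) * flip,
    so the information about theta is (dq/dtheta)^2 / (q * (1 - q)).
    """
    q = theta * (1.0 - flip) + (1.0 - theta) * flip
    dq = 1.0 - 2.0 * flip
    return dq * dq / (q * (1.0 - q))

for flip in (0.0, 0.1, 0.3, 0.5):
    before = fisher_information_bernoulli(0.2)
    after = fisher_information_after_bsc(0.2, flip)
    print(flip, before, after)
```

With `flip = 0` the channel is the identity and the two values coincide; with `flip = 0.5` the output carries no information about `theta`.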
Guido Montúfar Max Planck Institute for Mathematics in the Sciences, Germany
Faces of the probability simplex contained in the closure of an exponential family and minimal mixture representations
This work is about subsets of a state-space with the following property (S): All probability distributions supported therein are elements of the closure of a given exponential family. There exists an optimal cardinality condition for sets of the state-space which ensures they have the property S. However, this is not a necessary condition, and the sets can be considerably larger. We present a characterization of S and use it to compute lower bounds on the maximal cardinality of sets with S. Furthermore we show that there are actions on the state-space which preserve the property S. These results are applied to find bounds on the minimal number of elements belonging to a certain exponential family forming a mixture representation of a probability distribution which belongs to another (larger) exponential family.
Mariela Portesi CONICET & Universidad Nacional de La Plata, Argentina (joint work with Fernando Montani)
Statistical modeling of neuronal activity for an infinitely large number of neurons
An important open question in mathematical neuroscience is how to evaluate the significance of high order spike correlations in the neural code through analytically solvable models. We investigate the thermodynamic limit of a widespread probability distribution of firing in a neuronal pool, within the information geometry framework, considering all possible contributions from high order correlations. This allows us to identify a deformation parameter accounting for the different regimes of firing within the probability distribution, and to investigate whether those regimes could saturate or increase information as the number of neurons goes to infinity.
S. Amari, H. Nakahara, S. Wu and Y. Sakai, Neural Comput. 15, 127 (2003)
B.B. Averbeck, P.E. Latham and A. Pouget, Nat. Rev. Neurosci. 7, 358 (2006)
F. Montani, A. Kohn, A. Smith and S.R. Schultz, J. Neurosci. 27, 2338 (2007)
F. Montani, R.A.A. Ince, R. Senatore, E. Arabzadeh, M.E. Diamond and S. Panzeri, Phil. Trans. R. Soc. 367, 3297 (2009)
Johannes Rauh Max Planck Institute for Mathematics in the Sciences, Germany
Support sets in exponential families and oriented matroid theory
My poster presents results from joint work with Nihat Ay and Thomas Kahle (preprint available at arXiv:0906.5462). We study how results from algebraic statistics generalize to the case of non-algebraic exponential families on finite sets. Here an exponential family is called algebraic if it has an integer-valued matrix of sufficient statistics. In this case the exponential family is the intersection of an algebraic variety with the probability simplex, which makes available the powerful tools of computational commutative algebra. While most relevant examples of exponential families are algebraic, it turns out that ignoring the algebraic properties yields another viewpoint which better captures the continuity aspects of exponential families. Many properties can be deduced from an oriented matroid naturally associated with the exponential family: the closure of a discrete exponential family is described by a finite set of equations corresponding to the circuits of this matroid. These equations are similar to the equations used in algebraic statistics, although they need not be polynomial in the general case. This description allows for a combinatorial study of the possible support sets in the closure of an exponential family. In particular, if two exponential families induce the same oriented matroid, then their closures have the same support sets. Finally, we find a surjective (but not injective) parametrization of the closure of the exponential family by adding more parameters. These parameters also have an interpretation in the matroid picture. The parametrization generalizes the monomial parametrization used in algebraic statistics in the case of algebraic exponential families.
Shigeru Shuto Osaka University, Japan
Information geometry of renormalization on diamond fractal Ising spins
The renormalization group procedure for calculating a partition function in statistical mechanics is considered as a successive approximation to the canonical distribution on its state space. By embedding renormalized states into the original state space, this approximation is characterized as the m-projection from the canonical distribution onto a renormalization submanifold. We apply this method to a diamond fractal Ising spin model, whose renormalization flow has fixed points in the finite region.
Takashi Takenouchi Nara Institute of Science and Technology, Japan
Bayesian decoder for multi-class classification by mixture of divergence
The multi-class classification problem is one of the major topics in the field of machine learning. There are many works on the topic, and one major approach considers a decomposition of the multi-class problem into multiple binary classification problems based on the framework of error correcting output coding (ECOC). Each decomposed binary problem is solved independently, and the results of the binary classifiers are integrated (decoded) for a prediction of the multi-class label. In this research, we present a new integration method of binary classifiers for the multi-class problem. Our integration method (decoder) is characterized by the minimization of a sum of divergences, in which each divergence measures the discrepancy between the decoder and a posterior distribution of the class label associated with a binary classifier. We investigate the performance of the proposed method using a synthetic dataset and datasets from the UCI repository.
Tatsuaki Wada Ibaraki University, Japan (joint work with Atsumi Ohara)
Legendre duality and dually-flat structure in nonextensive thermostatistics developed by $S_{2-q}$ formalism
The $S_{2-q}$-formalism [1] in the generalized thermostatistics based on Tsallis entropy S [2] is a natural formalism in the sense that the associated Legendre structures are derived in a similar way as in the standard thermostatistics. From a q-exponential probability distribution function (pdf), which maximizes S under the constraint of linear average energy U, the so-called escort pdf naturally appears in this formalism. The generalized Massieu potential associated with S and U is related to the one associated with the normalized Tsallis entropy S and the normalized q-average energy U, which is the energy average w.r.t. the escort pdf. The $S_{2-q}$-formalism has also provided the connections among some different versions of Tsallis nonextensive thermostatistics, a non self-referential expression of Tsallis' pdf [3], and the relation between the Boltzmann temperature and the Lagrange multiplier in nonextensive thermostatistics [4]. On the other hand, it was recently shown in Ref. [5] that a dually flat structure on the space of the escort probabilities is obtained by applying a 1-conformal transformation to the -geometry, which is an information geometry with a constant curvature and related to the Tsallis relative entropy [6]. We explore the relation between the information geometrical structures associated with this dually flat structure and the Legendre structures in the $S_{2-q}$-formalism. We show that the Legendre dual potential functions in the information geometry with this dually flat structure are the generalized Massieu potential and . We further study the correspondences among the potential functions, dual affine coordinates, and relevant divergence functions between the information geometry of the dually flat structure and the $S_{2-q}$-formalism.
[1] T. Wada, A.M. Scarfone, Connections between Tsallis' formalisms employing the standard linear average energy and ones employing the normalized q-average energy, Phys. Lett. A 335 (2005) 351-362.
[2] C. Tsallis, Introduction to Nonextensive Statistical Mechanics - Approaching a Complex World (Springer, New York, 2009).
[3] T. Wada, A.M. Scarfone, A non self-referential expression of Tsallis' probability distribution function, Eur. Phys. J. B 47 (2005) 557-561.
[4] T. Wada, A.M. Scarfone, The Boltzmann Temperature and Lagrange Multiplier in Nonextensive Thermostatistics, Prog. Theor. Phys. Suppl. 162 (2006) 37-44.
[5] A. Ohara, H. Matsuzoe, S-I. Amari, A dually flat structure on the space of escort distributions, J. Phys.: Conf. Series 201 (2010) 012012.
[6] A. Ohara, Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation, Eur. Phys. J. B 70 (2009) 15-28.
Yu Watanabe The University of Tokyo, Japan (joint work with Takahiro Sagawa, Masahito Ueda)
Optimal measurement and maximum Fisher information on noisy quantum systems
The most serious obstacle to realizing quantum computers and networks is decoherence, which acts as noise and causes information loss. Decoherence occurs when a quantum system interacts with its environment, and it is unavoidable in almost all quantum systems. Therefore, one of the central problems in quantum information science concerns the optimal measurement to retrieve information about the original quantum state from the decohered one, and the maximum information that can be obtained from the measurement. We identify an optimal quantum measurement that retrieves the maximum Fisher information about the expectation value of an observable from the partially decohered state. We also clarify the maximum Fisher information obtained by the optimal measurement.
[1] Y. Watanabe, T. Sagawa, and M. Ueda, Phys. Rev. Lett. 104, 020401 (2010).
Le Yang CNRS, France
Riemannian median and its estimation
In order to characterize statistical data lying on a Riemannian manifold, one often uses the barycenter of the empirical data as the notion of centrality. But it is well known that the barycenter is not a robust estimator and is sensitive to outliers. A robust substitute for the barycenter is the geometric median. In this paper, we define the geometric median for a probability measure on a Riemannian manifold, and give its characterization and a natural condition to ensure its uniqueness. In order to compute the geometric median in practical cases, we also propose a subgradient algorithm and prove its convergence, as well as estimating the error of approximation and the rate of convergence. The convergence property of this subgradient algorithm, which is a generalization of the classical Weiszfeld algorithm in Euclidean spaces to the context of Riemannian manifolds, does not depend on the sign of the curvature of the manifold and hence improves a recent result of Fletcher and his colleagues.
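For orientation, the classical Weiszfeld iteration in Euclidean space, which the abstract names as the special case being generalized, can be sketched as follows. This is only the flat-space baseline under illustrative assumptions (function name, stopping tolerance, and a crude guard that skips a data point coinciding with the current iterate); it is not the Riemannian subgradient algorithm proposed by the author.

```python
import math

def weiszfeld(points, iterations=1000, tol=1e-12):
    """Approximate the Euclidean geometric median of a list of points."""
    dim = len(points[0])
    # Start from the barycenter (arithmetic mean).
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        numerator = [0.0] * dim
        denominator = 0.0
        for p in points:
            distance = math.dist(p, current)
            if distance < tol:
                continue  # crude guard: skip a point that coincides with the iterate
            weight = 1.0 / distance
            denominator += weight
            for k in range(dim):
                numerator[k] += weight * p[k]
        if denominator == 0.0:
            break
        updated = [v / denominator for v in numerator]
        if math.dist(updated, current) < tol:
            current = updated
            break
        current = updated
    return current

# Four corners of the unit square plus one distant outlier.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[k] for p in data) / len(data) for k in range(2)]
print("barycenter:", mean)
print("median:", weiszfeld(data))
```

The barycenter is dragged far outside the unit square by the single outlier, while the geometric median stays inside it, which illustrates the robustness argument made in the abstract.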
The Chernoff distance represents a symmetrized version of the Kullback-Leibler distance (relative entropy) between two probability distributions. Since it does not satisfy the triangle inequality, it does not define a distance measure on the probability simplex in the strict mathematical sense. On the other hand, it is an important measure of distinguishability among probability distributions. In particular, it is known to provide a sharp bound on exponential error rates in binary simple hypothesis testing. In the context of multiple hypothesis testing, the corresponding optimal error exponent has been identified by Salikhov as the minimum of the Chernoff distances over the different pairs of distributions from the finite set considered. This minimum is referred to as the generalized Chernoff distance. We present results of our earlier and recent work (see the list of references below), which settle Chernoff-type bounds in the context of quantum hypothesis testing, where the hypotheses are represented by density operators associated with states of a finite quantum system. Further, we intend to address some naturally arising information-geometric questions concerning the quantum version of the Hellinger arc and, more generally, the structure of exponential families in state spaces of non-commutative algebras of observables.
M. Nussbaum, A. Szkoła, "The Chernoff Lower Bound for Symmetric Quantum Hypothesis Testing", The Annals of Statistics Vol. 37, No. 2, 1040-1057 (2009)
K. M. R. Audenaert, M. Nussbaum, A. Szkoła, and F. Verstraete, "Asymptotic Error Rates in Quantum Hypothesis Testing", Commun. Math. Phys. Vol. 279, No. 1, 251-283 (2008), springerlink.metapress.com/content/e1463367803n505g/fulltext.pdf
M. Nussbaum, A. Szkoła, "Asymptotic optimal discrimination between pure quantum states", to appear in TQC Proceedings (2010), MPI MiS preprint 1/2010
M. Nussbaum, A. Szkoła, "Exponential error rates in multiple state discrimination on a quantum spin chain", submitted to Commun. Math. Phys. (2010), MPI MiS preprint 3/2010, xxx.lanl.gov/abs/1001.2651
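For the classical case described at the start of this abstract, the Chernoff distance between two discrete distributions can be computed directly from its usual definition, C(P, Q) = -min over s in [0, 1] of log sum_i p_i^s q_i^(1-s). The sketch below assumes strictly positive distributions on a common finite set and uses a ternary search, which is valid because the objective is convex in s. The function names are hypothetical, and the quantum version treated in the talk is not covered.

```python
import math

def chernoff_distance(p, q, iterations=200):
    """Classical Chernoff distance between strictly positive distributions."""
    def log_affinity(s):
        return math.log(sum(pi ** s * qi ** (1.0 - s) for pi, qi in zip(p, q)))

    low, high = 0.0, 1.0
    for _ in range(iterations):
        left = low + (high - low) / 3.0
        right = high - (high - low) / 3.0
        if log_affinity(left) < log_affinity(right):
            high = right
        else:
            low = left
    return -log_affinity((low + high) / 2.0)

def generalized_chernoff_distance(distributions):
    """Minimum of the pairwise Chernoff distances over a finite set."""
    return min(
        chernoff_distance(distributions[i], distributions[j])
        for i in range(len(distributions))
        for j in range(i + 1, len(distributions))
    )

hypotheses = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
print(chernoff_distance(hypotheses[0], hypotheses[1]))
print(generalized_chernoff_distance(hypotheses))
```

The second function mirrors the statement in the abstract that the optimal error exponent for multiple hypotheses is the minimum over all pairs.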
State-of-the-art methods for identifying and modeling the motifs mediating protein-DNA binding use Position Weight Matrices as a model for the motif, implicitly assuming the statistical independence of the nucleotides within the motif. Dropping this hypothesis is a two-fold challenge: the structure of these dependencies has to be inferred from the available data, whose limited amount also imposes a parsimony requirement on the model. After an introduction to this field of bioinformatics, the statistical underpinnings of a solution proposed jointly with Ivo Grosse (University of Halle) will be presented.
The relative entropy distance of a state from an exponential family is important in information theory and statistics. Two-dimensional examples in the algebra of complex $3 \times 3$-matrices reveal that the mean value set of an exponential family typically has non-exposed faces. The Staffelberg family stands out among these examples due to a discontinuous entropy distance. We meet these phenomena e.g. in optimization problems on the state space (including singular states). They do not occur in the probabilistic case of an abelian $^{\ast}$-subalgebra of complex $N \times N$-matrices. Analogues of probability theory exist in the non-abelian quantum case, though: the entropy distance from an exponential family in a finite-dimensional ${C^{\ast}}$-algebra is given by a projection to an extension of the family. An optimal form of the Pythagorean theorem of relative entropy holds for this extension. We conclude with applications. Part of this work is joint with Andreas Knauf.
The Bradley-Terry (BT) model is a basic probability model for item ranking or user preference from paired comparison data. For example, it can be used to estimate the intrinsic strength of football teams based on league results. It can also be applied to multi-class discrimination problems using binary classifiers.
In the BT model, each item (or user) is assigned a positive value, and the result of a comparison between two items is modeled by a Bernoulli distribution parametrized by the ratio of the values assigned to these two items. Estimation methods for this model have been discussed in several contexts, and most of them are based on the maximum likelihood method, i.e. on minimizing the sum of weighted Kullback-Leibler (KL) divergences between Bernoulli distributions and paired comparison data.
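For orientation, the maximum likelihood baseline mentioned above can be sketched with the classical fixed-point (minorization-maximization) iteration for the BT model, in which item $i$ beats item $j$ with probability $\pi_i/(\pi_i+\pi_j)$. This is the standard iteration, not the em-like algorithm proposed in this talk; the function name and the layout of the `wins` matrix are assumptions made for the example.

```python
def fit_bradley_terry(wins, iters=500):
    """Maximum likelihood BT values via the classical fixed-point iteration.

    wins[i][j] is the number of times item i beat item j (wins[i][i] must be 0).
    Returns values normalized to sum to 1.
    """
    n = len(wins)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Each pair contributes (number of comparisons) / (pi_i + pi_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                for j in range(n)
                if j != i
            )
            new.append(total_wins / denom)
        norm = sum(new)
        pi = [v / norm for v in new]
    return pi
```

The normalization step is what allows the estimated values to be read as a categorical distribution, the identification used in the geometric formulation.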
We focus on the following two important facts: 1) a set of normalized values assigned to items (summing to 1) can be identified with a categorical distribution; 2) observations, i.e. paired comparison data, can be regarded as incomplete data from categorical distributions, and each observation defines an m-flat submanifold in the space of categorical distributions, represented as a probability simplex. Based on these notions, we construct an objective function as the sum of KL divergences between a categorical distribution and these submanifolds, and derive an em-like algorithm, which is an iterative estimation method with e-projections and m-projections. Moreover, by considering the geometrical relationship between empirical influence functions and m-flat submanifolds, we can introduce a natural method for estimating the confidence in each observation. We will demonstrate the effectiveness and advantages of our proposal with synthetic and real-world data.
This work was done in collaboration with my colleagues Yu Fujimoto and Hideitsu Hino.
This is an expository talk on the role of dual connections in mirror symmetry, which has been known to many people since the beginning of this century. Since dual connections play an important role in information geometry, it may make sense to explain this point at this conference.
Policy search is a successful approach to reinforcement learning which has yielded many interesting applications in a variety of areas.
Unlike traditional value function-based learning methods, policy search approaches can be guaranteed to converge at least to a local optimum, can handle partially observed variables and allow straightforward integration of domain knowledge. Research on policy search started with the classical work on parametric policy gradient methods. These turned out to be surprisingly slow and marred by bad trade-offs between exploration and exploitation parameters in the policy.
Results from information theory for supervised and unsupervised learning have triggered research into natural policy gradient methods. These turned out to be significantly more robust and efficient than the previous vanilla policy gradient approaches. An interesting interpretation of these results was that the policy improvements can fix the amount of loss of information while maximizing the reward. This loss of information is measured with the relative entropy between the experienced state-action distribution and the new one generated by the improved policy. We continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step.
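The idea of bounding the loss of information while maximizing reward can be illustrated in a heavily simplified, stateless setting. The sketch below assumes sampled returns from the current policy and reweights the samples by $w_i \propto \exp(R_i/\eta)$, where $\eta$ minimizes the dual function $g(\eta) = \eta\epsilon + \eta \log \frac{1}{N}\sum_i \exp(R_i/\eta)$ of the KL-constrained problem. The function name, the grid search and the default bound `epsilon` are assumptions for this illustration; it is not the full REPS method with state-action distributions.

```python
import math


def reps_weights(returns, epsilon=0.5):
    """Sample weights maximizing expected return under a KL bound (stateless sketch)."""
    r_max = max(returns)
    n = len(returns)

    def dual(eta):
        # eta*epsilon + eta*log-mean-exp(R/eta), with r_max factored out for stability
        lme = math.log(sum(math.exp((r - r_max) / eta) for r in returns) / n)
        return eta * epsilon + eta * lme + r_max

    # Crude log-spaced grid search over the temperature eta (the dual is convex in eta)
    etas = [10 ** (k / 50.0) for k in range(-150, 151)]
    eta = min(etas, key=dual)

    weights = [math.exp((r - r_max) / eta) for r in returns]
    norm = sum(weights)
    return [w / norm for w in weights], eta
```

A smaller `epsilon` keeps the reweighted distribution closer to the sampling distribution, while a larger one concentrates the weight on the best samples.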
We show applications of these methods to the improvement of robot behaviors and discuss their wider implications.
The Euler-Schouten embedding curvature (the second fundamental form) plays an important role not only in geometry itself but also in computational mathematics and scientific computing.
A well-known example is its relation to the performance of estimators in statistical inference, which was elucidated by Amari's seminal work.
In this talk, we show that iteration-complexity of an interior-point algorithm for conic linear programming problems (e.g., linear or semidefinite programming and so on) is characterized by dual embedding curvature of a feasible region, or specifically, what is called a central trajectory.
In an extreme case where the curvature vanishes, we can construct a formula for an optimal solution, and hence, need no iterations to solve it. The related topics will also be presented.
The talk is partly based on joint work with Takashi Tsuchiya at ISM Japan.
Statistical models belonging to a generalized exponential family automatically satisfy the variational principle, which is a stronger statement than the maximum entropy principle. It implies a dual structure which has long been known in thermodynamics in the form of Legendre transforms and their inverses. The entropy function in the variational principle is essentially unique. For convenience, let us call it the stable entropy function. For the special case of the $q$-exponential family this is Tsallis' entropy function with parameter $2-q$.
Statistical models belonging to the $q$-exponential family occur frequently in statistical physics. In particular, the configurational probability distribution of any model of classical mechanics, when considered as a function of the total energy, belongs to the $q$-exponential family, with a parameter $q$ which tends to 1 when the number of degrees of freedom tends to infinity. It is well known from the canonical case of the Boltzmann-Gibbs distribution that the variational principle implies a property of stability, which roughly means that phase transitions cannot occur in systems with a finite number of degrees of freedom. This stability also holds for models belonging to a generalized exponential family.
If the stable entropy function is replaced by an increasing function of itself, then the maximum entropy principle is still satisfied, but the variational principle is violated. In particular, if Tsallis' entropy is replaced by that of Rényi, then the stability property is lost. In physical models the lack of stability means that phase transitions may occur. We show that, when Rényi's entropy function is used, the simple model of the pendulum exhibits a first order phase transition between small angle librational motion at low values of the energy and full rotational motion at high energies.
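The deformed functions underlying the $q$-exponential family can be written down in a few lines. The sketch below assumes the commonly used definitions $\ln_q(u) = (u^{1-q}-1)/(1-q)$ and $\exp_q(u) = [1+(1-q)u]_+^{1/(1-q)}$, together with the Tsallis entropy $S_q(p) = (1-\sum_i p_i^q)/(q-1)$; the abstract does not fix its conventions, so these and the function names are assumptions.

```python
import math


def q_log(u, q):
    """Deformed logarithm ln_q(u) for u > 0; reduces to log(u) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def q_exp(u, q):
    """Deformed exponential exp_q(u), the inverse of ln_q on its range."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        # Cut-off for q < 1; divergence for q > 1
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def tsallis_entropy(p, q):
    """Tsallis entropy of a discrete distribution; Shannon entropy at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return (1.0 - sum(pi ** q for pi in p if pi > 0)) / (q - 1.0)
```

With these conventions, the statement above about the stable entropy function corresponds to evaluating `tsallis_entropy(p, 2 - q)` for a model in the $q$-exponential family.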
One of the most flexible ways of representing families of discrete probability distributions is through event trees and the chain event graph. These are also extremely useful for representing causal hypotheses. In this talk I will show how the algebra associated with event tree families is less constrained and therefore more expressive than its Bayesian Network competitor. I will illustrate how an understanding of the algebraic features of these models gives insight into their underlying structure.
Nihat Ay proposed the following problem [1], motivated by statistical learning theory: Let $\mathcal{E}$ be an exponential family. Find the maximizer of the Kullback-Leibler distance $D(P\|\mathcal{E})$ from $\mathcal{E}$. A maximizing probability measure $P$ has many interesting properties. For example, the restriction of $\hat{P}$ to the support of $P$ will be equal to $P$ up to normalization, i.e. $\hat{P}(x) = P(x)\hat{P}(Z)$ if $x\in\mathrm{supp}(P)$, where $\hat{P}$ denotes the projection of $P$ onto $\mathcal{E}$ and $Z$ the support of $P$ (for the proof in the most general case see [2]). This simple property can be used to transform the problem into another form. The first observation is that probability measures having this "projection property" always come in pairs $P_{1},P_{2}$, such that $P_{1}$ and $P_{2}$ have the same sufficient statistics $A$ and disjoint supports. Therefore we can solve the original problem by investigating the kernel of the sufficient statistics $\ker A$. If we find all local maximizers of \begin{equation*}\overline D(M) := \sum_{x} M(x) \log |M(x)|, \quad M\in\ker A,\end{equation*} subject to $\|M\|_{\ell_{1}} \le 2$, then we know all maximizers of the original problem. The talk will present the transformed problem and its relation to the original problem. In the end I will give some consequences for the solutions of the original problem.
[1] N. Ay: An Information-Geometric Approach to a Theory of Pragmatic Structuring. The Annals of Probability 30 (2002) 416-436.
[2] F. Matúš: Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika 43, 731-746.
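The transformed objective is easy to evaluate numerically. The sketch below computes $\overline D(M)$ for a vector $M \in \ker A$ and splits $M$ into its positive and negative parts $P_1 = M^+$, $P_2 = M^-$; when $\|M\|_{\ell_1} = 2$ these are probability measures with disjoint supports, and by direct computation $\overline D(M) = H(P_2) - H(P_1)$ with $H$ the Shannon entropy. The matrix `A` and the vector `M` are hypothetical toy data chosen for the example.

```python
import math


def d_bar(m):
    """Sum_x M(x) log|M(x)|, with the convention 0 log 0 = 0."""
    return sum(v * math.log(abs(v)) for v in m if v != 0)


def split_signed(m):
    """Positive and negative parts of M, so that M = P1 - P2 with disjoint supports."""
    p1 = [max(v, 0.0) for v in m]
    p2 = [max(-v, 0.0) for v in m]
    return p1, p2


def entropy(p):
    """Shannon entropy of a discrete distribution (natural logarithm)."""
    return -sum(v * math.log(v) for v in p if v > 0)


# Hypothetical sufficient statistics on the state space {0, 1, 2, 3}
A = [[1, 1, 1, 1], [0, 1, 2, 3]]
# A kernel vector with l1-norm 2 (toy choice)
M = [0.5, -1.0, 0.5, 0.0]

P1, P2 = split_signed(M)
value = d_bar(M)
```

Here `P1` and `P2` have the same expectation of every row of `A`, which is the sense in which the pair shares its sufficient statistics.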
A quantum channel is a completely positive map that represents a dynamical change of a quantum system. As a natural generalization of the geometry of Berry-Uhlmann's phase, a principal fibre bundle over a manifold of quantum channels is introduced, in which the redundancy of the operator-sum representation is regarded as a fibre. It is demonstrated that this geometry plays an essential role in quantum channel estimation theory that seeks an optimal estimation scheme for an unknown quantum channel.
References: [1] A. Fujiwara, "Quantum channel identification problem", Phys. Rev. A, vol. 63, 042304 (2001). [2] A. Fujiwara and H. Imai, "A fibre bundle over manifolds of quantum channels and its application to quantum statistics", J. Phys. A: Math. Theor., vol. 41, 255304 (2008).
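The redundancy of the operator-sum representation referred to in this abstract can be checked numerically: if Kraus operators $E_k$ are replaced by $F_j = \sum_k u_{jk} E_k$ with $u$ unitary, the channel $\rho \mapsto \sum_k E_k \rho E_k^{\dagger}$ is unchanged. The following minimal sketch uses a bit-flip channel on a qubit; the channel, the unitary and the helper names are assumptions chosen for illustration.

```python
import math


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def apply_channel(kraus, rho):
    """rho -> sum_k E_k rho E_k^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for e in kraus:
        term = matmul(matmul(e, rho), dagger(e))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j]
    return out


def mix_kraus(kraus, u):
    """New Kraus operators F_j = sum_k u[j][k] E_k for a unitary matrix u."""
    n = len(kraus[0])
    return [
        [[sum(u[j][k] * kraus[k][r][c] for k in range(len(kraus))) for c in range(n)]
         for r in range(n)]
        for j in range(len(u))
    ]


# Bit-flip channel with flip probability 0.3 (toy example)
s7, s3 = math.sqrt(0.7), math.sqrt(0.3)
kraus = [[[s7, 0], [0, s7]], [[0, s3], [s3, 0]]]
theta = 0.4
u = [[math.cos(theta), 1j * math.sin(theta)], [1j * math.sin(theta), math.cos(theta)]]
rho = [[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]]

out_a = apply_channel(kraus, rho)
out_b = apply_channel(mix_kraus(kraus, u), rho)
max_diff = max(abs(out_a[i][j] - out_b[i][j]) for i in range(2) for j in range(2))
```

The set of all such equivalent Kraus representations of one channel is what plays the role of the fibre in the bundle picture.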
This study elaborates some examples of a simple evolutionary stochastic rate process where the population rate of change depends on the distribution of properties, so that different cohorts change at different rates. We investigate the effect on the evolution arising from parametrized perturbations of uniformity for the initial inhomogeneity. The information geometric neighbourhood system also yields solutions for a wide range of other initial inhomogeneity distributions, including approximations to truncated Gaussians of arbitrarily small variance and distributions with pronounced extreme values. It is found that, under quite considerable alterations in the shape and variance of the initial distribution of inhomogeneity in unfitness, the decline of the mean does change markedly with the variation in starting conditions, but the net population evolution seems surprisingly stable.
Keywords: Evolution, inhomogeneous rate process, information geometry, entropy, uniform distribution, log-gamma distribution.
Let ${\cal F}_{op}$ denote the set of symmetric, normalized, operator monotone functions. If we set \[ {\cal F}^{\, r}_{op}:=\{f \in {\cal F}_{op} \mid f(0)>0 \}, \qquad \qquad {\cal F}^{\, n}_{op}:=\{f \in {\cal F}_{op} \mid f(0)=0 \}, \] then trivially \[ {\cal F}_{op}={\cal F}^{\, r}_{op} \, \dot\cup \, {\cal F}^{\, n}_{op}. \] Define \[ \tilde{f}(x):=\frac{1}{2}\left[ (x+1)-(x-1)^2 \frac{f(0)}{f(x)} \right], \qquad x>0. \] It is possible to prove that the map $f \to \tilde f$ is a bijection from ${\cal F}^{\, r}_{op}$ to ${\cal F}^{\, n}_{op}$, namely a bijection between regular and non-regular functions.
In recent years a number of consequences have been derived from this fact: 1) the dynamical uncertainty principle; 2) its generalization to von Neumann algebras; 3) a new proof of the fact that the Wigner-Yanase-Dyson information is an example of a quantum Fisher information; 4) a new proof of the monotonicity property for the WYD information; 5) a link between quantum relative entropy and metric adjusted skew information.
The purpose of my talk is to describe the above applications.
Andai, A., Uncertainty principle with quantum Fisher information, J. Math. Phys. 49 (2008), 012106.
Audenaert, K., Cai, L. and Hansen, F., Inequalities for quantum skew information, Lett. Math. Phys., 85: 135-146, 2008.
Gibilisco, P., Hansen, F. and Isola, T., On a correspondence between regular and non-regular operator monotone functions. Linear Algebra Appl., 430: 2225-2232, 2009.
Gibilisco, P., Hiai, F. and Petz, D., Quantum covariance, quantum Fisher information and the uncertainty relations. IEEE Trans. Inform. Theory, 55: 439-443, 2009.
Gibilisco, P. and Isola, T., A dynamical uncertainty principle in von Neumann algebras by operator monotone functions. J. Stat. Phys., 132: 937-944, 2008.
Luo, S., Quantum Fisher information and uncertainty relations. Lett. Math. Phys. 53: 243-251, 2000.
Petz, D. and Szabó, V. E. S., From quasi-entropy to skew information. International J. Math., 20: 1421-1430, 2009.
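A worked instance of the map $f \to \tilde f$ may help. For the arithmetic mean $f(x) = (1+x)/2$, which is regular with $f(0) = 1/2$, the definition gives $\tilde f(x) = \frac{1}{2}\left[(x+1) - \frac{(x-1)^2}{x+1}\right] = \frac{2x}{x+1}$, the harmonic mean, which is non-regular since $\tilde f(0) = 0$. The short sketch below checks this numerically; the helper name `tilde` is an illustrative choice.

```python
def tilde(f, f0):
    """Return the function f~ built from f and its value f0 = f(0) > 0."""

    def f_tilde(x):
        return 0.5 * ((x + 1) - (x - 1) ** 2 * f0 / f(x))

    return f_tilde


def arithmetic(x):
    # Regular element: f(0) = 1/2
    return (1 + x) / 2


def harmonic(x):
    # Non-regular element: f(0) = 0
    return 2 * x / (1 + x)


f_t = tilde(arithmetic, 0.5)
points = [0.1, 0.5, 1.0, 2.0, 7.0]
max_gap = max(abs(f_t(x) - harmonic(x)) for x in points)
```

The same helper can be applied to any other regular function once its value at $0$ is supplied.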
In the case of a finite state space, we can analyze statistical exponential families with tools from both Differential Geometry and Commutative Algebra; see e.g. Gibilisco, Riccomagno, Rogantin, Wynn, eds. (2009). A key feature of the algebraic framework is the finite generation of polynomial ideals. In the general case, the reduction to some sort of finite generation is more difficult, but it is a classical ingredient of standard statistical models. We suggested in Pistone and Wynn, Statistica Sinica (1999), a definition of finite generation which generalizes the Morris class of distributions; see Morris, Annals of Statistics (1982, 1983). We present here a development of this theory which makes use of recent results on Gröbner bases for D-modules. This is joint work with Henry Wynn, LSE London UK.