The origin of wald space and phylogenetic information geometry

  • Tom Nye (Newcastle University)
E1 05 (Leibniz-Saal)


Most existing metrics between phylogenetic trees directly measure differences in topology and edge weights, and are unrelated to the models of evolution used to infer trees. The wald space is a newly developed geodesic metric space which arose from a shift of viewpoint, in which trees are identified with the probability models on genetic sequence data they induce. We describe a family of metrics between trees which are pull-backs of metrics between probability distributions of discrete characters induced by trees. These behave very differently from existing metrics and we illustrate this using some simple examples. In order to do statistics on data sets of trees, it is highly desirable to construct metric spaces which are length spaces, or even better, geodesic spaces, and we describe how to construct a length space using probabilistic metrics. Locally, this space consists of a Riemannian manifold for each collection of trees with a fixed branching topology, in which the Riemannian metric is the Fisher information metric. Calculations in this space are highly computationally intensive due to the use of discrete characters. By developing a related Gaussian Markov process model for a continuous trait, we are able to identify trees with certain Gaussian distributions and this enables much faster computation. The same geometry is obtained by embedding trees in the space of positive definite matrices with the affine-invariant metric, by considering covariance matrices of Gaussians. This embedded space is the wald space.
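
As a rough illustration of the affine-invariant metric mentioned in the abstract, the sketch below computes the distance d(A, B) = sqrt(sum_i log^2(lambda_i)), where the lambda_i are the eigenvalues of A^{-1}B, for 2x2 symmetric positive definite matrices only. The function name and the two example matrices are hypothetical; the abstract does not specify how a tree is mapped to a covariance matrix, so the matrices here are simply unit-variance covariances with an assumed correlation, not the construction used in the talk. Note also that this is the distance in the ambient space of positive definite matrices; a distance within an embedded subspace of trees would in general be no smaller.

```python
import math


def affine_invariant_distance_2x2(a, b):
    """Affine-invariant distance between two 2x2 symmetric positive
    definite matrices, each given as a nested list [[m11, m12], [m21, m22]].
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    det_a = a11 * a22 - a12 * a21
    det_b = b11 * b22 - b12 * b21
    if a11 <= 0 or b11 <= 0 or det_a <= 0 or det_b <= 0:
        raise ValueError("both matrices must be positive definite")

    # Trace and determinant of M = A^{-1} B, using the closed-form
    # inverse of a 2x2 matrix.
    trace = (a22 * b11 - a12 * b21 - a21 * b12 + a11 * b22) / det_a
    det = det_b / det_a

    # Eigenvalues of M are the roots of x^2 - trace * x + det = 0.
    # They are real and positive for SPD inputs; clamp the discriminant
    # at zero to absorb rounding error.
    disc = max(trace * trace - 4.0 * det, 0.0)
    lam1 = (trace + math.sqrt(disc)) / 2.0
    lam2 = det / lam1

    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)


# Hypothetical covariance matrices for two taxa (assumed for illustration):
# unit variances, with correlations 0.8 and 0.3 respectively.
cov_a = [[1.0, 0.8], [0.8, 1.0]]
cov_b = [[1.0, 0.3], [0.3, 1.0]]

distance = affine_invariant_distance_2x2(cov_a, cov_b)
print(f"affine-invariant distance: {distance:.4f}")
```

The closed form is used only to keep the sketch dependency-free; for larger matrices one would typically obtain the same eigenvalues from a generalized symmetric eigenvalue solver in a numerical library.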

Katharina Matschke

Max Planck Institute for Mathematics in the Sciences, Leipzig

Karen Habermann

University of Warwick

Sayan Mukherjee

Max Planck Institute for Mathematics in the Sciences, Leipzig

Max von Renesse

Leipzig University

Stefan Horst Sommer

University of Copenhagen