Computing the posterior expectation of phylogenetic trees
Philipp Benner and Miroslav Bačák
Submission date: 16 May 2013 (revised version: December 2013)
Inferring phylogenetic trees from multiple sequence alignments often relies on Markov chain Monte Carlo (MCMC) methods to generate tree samples from a posterior distribution. A rigorous approximation of the posterior expectation requires computing the mean of these tree samples, and therefore a sound definition of the mean together with algorithms for computing it. To the best of our knowledge, no existing method of phylogenetic inference can handle the full set of tree samples, because the sampled trees typically have different topologies. We develop a statistical model for the inference of phylogenetic trees based on the tree space of Billera et al. Since this space is a Hadamard space, the mean and the median are well defined, and we also motivate their use from a decision-theoretic perspective. The actual approximation of the posterior expectation relies on recent developments in Hadamard spaces (Bačák [2013a], Miller et al.) and on the fast computation of geodesics in tree space (Owen and Provan), which together make it possible to compute medians and means of trees with different topologies. We demonstrate our model on a small sequence alignment. The posterior expectations obtained on this data set are a meaningful summary of the posterior distribution and of the uncertainty about the tree topology.
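To make the idea of a mean of trees with different topologies concrete, here is a minimal Python sketch in the smallest nontrivial case, assuming a simplified toy model: unrooted trees on four leaves with pendant edge lengths ignored. Each of the three possible topologies has exactly one interior edge, so the space is three half-lines glued together at the star tree, and a geodesic between trees of different topologies passes through the star tree. The sketch computes the Fréchet mean with Sturm's inductive-mean iteration and the median with a cyclic proximal point iteration. These are standard Hadamard-space algorithms in the spirit of the cited work, not the authors' implementation. The names `dist`, `geodesic`, `inductive_mean`, `proximal_median` and the sample data are hypothetical, and realistic tree spaces would need the Owen and Provan geodesic algorithm in place of the closed-form `geodesic` used here.

```python
import random
from typing import List, Tuple

# Hypothetical toy model: unrooted trees on four leaves, pendant (leaf) edge
# lengths ignored. Each of the three topologies has exactly one interior edge,
# so a tree is (topology label, interior edge length >= 0) and the space is
# three half-lines glued together at the star tree (length 0).
Tree = Tuple[str, float]


def dist(x: Tree, y: Tree) -> float:
    """Geodesic distance: along one half-line, or through the star tree."""
    (tx, lx), (ty, ly) = x, y
    return abs(lx - ly) if tx == ty else lx + ly


def geodesic(x: Tree, y: Tree, t: float) -> Tree:
    """Point at fraction t in [0, 1] of the way along the geodesic from x to y."""
    (tx, lx), (ty, ly) = x, y
    if tx == ty:
        return (tx, (1.0 - t) * lx + t * ly)
    s = t * (lx + ly)  # arc length travelled from x
    # Shrink x's interior edge down to the star tree, then grow y's edge.
    return (tx, lx - s) if s <= lx else (ty, s - lx)


def inductive_mean(samples: List[Tree], n_iter: int = 100_000, seed: int = 0) -> Tree:
    """Fréchet mean via Sturm's inductive means: step 1/(k+1) toward a random sample."""
    rng = random.Random(seed)
    x = rng.choice(samples)
    for k in range(1, n_iter):
        x = geodesic(x, rng.choice(samples), 1.0 / (k + 1))
    return x


def proximal_median(samples: List[Tree], n_cycles: int = 2000, lam0: float = 1.0) -> Tree:
    """Geometric median via a cyclic proximal point iteration.

    The proximal map of d(., a) with parameter lam moves x toward a by
    min(lam, d(x, a)). Step sizes lam0/(k+1) have a divergent sum but a
    summable square, the usual condition for convergence.
    """
    x = samples[0]
    for k in range(n_cycles):
        lam = lam0 / (k + 1)
        for a in samples:
            d = dist(x, a)
            if d > 0.0:
                x = geodesic(x, a, min(1.0, lam / d))
    return x


# Hypothetical posterior sample: three trees with topology "A", one each of "B" and "C".
SAMPLES: List[Tree] = [("B", 0.5), ("C", 0.5), ("A", 1.0), ("A", 2.0), ("A", 6.0)]

if __name__ == "__main__":
    print("mean:  ", inductive_mean(SAMPLES))
    print("median:", proximal_median(SAMPLES))
```

For this toy sample the exact minimizers can be worked out by hand: the mean is topology "A" with interior edge length 1.6 and the median is topology "A" with length 1.0, so the iterations should land close to these values. Averaging only the three "A" trees would give 3.0; the trees of other topologies pull the mean toward the star tree, and the median is additionally insensitive to the outlying length 6.0.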