Computing the posterior expectation of phylogenetic trees
Philipp Benner and Miroslav Bačák
Inferring phylogenetic trees from multiple sequence alignments often relies on Markov chain Monte Carlo (MCMC) methods to generate tree samples from a posterior distribution. A rigorous approximation of the posterior expectation requires computing the mean of the tree samples, which in turn requires a sound definition of a mean and algorithms for its computation. To the best of our knowledge, no existing method of phylogenetic inference can handle the full set of tree samples, because such trees typically have different topologies. We develop a statistical model for the inference of phylogenetic trees based on the tree space due to Billera et al. Since this tree space is a Hadamard space, the mean and median are well defined, a choice we also motivate from a decision-theoretic perspective. The actual approximation of the posterior expectation relies on recent developments in Hadamard spaces (Bačák [2013a], Miller et al.) and on the fast computation of geodesics in tree space (Owen and Provan), which together make it possible to compute medians and means of trees with different topologies. We demonstrate our model on a small sequence alignment. The posterior expectations obtained on this data set are a meaningful summary of the posterior distribution and of the uncertainty about the tree topology.
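To make the idea of a mean of trees with different topologies concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest nontrivial case: unrooted trees on four leaves where only the interior edge length varies, so that tree space reduces to three half-lines (one per topology) glued at the star tree. It approximates the mean with an inductive scheme that repeatedly steps along the geodesic towards a randomly drawn sample; the algorithms used in the paper may differ. The names `distance`, `geodesic_point` and `inductive_mean`, and the `(leg, r)` encoding of a tree, are hypothetical and chosen for illustration only.

```python
import random

# Assumed toy encoding: a tree is (leg, r), where leg in {0, 1, 2} names one
# of the three topologies and r >= 0 is the interior edge length. r == 0 is
# the star tree, where the three legs meet.

def distance(p, q):
    """Geodesic distance between two points of the three-legged space."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    if leg_p == leg_q:
        return abs(r_p - r_q)
    return r_p + r_q  # different legs: the path runs through the origin


def geodesic_point(p, q, t):
    """Point at fraction t in [0, 1] along the geodesic from p to q."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    if leg_p == leg_q:
        return (leg_p, (1.0 - t) * r_p + t * r_q)
    s = t * (r_p + r_q)  # length travelled from p towards q
    if s <= r_p:
        return (leg_p, r_p - s)  # still on the leg of p
    return (leg_q, s - r_p)      # crossed the origin onto the leg of q


def inductive_mean(draws):
    """Inductive mean: move the estimate 1/k of the way towards the k-th draw."""
    mean = None
    for k, y in enumerate(draws, start=1):
        mean = y if mean is None else geodesic_point(mean, y, 1.0 / k)
    return mean


# Three samples with three different topologies and equal edge lengths.
rng = random.Random(0)
samples = [(0, 1.0), (1, 1.0), (2, 1.0)]
estimate = inductive_mean(rng.choice(samples) for _ in range(5000))
print(estimate, distance(estimate, (0, 0.0)))
```

In this symmetric example the point minimizing the sum of squared distances is the star tree, so the estimate ends up close to the origin: when the samples disagree evenly about the topology, the mean is an unresolved tree, which is one way a mean can summarize topological uncertainty.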