Workshop

From Markov decision processes to algebraic statistics

  • Johannes Müller (Max Planck Institute for Mathematics in the Sciences, Germany)
E1 05 (Leibniz-Saal)

Abstract

We study the reward maximization problem in partially observable Markov decision processes with stationary stochastic policies, which constitute an important model in sequential decision making under uncertainty. We obtain a description of the problem as the optimization of a linear objective over the probability simplex subject to polynomial constraints that we characterize explicitly. This allows us to describe the combinatorial and algebraic complexity of the problem, and we obtain bounds on its algebraic degree. Further, we conduct experiments using methods from numerical algebra to solve the critical equations and compare the results to standard gradient-based optimization.
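As a rough illustration of the formulation described in the abstract, the reward maximization problem can be written in state-action frequency coordinates roughly as follows. This is a minimal sketch under common assumptions (discounted infinite-horizon reward, finite state and action spaces); the symbols η, r, P, μ, γ and the sample degree-two constraint are illustrative notation, not the exact characterization presented in the talk:

\begin{align*}
  \max_{\eta \in \Delta_{S \times A}} \quad
     & \sum_{s \in S,\, a \in A} r(s,a)\, \eta(s,a)
       && \text{(linear objective)} \\
  \text{s.t.} \quad
     & \sum_{a \in A} \eta(s',a)
       = (1-\gamma)\, \mu(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \eta(s,a)
       && \text{for all } s' \in S, \\
     & p_i(\eta) = 0
       && \text{for finitely many polynomials } p_i, \\
     & \text{e.g. } \eta(s,a)\,\eta(s',a') - \eta(s,a')\,\eta(s',a) = 0
       && \text{if } s, s' \text{ share an observation.}
\end{align*}

Here the linear constraints describe the usual state-action polytope of the underlying MDP, while the polynomial constraints p_i encode partial observability, i.e. that the induced policy may depend on the state only through the observations; the displayed degree-two minor condition is the form such constraints take in the special case of deterministic observations.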

Katharina Matschke

Max Planck Institute for Mathematics in the Sciences

Guido Montúfar

Max Planck Institute for Mathematics in the Sciences