Workshop
From Markov decision processes to algebraic statistics
- Johannes Müller (Max Planck Institute for Mathematics in the Sciences)
Abstract
We study the reward maximization problem in partially observable Markov decision processes with stationary stochastic policies, which constitute an important model in sequential decision making under uncertainty. We obtain a description of the problem as the optimization of a linear objective over the probability simplex subject to polynomial constraints that we characterize explicitly. This allows us to describe the combinatorial and algebraic complexity of the problem, and we obtain bounds on its algebraic degree. Further, we conduct experiments using methods from numerical algebra to solve the critical equations and compare the results to standard gradient-based optimization.
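As a rough illustration of the type of problem described in the abstract, the optimization can be sketched schematically as below; the symbols $r$, $\pi$, $\Delta_n$, and $p_1, \dots, p_k$ are generic placeholders for this sketch and not the talk's own notation.

% Schematic sketch (placeholder notation): a linear objective
% maximized over the probability simplex subject to finitely many
% polynomial equality constraints.
\[
  \max_{\pi \in \Delta_n} \ \langle r, \pi \rangle
  \quad \text{subject to} \quad
  p_1(\pi) = \dots = p_k(\pi) = 0,
\]
\[
  \text{where } \Delta_n = \Bigl\{ \pi \in \mathbb{R}^n_{\ge 0} : \textstyle\sum_{i=1}^{n} \pi_i = 1 \Bigr\}.
\]

Critical points of such a problem satisfy a system of polynomial equations (the critical equations), which is what makes both numerical algebraic solvers and gradient-based methods natural candidates for comparison.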