Task-agnostic constraining in average reward POMDPs
Guido Montúfar, Johannes Rauh, and Nihat Ay
Submission date: 09. Mar. 2021
Keywords and phrases: partial observability, Markov decision process, stochastic policy, memoryless policy, optimal planning
Download full preprint: PDF (2106 kB)
We study the shape of the average reward as a function of the memoryless stochastic policies in infinite-horizon partially observable Markov decision processes. We show that for any given instantaneous reward function on state-action pairs, there is an optimal policy that satisfies a series of constraints expressed solely in terms of the observation model. Our analysis extends and improves on previous descriptions, which either concerned discounted rewards or covered only special cases.
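To make the central object concrete, the following sketch evaluates the long-run average reward of a memoryless stochastic policy in a small toy POMDP. All numerical values, the two-state model, and the function name `average_reward` are illustrative assumptions, not taken from the paper; the computation simply combines the observation model, the policy, and the transition kernel into a state-level Markov chain and weights the instantaneous rewards by its stationary distribution.

```python
import numpy as np

# Hypothetical toy POMDP: 2 states, 2 observations, 2 actions.
# All numbers below are illustrative assumptions, not from the paper.
T = np.array([            # T[a][s, s'] = transition probability under action a
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.6, 0.4]],
])
beta = np.array([[0.8, 0.2],   # beta[s, o] = probability of observation o in state s
                 [0.3, 0.7]])
R = np.array([[1.0, 0.0],      # R[s, a] = instantaneous reward for state-action pair
              [0.0, 2.0]])

def average_reward(pi):
    """Long-run average reward of a memoryless stochastic policy.

    pi[o, a] = probability of choosing action a after observing o.
    """
    # Effective state-level policy: pi_s[s, a] = sum_o beta[s, o] * pi[o, a]
    pi_s = beta @ pi
    # State transition matrix under the policy:
    # P[s, s'] = sum_a pi_s[s, a] * T[a][s, s']
    P = sum(pi_s[:, a][:, None] * T[a] for a in range(T.shape[0]))
    # Stationary distribution: left eigenvector of P for eigenvalue 1
    vals, vecs = np.linalg.eig(P.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    mu = mu / mu.sum()
    # Average reward: sum_{s, a} mu[s] * pi_s[s, a] * R[s, a]
    return float(np.einsum("s,sa,sa->", mu, pi_s, R))

# Evaluate the uniform policy, which ignores the observation entirely:
uniform = np.full((2, 2), 0.5)
print(round(average_reward(uniform), 4))
```

For this toy model the uniform policy yields 5/7 ≈ 0.7143; sweeping `pi` over the policy polytope and plotting `average_reward` would trace out the reward landscape whose shape the paper analyzes.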