Preprint 9/2021

Task-agnostic constraining in average reward POMDPs

Guido Montúfar, Johannes Rauh, and Nihat Ay

Submission date: 09. Mar. 2021
Pages: 8
Keywords and phrases: partial observability, Markov decision process, stochastic policy, memoryless policy, optimal planning

We study the shape of the average reward as a function over the memoryless stochastic policies in infinite-horizon partially observed Markov decision processes. We show that for any given instantaneous reward function on state-action pairs, there is an optimal policy that satisfies a series of constraints expressed solely in terms of the observation model. Our analysis extends and improves previous descriptions, which either addressed discounted rewards or covered only special cases.
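
To make the objective concrete, the following is a minimal sketch of how the average reward of a memoryless stochastic policy can be evaluated in a small finite POMDP. It is an illustration under stated assumptions, not the authors' method: the function name, argument names, and the toy model are hypothetical, and the chain induced by the policy is assumed to have a unique stationary distribution.

```python
def average_reward(policy, obs_model, transitions, reward, iters=2000):
    """Long-run average reward of a memoryless stochastic policy.

    All names here are illustrative assumptions, not taken from the preprint.
    policy[o][a]         : probability of action a given observation o
    obs_model[s][o]      : probability of observation o in state s
    transitions[s][a][t] : probability of moving from state s to t under a
    reward[s][a]         : instantaneous reward on the state-action pair
    """
    n_states = len(obs_model)
    n_obs = len(policy)
    n_actions = len(policy[0])

    # Effective state policy: marginalize over the observation.
    state_policy = [
        [sum(obs_model[s][o] * policy[o][a] for o in range(n_obs))
         for a in range(n_actions)]
        for s in range(n_states)
    ]

    # State-to-state transition matrix induced by the policy.
    P = [
        [sum(state_policy[s][a] * transitions[s][a][t]
             for a in range(n_actions))
         for t in range(n_states)]
        for s in range(n_states)
    ]

    # Stationary distribution by power iteration on the lazy chain
    # (I + P) / 2, which has the same stationary distribution as P
    # but avoids oscillation when P is periodic.
    # Assumption: the stationary distribution is unique.
    dist = [1.0 / n_states] * n_states
    for _ in range(iters):
        step = [sum(dist[s] * P[s][t] for s in range(n_states))
                for t in range(n_states)]
        dist = [0.5 * d + 0.5 * q for d, q in zip(dist, step)]

    # Expected instantaneous reward under the stationary state-action law.
    return sum(dist[s] * state_policy[s][a] * reward[s][a]
               for s in range(n_states) for a in range(n_actions))


# Toy model: two states, one observation (the states are indistinguishable),
# two actions (action 0 stays, action 1 switches state).
obs_model = [[1.0], [1.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
reward = [[1.0, 0.0], [0.0, 0.0]]
policy = [[0.2, 0.8]]

value = average_reward(policy, obs_model, transitions, reward)
```

Because the policy enters only through the observation model, two states that emit the same observations must receive the same action distribution; this is the kind of dependence on the observation model that the constraints in the abstract refer to.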
