Uncertainty and Stochasticity of Optimal Policies
Guido Montúfar, Johannes Rauh, and Nihat Ay
Submission date: 09. Mar. 2021
Keywords and phrases: POMDP, stochastic policy
We are interested in action selection mechanisms (policies) that maximize an expected long-term reward. In general, which policy is optimal depends on the specifics of the problem, including the agent’s perception and memory limitations, the system’s dynamics, and the reward signal. We discuss results that use partial descriptions of the observations, state transitions, and reward signal to localize optimal policies within a subset of all possible policies. These results imply that the search space for optimal policies can be reduced for all problems that share the same general properties. Moreover, in certain cases of interest, we can identify policies that produce the same behaviors and the same expected long-term rewards, thereby further reducing the search space.
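
To make the role of perception limitations concrete, here is a minimal sketch of a toy problem that is not taken from the preprint. It assumes a hypothetical two-state system in which the agent receives the same observation in both states, so a memoryless policy reduces to a single number p, the probability of choosing action 0. The dynamics, the reward, the discount factor 0.9, the uniform start distribution, and the name expected_return are all illustrative assumptions.

```python
def expected_return(p, gamma=0.9, sweeps=2000):
    """Expected discounted reward of the memoryless policy that picks
    action 0 with probability p, averaged over a uniform start state.

    Toy dynamics (assumed for illustration only):
      state 0: action 0 pays 1 and moves to state 1; action 1 pays 0 and stays
      state 1: action 1 pays 1 and moves to state 0; action 0 pays 0 and stays
    Both states emit the same observation, so p cannot depend on the state.
    """
    v0 = v1 = 0.0
    # Fixed-point iteration of the policy evaluation equations.
    for _ in range(sweeps):
        new_v0 = p * (1.0 + gamma * v1) + (1.0 - p) * gamma * v0
        new_v1 = (1.0 - p) * (1.0 + gamma * v0) + p * gamma * v1
        v0, v1 = new_v0, new_v1
    return 0.5 * (v0 + v1)


# Search a grid of policies, including the two deterministic ones (p = 0, p = 1).
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=expected_return)

print("deterministic, p = 0:", round(expected_return(0.0), 4))
print("deterministic, p = 1:", round(expected_return(1.0), 4))
print("best p on the grid:  ", best_p, round(expected_return(best_p), 4))
```

In this toy problem each deterministic policy collects at most one reward and then stays stuck in one state, while the uniformly random policy keeps collecting reward, so the best memoryless policy here is stochastic. The policy evaluation equations are also symmetric under swapping p with 1 - p together with the two states, which is a small instance of distinct policies yielding the same expected long-term reward.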