Minimum-Information Planning in Partially-Observable Decision Problems
- Roy Fox (School of Computer Science and Engineering, Hebrew University, Israel)
Intelligent agents interacting with their environment operate under constraints on what they can observe and how they can act. Unbounded agents can use standard reinforcement learning techniques to optimize their inference and control under purely external constraints. Bounded agents, on the other hand, are subject to internal constraints as well. These internal constraints allow them to attend only partially to their observations and to intend their actions only partially, which requires boundedly rational selection of perception and action.
This problem is particularly interesting when restricted to memoryless agents. Their optimization is a sequential rate-distortion problem that trades off internal communication costs against external costs. The solution exhibits intriguing and useful phenomenology, such as phase transitions that cluster observations into actions in discrete domains or reduce the controller order in continuous domains. Moreover, the general problem with retentive (memory-utilizing) agents can be reduced to the memoryless case by considering communication costs on the memory channel.
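To make the trade-off between internal communication cost and external cost concrete, the following is a minimal single-step sketch, not the sequential method of this work. It assumes a memoryless agent that maps one observation to one action, and it uses standard Blahut-Arimoto iterations for a rate-distortion problem. The names `bounded_policy`, `information_rate`, `cost`, `p_obs`, and `beta` are hypothetical and chosen only for illustration; `beta` plays the role of the trade-off coefficient between the information rate and the expected external cost.

```python
import numpy as np

def bounded_policy(cost, p_obs, beta, iters=500):
    """Single-step rate-distortion policy via Blahut-Arimoto iterations.

    cost[o, a] : external cost of taking action a after observing o (assumed given)
    p_obs[o]   : probability of observation o (assumed given)
    beta       : trade-off coefficient; larger beta favors low external cost
    """
    n_obs, n_act = cost.shape
    marginal = np.full(n_act, 1.0 / n_act)  # initial action marginal p(a)
    for _ in range(iters):
        # policy(a|o) is proportional to p(a) * exp(-beta * cost[o, a])
        logits = np.log(np.maximum(marginal, 1e-300))[None, :] - beta * cost
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)
        # the action marginal must be consistent with the policy
        marginal = p_obs @ policy
    return policy, marginal

def information_rate(policy, marginal, p_obs):
    """Mutual information I(O; A) in bits between observation and action."""
    joint = p_obs[:, None] * policy
    denom = np.broadcast_to(marginal, policy.shape)
    mask = joint > 0
    terms = np.zeros_like(policy)
    terms[mask] = joint[mask] * np.log2(policy[mask] / denom[mask])
    return float(terms.sum())

# Illustrative problem: four observations, two actions (values are made up).
cost = np.array([[0.0, 1.0],
                 [0.1, 1.0],
                 [1.0, 0.0],
                 [1.0, 0.1]])
p_obs = np.full(4, 0.25)

for beta in (0.0, 1.0, 50.0):
    policy, marginal = bounded_policy(cost, p_obs, beta)
    rate = information_rate(policy, marginal, p_obs)
    expected_cost = float(np.sum(p_obs[:, None] * policy * cost))
    print(f"beta={beta:5.1f}  rate={rate:.3f} bits  expected cost={expected_cost:.3f}")
```

In this toy setting, a coefficient of zero yields a policy that ignores the observation entirely, while a large coefficient yields a nearly deterministic policy that groups the four observations into two actions. The sequential problem described above additionally couples these choices across time steps, which this sketch does not attempt to capture.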