General Infomax Agents through World Models
- Danijar Hafner (Google Brain & University of Toronto)
Deep reinforcement learning has enabled machines to solve complex control problems directly from high-dimensional camera inputs. However, these systems rely on reward functions that are carefully designed for specific tasks. In contrast, humans learn about the world and perform complex behaviors without any external reward signal. We categorize the space of possible objective functions for embodied agents along a spectrum that reaches from narrow to general objectives. The narrow objectives correspond to the domain-specific rewards typically used in reinforcement learning today, while the general objectives correspond to information maximization through world models. This perspective explains unsupervised learning, perception, exploration, skill discovery, and control from a single principle. Our findings suggest that designing powerful world models is a path toward building highly adaptive agents that seek out large niches in their environments, rendering task rewards optional.