General Infomax Agents through World Models

Danijar Hafner (Google Brain & University of Toronto)

Abstract

Deep reinforcement learning has enabled machines to solve complex control problems directly from high-dimensional camera inputs. However, these systems rely on carefully designed reward functions for specific tasks. On the other hand, humans learn about the world and perform complex behaviors without any external reward signal. We categorize the space of possible objective functions for embodied agents. We show a spectrum that reaches from narrow to general objectives. While the narrow objectives correspond to domain-specific rewards as typically used in reinforcement learning today, the general objectives correspond to information maximization through world models. This explains unsupervised learning, perception, exploration, skill discovery, and control from a single principle. Our findings suggest designing powerful world models as a path toward building highly adaptive agents that seek out large niches in their environments, rendering task rewards optional.

03.07.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Upcoming Events of this Seminar

Thursday, 03.07.25 On the Power of Context-Enhanced Learning in LLMs with Xingyu Zhu
Thursday, 10.07.25 The effect of low rank and stochasticity on Gradient Descent at the Edge of Stability with Avrajit Ghosh a.o.
Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 02.10.25 to be announced with Marcello Carioni