Interactive learning

  • Susanne Still (University of Hawaii, USA)
G3 10 (Lecture hall)


I present a quantitative approach to interactive learning and adaptive behavior which integrates model-making and decision-making into one theoretical framework. This approach follows a simple principle: the observer’s behavior and the observer’s internal representation of the world should result in maximal predictive power at minimal complexity. Classes of optimal action policies and of optimal models can be derived from an objective function that reflects this trade-off between prediction and complexity. The resulting optimal models then summarize, at different levels of abstraction, the process’s causal organization in the presence of the learner’s actions. A fundamental consequence of the proposed principle is that the learner’s optimal action policies have the emergent property of balancing exploration and control. Interestingly, the explorative component is present even in the absence of policy randomness, i.e. in the optimal deterministic behavior. This is a direct result of requiring maximal predictive power in the presence of feedback. Exploration is therefore not the same as policy randomization. This stands in contrast to, for example, Boltzmann exploration as used in Reinforcement Learning (RL). Time permitting, I will discuss what happens when one includes rewards, as is popular in RL.
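The prediction-complexity trade-off described in the abstract can be illustrated with a minimal numerical sketch. The following is not the speaker's formulation but an assumed information-bottleneck-style objective: given a toy joint distribution over past and future observations and a hypothetical encoder q(m | past) defining the learner's internal representation M, it scores the representation by I(M; Future) − λ·I(M; Past), where λ weights complexity against predictive power. All distributions and the value of λ here are invented for illustration.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in bits, computed from a joint distribution table p(x, y)."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = pxy > 0                        # avoid log(0) terms
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

# Toy joint distribution p(past, future): past is informative about future.
p_past_future = np.array([[0.4, 0.1],
                          [0.1, 0.4]])

# Hypothetical encoder q(m | past): rows index past states, columns index
# internal model states m. This plays the role of the learner's representation.
q_m_given_past = np.array([[0.9, 0.1],
                           [0.1, 0.9]])

p_past = p_past_future.sum(axis=1)              # marginal p(past)
p_m_past = q_m_given_past * p_past[:, None]     # joint p(past, m)
p_m_future = q_m_given_past.T @ p_past_future   # joint p(m, future)

complexity = mutual_information(p_m_past)       # I(M; Past): coding cost
prediction = mutual_information(p_m_future)     # I(M; Future): predictive power

lam = 0.3                                       # trade-off weight (assumed)
objective = prediction - lam * complexity
print(f"I(M;Past) = {complexity:.3f} bits, "
      f"I(M;Future) = {prediction:.3f} bits, objective = {objective:.3f}")
```

Because M depends on the future only through the past, the data-processing inequality guarantees I(M; Future) ≤ I(M; Past): predictive power is bought at the price of representational complexity, which is exactly the tension the objective trades off.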

Antje Vandenberg

Max-Planck-Institut für Mathematik in den Naturwissenschaften

Nihat Ay

Max Planck Institute for Mathematics in the Sciences, Leipzig

Ralf Der

Max Planck Institute for Mathematics in the Sciences, Leipzig

Mikhail Prokopenko

CSIRO, Sydney