Talk

Interactive learning and optimal predictions

  • Susanne Still (Information and Computer Sciences, University of Hawai'i, Mānoa, Honolulu, USA)
A3 02 (Seminar room)

Abstract

The principles of statistical mechanics and information theory play an important role in learning theory. I start by asking a simple question: given a time series, what is the class of models of the past data that are maximally predictive at a fixed model complexity? Predictiveness is measured by the information captured about future data, while complexity is measured by the coding rate. As a family of solutions, one finds Gibbs distributions, in which the trade-off parameter between complexity and predictive power plays the role of a temperature. I show that, in the low-temperature regime, the resulting algorithm retrieves sufficient statistics by finding the causal state partition of the past. This algorithm is essentially a Blahut-Arimoto algorithm, and the above problem can be mapped onto rate-distortion theory and the "information bottleneck" method. I show in examples that by studying the resulting rate-distortion curve, one can learn something about the underlying data-generating process's "causal compressibility". The rate-distortion curve can be computed analytically for some processes that act as extreme cases: periodic processes and i.i.d. processes. Time permitting, I will discuss issues of complexity control that arise from sampling errors in finite data sets.
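The trade-off described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: it iterates the standard self-consistent information-bottleneck equations in a Blahut-Arimoto style, treating each past x as an abstract symbol with a known conditional distribution p(y|x) over futures y. The toy distribution, the function names (information_bottleneck, encoder_stats, predictive_information), the number of model states and the β values are all hypothetical choices. β plays the role of the inverse temperature 1/T in the Gibbs distribution q(s|x) ∝ q(s) exp(-β D[p(y|x) || q(y|s)]); the coding rate is I(S;X) and the predictive information is I(S;Y).

```python
# Sketch only: hypothetical names and toy data, not the speaker's code.
import math
import random

TINY = 1e-300  # guard against log(0) and division by zero


def kl(p, q):
    """KL divergence D[p || q] in nats between two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, TINY)) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(p_a, p_b_given_a):
    """I(A;B) in nats from the marginal p(a) and the conditional p(b|a)."""
    n_b = len(p_b_given_a[0])
    p_b = [sum(pa * row[b] for pa, row in zip(p_a, p_b_given_a)) for b in range(n_b)]
    return sum(pa * kl(row, p_b) for pa, row in zip(p_a, p_b_given_a))


def encoder_stats(p_x, q_s_given_x, p_y_given_x):
    """q(s) and q(y|s) implied by an encoder q(s|x), using the chain S - X - Y."""
    n_x, n_s, n_y = len(p_x), len(q_s_given_x[0]), len(p_y_given_x[0])
    q_s = [sum(p_x[x] * q_s_given_x[x][s] for x in range(n_x)) for s in range(n_s)]
    q_y_given_s = [
        [sum(p_x[x] * q_s_given_x[x][s] * p_y_given_x[x][y] for x in range(n_x))
         / max(q_s[s], TINY) for y in range(n_y)]
        for s in range(n_s)
    ]
    return q_s, q_y_given_s


def information_bottleneck(p_x, p_y_given_x, n_states, beta, n_iter=500, seed=0):
    """Blahut-Arimoto-style iteration of the information-bottleneck equations.

    beta acts as an inverse temperature 1/T: small beta favors a compact model,
    large beta favors predictive power. Returns the encoder q(s|x).
    """
    rng = random.Random(seed)
    # Random soft initial assignment of each past x to the model states s.
    q_s_given_x = []
    for _ in p_x:
        row = [rng.random() + 1e-3 for _ in range(n_states)]
        z = sum(row)
        q_s_given_x.append([r / z for r in row])

    for _ in range(n_iter):
        q_s, q_y_given_s = encoder_stats(p_x, q_s_given_x, p_y_given_x)
        # Gibbs update: q(s|x) proportional to q(s) * exp(-beta * D[p(y|x) || q(y|s)]).
        updated = []
        for p_y in p_y_given_x:
            logits = [math.log(max(q_s[s], TINY)) - beta * kl(p_y, q_y_given_s[s])
                      for s in range(n_states)]
            top = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in logits]
            z = sum(weights)
            updated.append([w / z for w in weights])
        q_s_given_x = updated
    return q_s_given_x


def predictive_information(p_x, q_s_given_x, p_y_given_x):
    """I(S;Y): information the model states carry about the future."""
    q_s, q_y_given_s = encoder_stats(p_x, q_s_given_x, p_y_given_x)
    return mutual_information(q_s, q_y_given_s)


# Hypothetical toy process: four pasts, two of which share each predictive distribution.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
for beta in (0.5, 50.0):
    q = information_bottleneck(p_x, p_y_given_x, n_states=2, beta=beta)
    print(beta, mutual_information(p_x, q), predictive_information(p_x, q, p_y_given_x))
```

In this toy example, at β = 0.5 (high temperature) the rate I(S;X) should shrink to essentially zero, leaving one effective state. At β = 50 (low temperature) the encoder should become essentially deterministic, grouping pasts {0, 1} and {2, 3}, so I(S;Y) matches I(X;Y): the two groups play the role of causal states here. Sweeping β and plotting I(S;Y) against I(S;X) traces out a curve of the kind described above.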

Agents, including robots and animals, change their environment. Therefore, the data they observe are, to varying degrees, a consequence of their own actions. The above lays the groundwork for studying "interactive learning", a paradigm that asks for optimal sampling strategies in the presence of feedback from the learner. A quantitative approach to interactive learning and adaptive behavior is proposed, integrating model-making and decision-making into one theoretical framework. I follow the same simple principles as above by requiring that the observer's world model and action policy result in maximal predictive power at minimal complexity. A fundamental consequence of the feedback is that the optimal action policy balances exploration and control. Time permitting, I will discuss some simple examples that can be solved analytically, and I will talk about integrating reward maximization into this theory.