The Information Geometry of Unsupervised Reinforcement Learning
- Benjamin Eysenbach (Carnegie Mellon University & Google Brain)
Reinforcement learning (RL) is notoriously sample inefficient, prompting a large body of prior work to study how unsupervised pretraining can improve sample efficiency when solving downstream RL tasks. One approach is unsupervised skill discovery, a class of algorithms that learn a set of policies without access to a reward function. While prior work has shown that these methods learn skills that can accelerate downstream RL tasks, it remains unclear whether these methods always learn useful skills. What does it even mean for a skill (or set of skills) to be provably useful?
In this talk, I'll share some recent work that provides some of the first answers to this question: existing methods are optimal in one sense but very suboptimal in a different sense. These results have implications for how users might want to use skill learning algorithms, provide some (surprisingly simple!) tools for analysing skill learning methods, and suggest exciting opportunities for designing new, provably useful skill learning algorithms.