Talk
On the Generality of Relative Entropy Policy Iteration
- Nikola Milosevic (Max Planck Institute for Cognitive and Brain Sciences)
Abstract
Relative Entropy Policy Iteration is a reinforcement learning framework that alternates between policy evaluation and relative-entropy–regularized policy improvement. A prominent example is Maximum a Posteriori Policy Optimization (MPO), widely used in robotics and control. In this talk, I revisit the underlying principle of MPO from a theoretical perspective and suggest that its core idea may extend to much more general settings, including nonlinear utilities and continuous state-action spaces. The theory is still incomplete, but I will outline a possible mathematical framework and hope to gather feedback and ideas from the audience on how to formalize and analyze this perspective.
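For orientation, a common way to write the relative-entropy–regularized improvement step underlying MPO is sketched below; the exact objective treated in the talk may differ. Here $Q^{\pi_k}$ is the action-value function of the current policy $\pi_k$, $\mu$ a state distribution, and $\eta > 0$ a temperature; this notation is illustrative, not the speaker's.

\[
\pi_{k+1} = \arg\max_{\pi}\;
  \mathbb{E}_{s \sim \mu}\!\Big[
    \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q^{\pi_k}(s,a) \big]
    - \tfrac{1}{\eta}\,\mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big)
  \Big],
\qquad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\,\exp\!\big( \eta\, Q^{\pi_k}(s,a) \big).
\]

The second expression is the closed-form maximizer of the regularized objective; in practice MPO approximates this nonparametric solution with a parametric policy fitted in a separate maximum a posteriori step.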