Talk

On the Generality of Relative Entropy Policy Iteration.

  • Nikola Milosevic (Max Planck Institute for Cognitive and Brain Sciences)
ScaDS.AI D05.17 Universität Leipzig (Leipzig)

Abstract

Relative Entropy Policy Iteration is a reinforcement learning framework that alternates between policy evaluation and relative-entropy–regularized improvement. A prominent example is Maximum a Posteriori Policy Optimization (MPO), widely used in robotics and control. In this talk, I revisit the underlying principle of MPO from a theoretical perspective and suggest that its core idea may extend to much more general settings, including nonlinear utilities and continuous state-action spaces. The theory is still incomplete, but I will outline a possible mathematical framework and hope to gather feedback and ideas from the audience on how to formalize and analyze this perspective.
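To make the improvement step concrete: on a discrete action space, relative-entropy-regularized policy improvement (the principle behind MPO's E-step) has a closed-form solution, q(a|s) ∝ π(a|s) · exp(Q(s,a)/η), where the temperature η controls how far the improved policy may move from the current one in KL divergence. The following is a minimal illustrative sketch, not code from the talk; the function name and the toy numbers are invented for illustration.

```python
import numpy as np

def kl_regularized_improvement(pi, q_values, eta):
    """One step of relative-entropy-regularized improvement for a single state.

    pi       : current policy probabilities over actions, shape (n_actions,)
    q_values : evaluated action values Q(s, a), shape (n_actions,)
    eta      : temperature; larger eta keeps the result closer to pi (smaller KL)
    """
    # Closed-form solution: q(a|s) proportional to pi(a|s) * exp(Q(s,a) / eta)
    logits = np.log(pi) + q_values / eta
    logits -= logits.max()          # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()              # normalize to a probability distribution

pi = np.array([0.25, 0.25, 0.25, 0.25])    # uniform current policy
q_values = np.array([1.0, 2.0, 0.5, 1.5])  # toy action values
improved = kl_regularized_improvement(pi, q_values, eta=1.0)
```

As η → ∞ the improved policy collapses back onto π (zero KL movement), while small η approaches greedy improvement; MPO fits a parametric policy to this non-parametric target in a subsequent M-step.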

Seminar
21.11.25 05.12.25

MiS/ScaDS/CBS Math and AI Meeting

