Information geometry in reinforcement learning: from natural policy gradients to relative entropy policy search

Jan Peters (Max Planck Institute for Biological Cybernetics, Germany)

Universität Leipzig, Leipzig

Abstract

Policy search is a successful approach to reinforcement learning that has yielded many interesting applications in a variety of areas.

Unlike traditional value-function-based learning methods, policy search approaches can be guaranteed to converge to at least a local optimum, can handle partially observed variables, and allow domain knowledge to be integrated in a straightforward way. Research on policy search began with the classical work on parametric policy gradient methods, which turned out to be surprisingly slow and hampered by poor trade-offs between the exploration and exploitation parameters of the policy.

Results from information theory for supervised and unsupervised learning have triggered research into natural policy gradient methods, which turned out to be significantly more robust and efficient than the earlier vanilla policy gradient approaches. An interesting interpretation of these results is that each policy update fixes the amount of information lost while maximizing the reward. This information loss is measured by the relative entropy between the experienced state-action distribution and the new one generated by the improved policy. We follow this line of reasoning further and propose the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step.
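To make the idea of "fixing the information loss while maximizing reward" concrete, here is a minimal Python sketch of an episodic (stateless) simplification of a REPS-style update. It assumes a fixed set of sampled policy parameters θ_i with returns R_i, and treats the uniform distribution q over those samples as the experienced distribution. The new distribution p_i ∝ exp(R_i/η) maximizes expected return subject to KL(p‖q) ≤ ε; the temperature η is found by bisection on KL(η) = ε, which is the stationarity condition of the convex dual g(η) = ηε + η log((1/N) Σ_i exp(R_i/η)), since g'(η) = ε − KL(p_η‖q). The helper names, the quadratic reward and ε = 0.5 are illustrative assumptions, and the sketch omits the state-action distributions that the method described in the talk works with.

```python
import math


def kl_to_uniform(weights):
    """KL(p || q) where p is the normalised weight vector and q is uniform."""
    total = sum(weights)
    n = len(weights)
    # Terms with p_i = 0 contribute 0 (limit of p log p).
    return sum((w / total) * math.log(n * w / total) for w in weights if w > 0)


def reps_weights(returns, epsilon, iters=200):
    """Hypothetical helper: reweight samples so that p_i ∝ exp(R_i / eta)
    and KL(p || uniform) = epsilon.

    KL is decreasing in eta, so eta is found by bisection in log space.
    Assumes epsilon < log(N) and a unique best return, so a solution exists.
    """
    r_max = max(returns)

    def weights(eta):
        # Subtract the maximum return for numerical stability.
        return [math.exp((r - r_max) / eta) for r in returns]

    lo, hi = 1e-8, 1e8
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if kl_to_uniform(weights(mid)) > epsilon:
            lo = mid  # update too greedy: raise the temperature
        else:
            hi = mid  # update too conservative: lower the temperature
    eta = math.sqrt(lo * hi)
    w = weights(eta)
    total = sum(w)
    return eta, [x / total for x in w]


# Usage: one update of a 1-D search distribution (illustrative data only).
thetas = [0.0, 1.0, 2.0, 3.0, 4.0]
returns = [-(t - 3.0) ** 2 for t in thetas]  # assumed reward, optimum at 3
eta, p = reps_weights(returns, epsilon=0.5)
new_mean = sum(pi * t for pi, t in zip(p, thetas))  # weighted ML estimate
print(f"eta={eta:.4f}  KL={kl_to_uniform(p):.4f}  new mean={new_mean:.4f}")
```

The bound ε plays the role of the fixed information loss: a smaller ε keeps p closer to the experienced samples and gives a more conservative step, while the weighted mean moves the search distribution from the sample mean 2.0 toward the optimum at 3.0 without jumping all the way to it.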

We show applications of these methods to improving robot behaviors and discuss their wider implications.

Conference: Information Geometry and its Applications III, 2 to 6 August 2010


Antje Vandenberg

Max Planck Institute for Mathematics in the Sciences, Germany

Nihat Ay

Max Planck Institute for Mathematics in the Sciences, Germany

Paolo Gibilisco

Università degli Studi di Roma "Tor Vergata", Italy

František Matúš

Academy of Sciences of the Czech Republic, Czech Republic