
MiS Preprint Repository

We have decided to discontinue the publication of preprints on our preprint server at the end of 2024. The publication culture within mathematics has changed so much due to the rise of repositories such as arXiv (www.arxiv.org) that we are encouraging all institute members to make their preprints available there. An institute repository in its previous form is therefore unnecessary. The preprints published to date will remain available here, but no new preprints will be added.

MiS Preprint

22/2021

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

Johannes Müller and Guido Montúfar

ArXiv: 2110.07409

Abstract

We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces with respect to either the discounted or mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we demonstrate how the partial observability constraints can lead to multiple smooth and non-smooth local optimizers and we estimate the number of critical points.
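
To illustrate the objects named in the abstract, the following is a minimal sketch of how discounted state-action frequencies and the expected discounted reward could be computed for a memoryless stochastic policy. It is not taken from the paper: the array names (alpha for the transition kernel, beta for the observation kernel, pi for the policy, mu for the initial distribution, r for the reward), the (1 - gamma) normalization, and the random toy instance are all assumptions made for the example. Because the frequencies come from solving a linear system whose coefficients are linear in the policy, each entry is a rational function of the policy by Cramer's rule, and the reward is linear in the frequencies.

```python
import numpy as np

def state_action_frequencies(alpha, beta, pi, mu, gamma):
    """Discounted state-action frequencies of a memoryless policy (illustrative sketch).

    Assumed conventions (not from the source):
      alpha[s, a, t] = P(next state t | state s, action a)
      beta[s, o]     = P(observation o | state s)
      pi[o, a]       = P(action a | observation o)
      mu[s]          = initial state distribution
    """
    tau = beta @ pi                            # effective state policy tau[s, a]
    P = np.einsum("sa,sat->st", tau, alpha)    # state-to-state transition matrix
    n = P.shape[0]
    # rho = (1 - gamma) * sum_t gamma^t * (state distribution at time t)
    rho = (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P.T, mu)
    return rho[:, None] * tau                  # eta[s, a] = rho[s] * tau[s, a]

# Hypothetical toy instance: 3 states, 2 observations, 2 actions.
rng = np.random.default_rng(0)
n_states, n_obs, n_actions = 3, 2, 2
alpha = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
beta = rng.dirichlet(np.ones(n_obs), size=n_states)
pi = rng.dirichlet(np.ones(n_actions), size=n_obs)
mu = np.ones(n_states) / n_states
r = rng.normal(size=(n_states, n_actions))     # instantaneous reward r[s, a]
gamma = 0.9

eta = state_action_frequencies(alpha, beta, pi, mu, gamma)
reward = float(np.sum(eta * r))                # linear in eta
```

With this normalization the entries of eta are nonnegative and sum to one, so the policy optimization can be read as maximizing a linear function over the set of frequencies that are reachable by memoryless policies.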


Received: 14.10.21

Published: 14.10.21

MSC Codes: 90C40, 93E20, 49M37, 90C23

Keywords: POMDPs, Memoryless Policies, Critical points, State-action frequencies, Algebraic degree

Related publications

inBook

2022 Repository Open Access

Johannes Müller and Guido Montúfar

The geometry of memoryless stochastic policy optimization in infinite-horizon POMDPs

In: ICLR 2022 : Tenth international conference on learning representations ; 25th April 2022
[s. l.] : ICLR, 2022. - pp. 1-45

ArXiv: 2110.07409, Link: openreview.net