Maximizing the Kullback-Leibler distance
- Johannes Rauh (Max Planck Institute for Mathematics in the Sciences, Germany)
Abstract
Nihat Ay proposed the following problem [1], motivated by statistical learning theory: Let $\mathcal{E}$ be an exponential family. Find the maximizers of the Kullback-Leibler distance $D(P\|\mathcal{E})$ from $\mathcal{E}$. A maximizing probability measure $P$ has several interesting properties. For example, the projection $\hat{P}$ of $P$ onto (the closure of) $\mathcal{E}$, restricted to the support $Z := \mathrm{supp}(P)$ and renormalized, equals $P$, i.e. $\hat{P}(x) = P(x)\,\hat{P}(Z)$ if $x\in\mathrm{supp}(P)$ (for the proof in the most general case see [2]). This simple property can be used to transform the problem into another form. The first observation is that probability measures having this "projection property" always come in pairs $P_{1},P_{2}$ such that $P_{1}$ and $P_{2}$ have the same expectation values of the sufficient statistics $A$ (i.e. $A P_{1} = A P_{2}$) and disjoint supports. Therefore we can solve the original problem by investigating the kernel $\ker A$ of the sufficient statistics. If we find all local maximizers of \begin{equation*}\overline D(M) := \sum_{x} M(x) \log |M(x)|, \quad M\in\ker A,\end{equation*} subject to $\|M\|_{\ell_{1}} \le 2$, then we know all maximizers of the original problem. The talk will present the transformed problem and its relation to the original problem. Finally, I will give some consequences for the solutions of the original problem.
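As a concrete illustration of the pairing (assuming, purely for the sake of example, that $\mathcal{E}$ is the independence model of two binary variables and that $A$ records the two marginals), take \begin{equation*}P_{1} = \tfrac{1}{2}\bigl(\delta_{00} + \delta_{11}\bigr), \qquad P_{2} = \tfrac{1}{2}\bigl(\delta_{01} + \delta_{10}\bigr).\end{equation*} Both have uniform marginals, so $A P_{1} = A P_{2}$; their supports are disjoint; and $M := P_{1} - P_{2} \in \ker A$ with $\|M\|_{\ell_{1}} = 2$. Moreover, the projection of $P_{1}$ onto $\mathcal{E}$ is the uniform distribution $\hat{P}_{1}(x) = \tfrac{1}{4}$, so $\hat{P}_{1}(x) = P_{1}(x)\,\hat{P}_{1}(\mathrm{supp}(P_{1})) = \tfrac{1}{2}\cdot\tfrac{1}{2}$ for $x\in\mathrm{supp}(P_{1})$, and $D(P_{1}\|\mathcal{E}) = \log 2$.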
[1] N. Ay: An Information-Geometric Approach to a Theory of Pragmatic Structuring. The Annals of Probability 30 (2002) 416-436.
[2] F. Matúš: Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika 43 (2007) 731-746.