Geometry of optimal design (16.06.2020)

Frank Röttger

The theory of Optimal Design is one of the standard tools of statistics methodology. Its main objective is to plan experiments in a way to maximize the information gained. In linear regression models, design theory translates this problem into a global optimization problem, i.e. the solution does not depend on the true value of the parameter. In non-linear regression, optimality is only local with respect to the parameter of interest, resulting in a parameterized optimization problem. We illustrate with an example.

In 1929, Ernst Zermelo [3] introduced a paired comparison model, later known as the Bradley–Terry paired comparison model [1], to estimate the ranking of chess players in tournaments. The parameters here are the relative playing strengths of each competitor in the tournament. The design question is: Which tournament setup provides the most information about the parameters of interest, i.e. which setup implies the smallest variance of the maximum likelihood estimates for the playing strengths.

In [2], we computed solutions for such a tournament with four players by solving the parameterized optimization problem with non-linear computer algebra. The image to the right illustrates the solutions in the parameter space. As explained, the optimal tournament design depends on the true value of the parameters. If all four players have the same playing strength, it is most informative if the games are distributed equally, which corresponds to the origin in the image. In the orange region around the origin, the optimal design assigns a positive number of games to each pair of players. If the true parameter is either in the green, blue or red regions, the optimal design assigns a positive number of games to either 5, 4 or 3 pairs. Having 3 of the pairs is minimal, otherwise the playing strengths are not comparable (corresponding to a non-connected graph with players as nodes).

Imagine a tournament with a chess grandmaster (A), a professional player (B), a regular player (C) and a beginner (D). In this case, we have some knowledge on the playing strength, for instance Elo scores. Naturally, a game between a chess grandmaster and a beginner is not very informative on the true value of the playing strength. Therefore the optimal setup, if the true playing strength are pairwise reasonably different, is to distribute the games equally among the pairs of players with adjacent strengths. In our example this corresponds to a setup where only A–B, B–C and C–D are played in the tournament. In fact, we show that for any number of players, in case of pairwise strongly different playing strengths, the optimal tournament plan with minimal number of pairs is always of this form. Furthermore, we provide a semi-algebraic description (described by polynomial inequalities and equations) of the parameter regions of optimality.

For any non-linear regression model with polynomial regression vector and a discrete set of possible experimental settings, the optimality of a design is characterized by a semi-algebraic set. The main problem is the computational complexity of the computer algebra tools. Dealing with this complexity will require further collaboration with experts from non-linear algebra.


[1]   R. A. Bradley and M. E. Terry, Rank analysis of incomplete block designs. I. The method of paired comparisons, Biometrika 39 (1952), 324–345. MR 0070925
[2]   T. Kahle, F. Röttger, and R. Schwabe, The semi-algebraic geometry of optimal designs for the Bradley–Terry model, arXiv e-prints (2019), arXiv:1901.02375.
[3]   E. Zermelo, Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung, Mathematische Zeitschrift 29 (1929), 436–460.

16.06.2020, 21:43