London School of Economics
Lower Bounds on Neural Network Depth via (Lattice) Polytopes
This talk will be centered around the following open problem about neural networks with rectified linear units (ReLUs): What is the minimum number of hidden layers required to represent every continuous piecewise linear function with input dimension n? While it is known that log(n) layers are always sufficient, no lower bound larger than 2 has been proven so far. We conjecture that the logarithmic upper bound is indeed the true answer. We provide evidence for our conjecture by proving special cases using methods from mixed-integer programming and polyhedral theory. In particular, we demonstrate how previously discovered connections between neural networks and tropical geometry can be used to translate the problem into the language of Newton polytopes. Using the theory of lattice polytopes, we then prove the conjecture under an additional integrality assumption.
This talk is based on joint works with Amitabh Basu, Marco Di Summa, and Martin Skutella, as well as with Christian Haase and Georg Loho.
KTH Royal Institute of Technology
The Geometry of Linear Convolutional Networks
We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (i.e., the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network’s architecture on the geometry of the function space. For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent. This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.
Max Planck Institute for Mathematics in the Sciences
From Markov decision processes to algebraic statistics
We study the reward maximization problem in partially observable Markov decision processes with stationary stochastic policies, which constitute an important model in sequential decision making under uncertainty. We obtain a description of the problem as the optimization of a linear objective over the probability simplex subject to polynomial constraints that we characterize explicitly. This allows us to describe the combinatorial and algebraic complexity of the problem, and we obtain bounds on its algebraic degree. Further, we conduct experiments using methods from numerical algebra to solve the critical equations and compare the results to standard gradient-based optimization.
University of Durham
Tropical geometry and linear regions of maxout networks
The talk is split into two parts. In the first part, we discuss bounding the number of linear regions via tropical maxout arrangements. In the second part, we give an overview of ongoing experiments on the initialization of maxout networks. The first part is joint work with Guido Montúfar and Leon Zhang; the second part is joint work with Iolo Jones and Ziva Urbancic.
University of California, Los Angeles
Universality of Generalizations of Restricted Boltzmann Machines
Restricted Boltzmann machines (RBMs) are graphical models that can be used to model probability distributions on the corners of the n-dimensional unit cube. It can be shown that the distribution modeled by an RBM with m hidden nodes can be parameterized as the Hadamard product of m tensors (called factors) that are supermodular and have flattening rank at most 2. Inspired by this, we look at two different models obtained by removing either the supermodularity or the flattening rank condition. For both models, we try to answer the following question: Given an arbitrary tensor p, what is the maximum number m of factors such that p can be represented as the Hadamard product of m factors? We show that if p has strictly positive entries, then it can be represented using n supermodular factors. On the other hand, for the model where the factors have flattening rank at most 2, we present lower and upper bounds on m. Applying the same proof idea to representing M-by-N matrices as Hadamard products of m rank-r matrices, we obtain a new proof of the lower bound of Friedenberg, Onetto, and Williams (2017).
University of Texas, Austin
Representing piecewise linear functions minimally
Piecewise linear functions are ubiquitous in tropical geometry, convex geometry, and machine learning. Every piecewise linear function can be represented as a tropical rational function (a difference of max-affine functions), but little has been done to define minimal such representations. We propose two minimality concepts, induced by the so-called monomial length and factorization length of tropical polynomials. We will demonstrate that, in practice, factorization length is better behaved for computing a minimal representation of a piecewise linear function, owing to the nice properties it satisfies, while monomial length is related to extremal problems on equivalence classes of polytope pairs. The difference between the two concepts is controlled by arrangements of tropical hypersurfaces. We obtain a different proof of the lower bound of Tseran and Montúfar on the number of regions cut out by general tropical hypersurfaces and, as a byproduct, a lower bound on the number of vertices of Minkowski sums of polytopes.
August 08, 2022