Functional dimension of ReLU Networks
- Julia Elisenda Grigsby (Boston College)
The parameter space for any fixed architecture of neural networks serves as a proxy during training for the associated class of functions - but how faithful is this representation? For any fixed feedforward ReLU network architecture with at least one hidden layer, it is well-known that many different parameter settings can determine the same function. It is less well-known that the degree of this redundancy is inhomogeneous across parameter space. This inhomogeneity should impact the dynamics of training via gradient descent, especially when compared with recent work suggesting that SGD favors flat minima of the loss landscape.
In this talk, I will carefully define the notion of the local functional dimension of a feedforward ReLU network function, discuss the relationship between local functional dimension of a parameter and the geometry of the underlying decomposition of the domain into linear regions, and present some preliminary experimental results suggesting that the probability distribution of the functional dimension at initialization is both interesting and architecture-dependent. Some of this work is joint with Kathryn Lindsey, Rob Meyerhoff, and Chenxi Wu, and some is joint with Kathryn Lindsey and David Rolnick.