Lower Bounds on Complexity of Shallow Networks
- Věra Kůrková (Institute of Computer Science, Czech Academy of Sciences, Czech Republic)
Abstract
Although biologically inspired neural networks were originally introduced as multilayer computational models, shallow networks dominated applications until the recent renewal of interest in deep architectures. Experimental evidence and successful applications of deep networks raise theoretical questions: When and why are deep networks better than shallow ones?
This lecture will present some probabilistic and constructive results showing the limitations of shallow networks. It will show how geometrical properties of high-dimensional spaces imply probabilistic lower bounds on network complexity. Bounds depending on covering numbers of dictionaries of computational units will be derived and combined with estimates of the sizes of some common dictionaries used in neurocomputing. The probabilistic results will be complemented by constructive ones built using Hadamard matrices and pseudo-noise sequences. The results will be illustrated by an example of a class of functions that can be computed by two-hidden-layer perceptron networks of considerably smaller model complexity than by networks with only one hidden layer. Connections with the No Free Lunch Theorem and the central paradox of coding theory will be discussed.
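As a rough illustration of the kind of Hadamard-matrix construction mentioned above, the following Python sketch (an illustrative assumption, not taken from the lecture) builds a Sylvester-type Hadamard matrix and the ±1-valued function on pairs of Boolean vectors that it induces; functions of this form are typical candidates in constructive lower bounds for shallow networks.

```python
import numpy as np

def sylvester_hadamard(k):
    """Return the 2^k x 2^k Sylvester-type Hadamard matrix (entries +1/-1)."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

def induced_function(H):
    """Interpret a 2^k x 2^k Hadamard matrix as a +-1 valued function
    on pairs of k-bit Boolean vectors: f(u, v) = H[row(u), col(v)]."""
    def f(u, v):
        i = int("".join(map(str, u)), 2)  # row index from the bits of u
        j = int("".join(map(str, v)), 2)  # column index from the bits of v
        return H[i, j]
    return f

if __name__ == "__main__":
    H = sylvester_hadamard(3)                 # 8 x 8 Hadamard matrix
    assert (H @ H.T == 8 * np.eye(8)).all()   # rows are mutually orthogonal
    f = induced_function(H)
    print(f([0, 1, 1], [1, 0, 1]))            # value of the induced function
```

This is only a sketch of how a Hadamard matrix yields a concrete ±1-valued target function on a high-dimensional Boolean domain; the lecture's actual constructions and complexity bounds are developed analytically.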