Best k-layer neural network approximations
Lek-Heng Lim, Mateusz Michałek, and Yang Qi
Contact the author: please use this email for correspondence.
Submission date: 16 July 2019
MSC-Numbers: 92B20, 41A50, 41A30
Link to arXiv: See the arXiv entry of this preprint.
We investigate the geometry of the empirical risk minimization problem for k-layer neural networks. We provide examples showing that, for the classical activation functions σ(x) = 1/(1 + exp(−x)) and σ(x) = tanh(x), there exists a positive-measure set of target functions that have no best approximation by neural networks with a fixed number of layers. In addition, we study in detail the properties of shallow networks, classifying the cases in which a best k-layer neural network approximation always exists or fails to exist for the ReLU activation σ(x) = max(0, x). We also determine the dimensions of shallow ReLU-activated networks.
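The non-existence of a best approximation can be illustrated numerically. The sketch below is a standard toy example, not taken from the paper: the set of shallow ReLU networks of a fixed width is not closed, because the width-2 networks f_n(x) = n·(ReLU(x + 1/n) − ReLU(x)) converge in L² on [−1, 1] to the discontinuous step function 1_{x ≥ 0}, which no finite ReLU network can realize exactly (ReLU networks are continuous). A target near such a limit point then has no best approximant in the class.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Grid on [-1, 1] and the discontinuous target 1_{x >= 0}.
xs = np.linspace(-1.0, 1.0, 200001)
dx = xs[1] - xs[0]
step = (xs >= 0).astype(float)

def l2_dist_to_step(n):
    """L^2 distance between the width-2 ReLU network
    f_n(x) = n * (relu(x + 1/n) - relu(x)) and the step function,
    approximated by a Riemann sum."""
    fn = n * (relu(xs + 1.0 / n) - relu(xs))
    return np.sqrt(np.sum((fn - step) ** 2) * dx)

# The distances shrink toward 0 (analytically, sqrt(1/(3n))),
# yet the limit function lies outside the class of ReLU networks.
dists = [l2_dist_to_step(n) for n in (1, 10, 100, 1000)]
print(dists)
```

The exact value of the distance here is √(1/(3n)), so the infimum over this family is 0 while no network in the class attains it.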