Regularization by Misclassification in ReLU Neural Networks
- Ido Nachum (École Polytechnique Fédérale de Lausanne)
Abstract
We highlight a property of feedforward neural networks trained with SGD for classification tasks: an SGD update for a misclassified data point decreases the Frobenius norm of the weight matrix of each layer. This holds for networks of any architecture with activations such as ReLU and Leaky ReLU. This may explain why, in practice, the performance of neural networks can be similar with or without explicit regularization such as weight decay.
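
To illustrate the stated effect numerically, the following is a minimal sketch (not code from the paper): it builds a small bias-free ReLU network in PyTorch, assigns a point the label opposite to the network's current prediction so that it is misclassified, performs one SGD step on the logistic loss, and prints each layer's Frobenius norm before and after. The architecture, learning rate, and choice of logistic loss are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's code): one SGD step on a
# misclassified point and its effect on per-layer Frobenius norms.
import torch

torch.manual_seed(0)

# Small feedforward ReLU network with a single output logit.
# Biases are omitted for simplicity (an assumption of this sketch).
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1, bias=False),
)

x = torch.randn(1, 10)
logit = model(x)
# Choose the label opposite to the prediction, so the point is misclassified.
y = -torch.sign(logit).detach()

# Logistic loss on the margin y * f(x); any decreasing margin loss behaves similarly.
loss = torch.nn.functional.softplus(-y * logit).sum()

norms_before = [p.norm().item() for p in model.parameters()]

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
opt.zero_grad()
loss.backward()
opt.step()

norms_after = [p.norm().item() for p in model.parameters()]
for i, (b, a) in enumerate(zip(norms_before, norms_after)):
    print(f"layer {i}: ||W||_F before {b:.6f} -> after {a:.6f} (decreased: {a < b})")

For a sufficiently small learning rate each norm shrinks, since for a misclassified point the negative gradient has, to first order, a negative inner product with each layer's weight matrix.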