Abstract for the talk on 14.08.2020 (17:00 h), Math Machine Learning seminar MPI MIS + UCLA
Ido Nachum (École Polytechnique Fédérale de Lausanne)
Regularization by Misclassification in ReLU Neural Networks
See the video of this talk.
We highlight a property of feedforward neural networks trained with SGD for classification tasks. An SGD update for a misclassified data point decreases the Frobenius norm of the weight matrix of each layer. This holds for networks of any architecture with activations such as ReLU and Leaky ReLU. This may explain why, in practice, the performance of neural networks can be similar with or without \(L_2\) regularization. We then use this insight to study cases in which a ReLU network is presented with random or noisy labels. On the one hand, using completely random labels eliminates neurons and can nullify the network completely. On the other hand, with some label noise, we observe cases with considerable improvement in performance. That is, some label noise can decrease the test error of a network in an artificial example. Also, in an experiment with the MNIST dataset, we show that training with noisy labels performs comparably to standard training and yields a sparse over-complete basis for MNIST.
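The norm-decrease property can be checked numerically on a toy example. The following is a minimal sketch under assumptions that are not stated in the abstract: a bias-free two-layer ReLU network f(x) = w2 · relu(W1 x), binary labels y in {-1, +1}, the perceptron-style loss -y f(x) applied only to misclassified points, and a small learning rate. Under these assumptions the network is positively homogeneous in each layer's weights, so the inner product of a layer's weights with the gradient of f with respect to them equals f(x), and the first-order change in the squared norm is 2 · lr · y · f(x), which is negative when the point is misclassified. The weights, input, and function names below are hypothetical and chosen only for illustration; they are not the setup used in the talk.

```python
import math


def forward(W1, w2, x):
    """Bias-free two-layer ReLU network: f(x) = w2 . relu(W1 x)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [p if p > 0.0 else 0.0 for p in pre]
    out = sum(v * h for v, h in zip(w2, hidden))
    return pre, hidden, out


def frobenius(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(w * w for row in M for w in row))


def misclassified_step(W1, w2, x, y, lr):
    """One SGD step on the assumed loss -y * f(x), taken only if y * f(x) < 0."""
    pre, hidden, out = forward(W1, w2, x)
    if y * out >= 0:
        return W1, w2  # correctly classified: no update in this sketch
    # df/dW1[i][j] = w2[i] * 1[pre_i > 0] * x[j]
    new_W1 = [
        [w + lr * y * w2[i] * (1.0 if pre[i] > 0 else 0.0) * x[j]
         for j, w in enumerate(row)]
        for i, row in enumerate(W1)
    ]
    # df/dw2[i] = hidden[i]
    new_w2 = [v + lr * y * h for v, h in zip(w2, hidden)]
    return new_W1, new_w2


# Hypothetical weights and data point, chosen so that f(x) > 0 while y = -1.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]]
w2 = [1.0, 0.5, -0.25]
x = [1.0, 2.0]
y = -1.0

new_W1, new_w2 = misclassified_step(W1, w2, x, y, lr=0.01)
norms_before = (frobenius(W1), frobenius([w2]))
norms_after = (frobenius(new_W1), frobenius([new_w2]))
print("before:", norms_before)
print("after: ", norms_after)
```

In this sketch both layer norms shrink after the update. With a large learning rate the second-order term lr² · ||gradient||² can outweigh the first-order decrease, so the step size matters for this simplified argument.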