Regularization by Misclassification in ReLU Neural Networks
- Ido Nachum (École Polytechnique Fédérale de Lausanne)
Abstract
We highlight a property of feedforward neural networks trained with SGD for classification tasks: an SGD update for a misclassified data point decreases the Frobenius norm of the weight matrix of each layer. This holds for networks of any architecture with activations such as ReLU and Leaky ReLU. This may explain why, in practice, the performance of neural networks can be similar with or without explicit regularization such as weight decay.
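
To illustrate the stated effect numerically, the following is a minimal sketch (not code from the paper): it builds a small bias-free ReLU network in PyTorch, assigns a point the label opposite to the network's current prediction so that it is misclassified, performs one SGD step on the logistic loss, and prints each layer's Frobenius norm before and after. The architecture, learning rate, and choice of logistic loss are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's code): one SGD step on a
# misclassified point and its effect on per-layer Frobenius norms.
import torch

torch.manual_seed(0)

# Small feedforward ReLU network with a single output logit.
# Biases are omitted for simplicity (an assumption of this sketch).
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1, bias=False),
)

x = torch.randn(1, 10)
logit = model(x)
# Choose the label opposite to the prediction, so the point is misclassified.
y = -torch.sign(logit).detach()

# Logistic loss on the margin y * f(x); any decreasing margin loss behaves similarly.
loss = torch.nn.functional.softplus(-y * logit).sum()

norms_before = [p.norm().item() for p in model.parameters()]

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
opt.zero_grad()
loss.backward()
opt.step()

norms_after = [p.norm().item() for p in model.parameters()]
for i, (b, a) in enumerate(zip(norms_before, norms_after)):
    print(f"layer {i}: ||W||_F before {b:.6f} -> after {a:.6f} (decreased: {a < b})")

For a sufficiently small learning rate each norm shrinks, since for a misclassified point the negative gradient has, to first order, a negative inner product with each layer's weight matrix.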