Workshop
Dropout regularization viewed from the large deviations perspective
- Oxana Manita (Eindhoven University of Technology, Netherlands)
Abstract
Dropout regularisation for training neural networks has proved very successful in practical applications. The usual empirical explanation of this success is that dropout reduces co-adaptation of features during training. Moreover, practitioners observe that 'training with dropout converges not faster, but to a better local minimum'. However, there is hardly any mathematical understanding of these statements. In this talk I want to give a mathematical interpretation of the last statement, discuss a continuous-time model of training with dropout, and explain why it 'converges to a better local minimum' than conventional training.
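As background for the training procedure the abstract refers to, the sketch below shows the standard dropout mechanism: at every gradient step a fresh Bernoulli mask silences a random subset of hidden units, and the gradient is taken through the masked network. This is only a minimal illustration of conventional (discrete-time) dropout training, not the speaker's continuous-time model; the network size, keep probability p, and variable names are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the speaker's model): SGD on a one-hidden-layer
# network with inverted dropout, the mask resampled at every step.
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                     # probability of keeping a hidden unit
X = rng.normal(size=(32, 10))               # toy input batch
y = rng.normal(size=(32, 1))                # toy regression targets
W1 = rng.normal(scale=0.1, size=(10, 16))
W2 = rng.normal(scale=0.1, size=(16, 1))
lr = 0.1

for step in range(100):
    h = np.tanh(X @ W1)                          # hidden activations
    mask = rng.binomial(1, p, size=h.shape) / p  # inverted dropout mask
    h_drop = h * mask                            # randomly silence units this step
    pred = h_drop @ W2
    err = pred - y                               # d(loss)/d(pred) for 0.5 * MSE
    # Backpropagate through the masked forward pass only.
    grad_W2 = h_drop.T @ err / len(X)
    grad_h = (err @ W2.T) * mask * (1 - h ** 2)
    grad_W1 = X.T @ grad_h / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

# At test time no mask is applied; the 1/p scaling keeps activations matched in expectation.
test_pred = np.tanh(X @ W1) @ W2
```

The per-step randomness of the mask is the feature the talk's continuous-time viewpoint idealises: the trajectory of the weights is a stochastic perturbation of plain gradient descent, which is where a large-deviations analysis of which minima are reached can enter.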