Workshop
Dropout regularization viewed from the large deviations perspective
- Oxana Manita (Eindhoven University of Technology, Netherlands)
Abstract
Dropout regularisation for training neural networks has proved very successful in practical applications. The usual empirical explanation of this success is that dropout reduces co-adaptation of features during training. Moreover, practitioners observe that 'training with dropout converges not faster, but to a better local minimum'. However, there is hardly any mathematical understanding of these statements. In this talk I want to give a mathematical interpretation of the last statement, discuss a continuous-time model of training with dropout, and explain why it 'converges to a better local minimum' than conventional training.
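As background for the training procedure the abstract refers to, the sketch below shows the standard dropout mechanism: at every gradient step a fresh Bernoulli mask silences a random subset of hidden units, and the gradient is taken through the masked network. This is only a minimal illustration of conventional (discrete-time) dropout training, not the speaker's continuous-time model; the network size, keep probability p, and variable names are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the speaker's model): SGD on a one-hidden-layer
# network with inverted dropout, the mask resampled at every step.
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                     # probability of keeping a hidden unit
X = rng.normal(size=(32, 10))               # toy input batch
y = rng.normal(size=(32, 1))                # toy regression targets
W1 = rng.normal(scale=0.1, size=(10, 16))
W2 = rng.normal(scale=0.1, size=(16, 1))
lr = 0.1

for step in range(100):
    h = np.tanh(X @ W1)                          # hidden activations
    mask = rng.binomial(1, p, size=h.shape) / p  # inverted dropout mask
    h_drop = h * mask                            # randomly silence units this step
    pred = h_drop @ W2
    err = pred - y                               # d(loss)/d(pred) for 0.5 * MSE
    # Backpropagate through the masked forward pass only.
    grad_W2 = h_drop.T @ err / len(X)
    grad_h = (err @ W2.T) * mask * (1 - h ** 2)
    grad_W1 = X.T @ grad_h / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

# At test time no mask is applied; the 1/p scaling keeps activations matched in expectation.
test_pred = np.tanh(X @ W1) @ W2
```

The per-step randomness of the mask is the feature the talk's continuous-time viewpoint idealises: the trajectory of the weights is a stochastic perturbation of plain gradient descent, which is where a large-deviations analysis of which minima are reached can enter.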