Abstract for the talk on 23.07.2020 (17:00 h)Math Machine Learning seminar MPI MIS + UCLA
Léonard Blier (Facebook AI Research, Université Paris Saclay, Inria)
The Description Length of Deep Learning Models
Solomonoff's general theory of inference and the Minimum Description Length principle formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. This theory gives a very interesting viewpoint on many known results in statistics and machine learning. But the success of Deep Learning seems to go against this theory: while deep neural networks are often the best models in practice, they are also extremely complex, in the sense that they are hard to compress. We solve this paradox and demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding.