The Description Length of Deep Learning Models

  • Léonard Blier (Facebook AI Research, Université Paris Saclay, Inria)

Abstract

Solomonoff's general theory of inference and the Minimum Description Length (MDL) principle formalize Occam's razor: a good model of the data is one that losslessly compresses the data, including the cost of describing the model itself. This viewpoint illuminates many known results in statistics and machine learning. The success of deep learning, however, seems to contradict it: deep neural networks are often the best models in practice, yet they are extremely complex in the sense that they are hard to compress. We resolve this paradox and demonstrate experimentally that deep neural networks can compress the training data even when the cost of encoding their parameters is taken into account.
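
For orientation, the compression criterion the abstract alludes to is the classical two-part code of MDL; the talk itself may use refined variants (e.g. variational or prequential codes), which are not spelled out here:

L(D) = L(\theta) + L(D \mid \theta), with L(D \mid \theta) = -\log_2 p_\theta(D),

where L(\theta) is the number of bits needed to describe the model parameters \theta, and L(D \mid \theta) is the number of bits needed to encode the data D given the model's predictions. The apparent paradox is that for deep networks the parameter term looks enormous, yet the abstract claims the total codelength can still be small.

The following minimal Python sketch (an illustration under the assumptions above, not the speaker's experimental setup; all numbers are hypothetical) shows how such a two-part codelength could be tallied in bits:

import math

def data_bits(probs_of_true_labels):
    # Bits to encode the labels given the model:
    # sum of -log2 p(true label) over the dataset (Shannon codelength).
    return sum(-math.log2(p) for p in probs_of_true_labels)

def param_bits(num_params, bits_per_param=16):
    # Naive cost of describing the model itself:
    # fixed-precision encoding of every parameter.
    return num_params * bits_per_param

# Hypothetical values for illustration only.
probs = [0.9, 0.8, 0.95, 0.7]  # model's probability of each true label
total = param_bits(num_params=1000) + data_bits(probs)
print(f"Two-part codelength: {total:.1f} bits")

A model compresses the data in the MDL sense when this total is smaller than the cost of encoding the labels naively, i.e. log2 of the number of classes per example summed over the dataset.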


Seminar series: Math Machine Learning seminar MPI MIS + UCLA
Venue: MPI for Mathematics in the Sciences (live stream)
Contact: Katharina Matschke, MPI for Mathematics in the Sciences (via email)
