Efficient compression of neural networks and datasets
- Lukas Barth (MPI MiS, Leipzig)
Abstract
This talk explores the theoretical and practical connections between model and data compression, generalization, Solomonoff induction, and the minimum description length (MDL) principle, one of the fundamental principles of learning in machine learning.
Viewing the L_0-regularized learning objective as a tractable approximation of MDL learning, we introduce and compare three improved L_0-regularization methods for neural networks. These methods are evaluated across diverse architectures and datasets, including language modeling, image classification, and synthetic teacher-student setups.
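For concreteness, a minimal sketch of the kind of objective referred to here, in generic notation that is not necessarily that of the underlying work: one minimizes the usual training loss plus a penalty on the number of nonzero parameters,

    \min_{\theta} \; \mathcal{L}(\theta; D) + \lambda \, \lVert \theta \rVert_0 ,

where \lVert \theta \rVert_0 counts the nonzero weights and \lambda trades off fit against model size. Under a two-part MDL reading, the first term plays the role of the description length of the data given the model and the second that of the description length of the model itself.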
Our research demonstrates that L_0-regularization can achieve better compression-performance trade-offs than unregularized models, often resulting in more sample-efficient convergence and smaller networks with improved accuracy. We also examine the theoretical foundations of these approaches, connecting the empirical results to algorithmic complexity theory.
The talk is based on the following work by Lukas Barth and Paulo von Petersenn: