Phase transitions on the universal tradeoff between prediction and compression in machine learning
- Tailin Wu (Stanford University)
Abstract
Many learning problems (e.g., VAEs, MDL, the Information Bottleneck, and various regularization techniques) involve a tradeoff between prediction error and some measure of the complexity of the model or the representation, minimizing a linear combination of the form (prediction error) + beta*complexity for a hyperparameter beta that parametrizes the tradeoff. How does learning change as we vary beta, and how does this depend on the structure of the dataset and the model capacity? How can we design practical objectives and algorithms that obtain good prediction with low complexity?
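To make the objective concrete, below is a minimal NumPy sketch of the generic loss (prediction error) + beta*complexity, instantiated here in a beta-VAE style with a squared reconstruction error as the prediction term and a Gaussian KL divergence as the complexity term. The function names, tensor shapes, and toy data are illustrative assumptions, not the speaker's code.

```python
# A minimal sketch (illustrative, not the speaker's code) of the generic
# tradeoff objective L(beta) = prediction error + beta * complexity,
# shown here with a beta-VAE-style instantiation.
import numpy as np

def kl_gaussian(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def tradeoff_loss(x, x_recon, mu, logvar, beta):
    """Prediction error (squared reconstruction error) + beta * complexity (KL)."""
    prediction_error = np.sum((x - x_recon) ** 2, axis=-1)
    complexity = kl_gaussian(mu, logvar)
    return np.mean(prediction_error + beta * complexity)

# Toy usage with random stand-ins for data and encoder/decoder outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(128, 10))
x_recon = x + 0.1 * rng.normal(size=x.shape)
mu = rng.normal(size=(128, 4))
logvar = rng.normal(size=(128, 4))
print(tradeoff_loss(x, x_recon, mu, logvar, beta=2.0))
```

Sweeping beta in such an objective trades reconstruction quality against the complexity of the learned representation, which is the tradeoff studied in the talk.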
To gain insight, we study phase transitions: beta values at which key quantities such as prediction accuracy change discontinuously. We then introduce a general technique for analytically and algorithmically predicting phase transitions at which the global minimum of the loss landscape turns into a saddle point. For the Information Bottleneck, we derive accurate phase transition predictions that illuminate the relationship among the objective, the dataset structure, the learned representation, and the model capacity, for example identifying which classes are easy versus hard to learn. We also draw a close connection between phase transitions in the Information Bottleneck and second-order phase transitions in physics. Finally, I will introduce similar tradeoff phenomena in other learning scenarios and point to open problems.
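As a hedged illustration of the saddle-point mechanism (not the speaker's actual prediction technique), the toy Landau-style loss below has a critical value beta_c at which the trivial solution w = 0 stops being the global minimum: the second derivative at w = 0 crosses zero, and the order parameter |w*| grows continuously from zero, mirroring a second-order phase transition. The choice beta_c = 1.0 is an arbitrary illustrative assumption.

```python
# Toy Landau-like loss f(w; beta) = w**4 - (beta - beta_c) * w**2:
# for beta < beta_c, w = 0 is the global minimum; for beta > beta_c, the
# curvature (Hessian) at w = 0 turns negative and w = 0 becomes a saddle/maximum.
import numpy as np

BETA_C = 1.0  # illustrative critical value

def toy_loss(w, beta):
    return w**4 - (beta - BETA_C) * w**2

def hessian_at_origin(beta):
    # d^2/dw^2 of the toy loss at w = 0 is -2 * (beta - beta_c).
    return -2.0 * (beta - BETA_C)

def minimizer(beta):
    # Closed-form global minimizer: 0 below the transition, sqrt((beta - beta_c)/2) above.
    return 0.0 if beta <= BETA_C else np.sqrt((beta - BETA_C) / 2.0)

for beta in np.linspace(0.5, 1.5, 11):
    print(f"beta={beta:.2f}  hessian@0={hessian_at_origin(beta):+.2f}  "
          f"order parameter |w*|={abs(minimizer(beta)):.3f}")
```

The order parameter |w*| departs continuously from zero at beta_c while its derivative jumps, the hallmark of a second-order transition; the sign change of the Hessian eigenvalue at the trivial solution is the kind of condition that allows the transition point to be predicted analytically.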