Talk
Phase transitions in learning machines
- Daniel Murfet (University of Melbourne)
Abstract
I will introduce the idea of phases and phase transitions of the Bayesian posterior in the setting of singular learning theory, and discuss how a simple auto-encoder model introduced by Anthropic in their research on neural network interpretability displays a rich set of phase transitions, both in the posterior and over the course of training. I’ll then explain a research program we term “developmental interpretability”, which aims to use phase transitions as the basic primitive for understanding the internal structure of computation in neural networks.
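As a rough orientation, the following is a minimal sketch of the objects the abstract refers to, in standard singular learning theory notation (following Watanabe; the symbols are illustrative and not fixed by the talk):

\[
p(w \mid D_n) \;\propto\; \varphi(w)\, e^{-n L_n(w)},
\qquad
F_n \;=\; -\log \int \varphi(w)\, e^{-n L_n(w)}\, dw \;\approx\; n L_n(w^*) + \lambda \log n,
\]

where $\varphi$ is the prior, $L_n$ the empirical negative log-likelihood on $n$ samples, $w^*$ a most likely parameter, and $\lambda$ the (local) learning coefficient. Informally, a phase is a region of parameter space that dominates this integral, and a phase transition occurs when the dominating region changes, for example as the sample size $n$ or a control parameter of the model varies, trading off lower loss against lower $\lambda$.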