The Lottery Ticket Hypothesis: On Sparse, Trainable Neural Networks
- Jonathan Frankle (Massachusetts Institute of Technology)
Abstract:
We recently proposed the "Lottery Ticket Hypothesis," which conjectures that the dense neural networks we typically train contain much smaller subnetworks that, when trained in isolation from the original initialization, can match the accuracy of the full network. This hypothesis raises questions about the nature of overparameterization and the importance of initialization for training neural networks in practice. In this talk, I will discuss existing work and the latest developments on the "Lottery Ticket Hypothesis," including the empirical evidence for these claims on small vision tasks, the changes necessary to scale these ideas to ImageNet, and the relationship between these subnetworks and their "stability" to the noise of stochastic gradient descent. This research is entirely empirical, although it has exciting implications for theory. (This is joint work with Gintare Karolina Dziugaite, Daniel M. Roy, Alex Renda, and Michael Carbin.)
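For readers unfamiliar with the procedure behind the hypothesis, the following is a minimal PyTorch sketch of iterative magnitude pruning with a reset to the original initialization. The network, data, hyperparameters, and helper functions (make_model, train, apply_masks, prune_by_magnitude) are illustrative assumptions for this announcement, not the models or settings used in the talk or the paper.

```python
# Sketch of the lottery-ticket procedure: iterative magnitude pruning,
# then rewinding the surviving weights to their original initialization.
# All architecture and training details below are placeholders.
import copy
import torch
import torch.nn as nn

def make_model():
    # Toy fully connected network; the actual experiments use vision models.
    return nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

def apply_masks(model, masks):
    # Zero out pruned weights so they stay removed.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def train(model, masks, steps=1000):
    # Placeholder training loop on random data; real code would use a data loader.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # stand-in batch
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        apply_masks(model, masks)  # keep pruned weights at zero

def prune_by_magnitude(model, masks, fraction=0.2):
    # Remove the smallest-magnitude fraction of the weights that survive so far.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = p[masks[name].bool()].abs()
            if alive.numel() == 0:
                continue
            threshold = torch.quantile(alive, fraction)
            masks[name] = (p.abs() > threshold).float() * masks[name]

# Iterative magnitude pruning with rewinding to the original initialization.
model = make_model()
initial_state = copy.deepcopy(model.state_dict())  # the original initialization
masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

for _ in range(5):
    train(model, masks)                             # train the (masked) network
    prune_by_magnitude(model, masks, fraction=0.2)  # prune 20% of surviving weights
    model.load_state_dict(initial_state)            # rewind to the original init
    apply_masks(model, masks)                       # the candidate "winning ticket"

train(model, masks)  # the hypothesis: this sparse subnetwork matches the dense accuracy
```

As the abstract notes, scaling these ideas to ImageNet requires changes to this recipe; in particular, the reset point moves from the original initialization to an early point in training, a small change to the rewind step in the loop above.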
Bio:
Jonathan Frankle is a fourth-year PhD student at MIT, where he studies empirical deep learning. His current research focuses on the properties of sparse neural networks that allow them to train effectively, as embodied in his proposed "Lottery Ticket Hypothesis" (ICLR 2019 Best Paper Award). Jonathan also has an interest in technology policy: he works closely with lawyers, journalists, and policymakers on topics in AI policy, and he teaches at the Georgetown University Law Center. He earned his BSE and MSE in computer science at Princeton University and has previously spent time at Facebook AI Research, Google Brain, and Microsoft.