The Lottery Ticket Hypothesis: On Sparse, Trainable Neural Networks

Jonathan Frankle (Massachusetts Institute of Technology)

Live Stream

Abstract

We recently proposed the "Lottery Ticket Hypothesis," which conjectures that the dense neural networks we typically train contain much smaller subnetworks that, starting from the original initialization, can be trained in isolation to the same accuracy. This hypothesis raises questions about the nature of overparameterization and the importance of initialization for training neural networks in practice. In this talk, I will discuss existing work on the "Lottery Ticket Hypothesis" and its latest developments, including the empirical evidence for these claims on small vision tasks, the changes necessary to scale these ideas to ImageNet, and the relationship between these subnetworks and their "stability" to the noise of stochastic gradient descent. This research is entirely empirical, although it has exciting implications for theory. (This is joint work with Gintare Karolina Dziugaite, Daniel M. Roy, Alex Renda, and Michael Carbin.)
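The phrase "subnetworks capable of training in isolation ... starting from the original initialization" can be made concrete with a toy sketch. It assumes a simple one-shot recipe: train a dense model, prune its smallest-magnitude weights, reset the surviving weights to their original initial values, and retrain only those weights. The example applies this to a hypothetical sparse linear-regression problem rather than a neural network. All names (`train`, `magnitude_mask`, `TRUE_WEIGHTS`, and so on) are illustrative, and this is not the procedure or experimental setup from the talk.

```python
import random

# Hypothetical toy setup: a sparse linear-regression "network" stands in for a
# dense neural network; only features 0, 1 and 4 actually influence the target.
N_FEATURES = 10
TRUE_WEIGHTS = [3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0]


def make_data(n_samples, seed=0):
    """Draw Gaussian inputs and noise-free targets from TRUE_WEIGHTS."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(n_samples)]
    ys = [sum(w * xi for w, xi in zip(TRUE_WEIGHTS, x)) for x in xs]
    return xs, ys


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(init, mask, xs, ys, lr=0.1, steps=200):
    """Full-batch gradient descent on MSE; weights whose mask is 0 stay at 0."""
    w = [wi * mi for wi, mi in zip(init, mask)]
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * N_FEATURES
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j in range(N_FEATURES):
                grad[j] += 2.0 * err * x[j] / n
        # Gradient step, then re-apply the mask so pruned weights remain zero.
        w = [(wi - lr * g) * mi for wi, g, mi in zip(w, grad, mask)]
    return w


def magnitude_mask(weights, keep):
    """Mask that keeps the `keep` largest-magnitude weights and prunes the rest."""
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    kept = set(ranked[:keep])
    return [1.0 if j in kept else 0.0 for j in range(len(weights))]


xs, ys = make_data(200)
init_rng = random.Random(1)
init = [init_rng.gauss(0.0, 0.1) for _ in range(N_FEATURES)]  # the "original initialization"

dense = train(init, [1.0] * N_FEATURES, xs, ys)  # 1. train the dense model
mask = magnitude_mask(dense, keep=3)              # 2. prune 70% of weights by magnitude
ticket = train(init, mask, xs, ys)                # 3. reset survivors to init and retrain

print("dense MSE:", mse(dense, xs, ys))
print("subnetwork MSE:", mse(ticket, xs, ys), "using", int(sum(mask)), "of", N_FEATURES, "weights")
```

Because the toy targets depend on only three features, the pruned subnetwork, restarted from the same initialization, reaches essentially the same training error as the dense model. In deep networks, whether such subnetworks exist and train as well is the empirical question the abstract describes.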

Bio:
Jonathan Frankle is a fourth-year PhD student at MIT, where he studies empirical deep learning. His current research focuses on the properties of sparse neural networks that allow them to train effectively, as embodied in his proposed "Lottery Ticket Hypothesis" (ICLR 2019 best paper award). Jonathan also has an interest in technology policy: he works closely with lawyers, journalists, and policymakers on topics in AI policy, and he teaches at the Georgetown University Law Center. He earned his BSE and MSE in computer science at Princeton University and has previously spent time at Facebook AI Research, Google Brain, and Microsoft.


Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream


Upcoming Events of this Seminar

Thursday, 03.07.25: On the Power of Context-Enhanced Learning in LLMs, with Xingyu Zhu
Thursday, 10.07.25: The effect of low rank and stochasticity on Gradient Descent at the Edge of Stability, with Avrajit Ghosh and others
Thursday, 14.08.25: to be announced, with Jonathan Siegel
Thursday, 02.10.25: to be announced, with Marcello Carioni