Structure of Learning Tasks and the Information in the Weights of a Deep Network

Alessandro Achille (University of California, Los Angeles)

Abstract

What are the fundamental quantities needed to understand the learning process of a deep neural network? Why are some datasets easier than others? What does it mean for two tasks to have a similar structure? We argue that information-theoretic quantities, and in particular the amount of information that SGD stores in the weights, can be used to characterize the training process of a deep network. In fact, we show that the information in the weights bounds both the generalization error and the invariance of the learned representation. It also allows us to connect the learning dynamics with the "structure function" of the dataset, and to define a notion of distance between tasks that relates to fine-tuning. The non-trivial dynamics of information during training give rise to phenomena, such as critical periods for learning, that closely mimic those observed in humans, and they may suggest that forgetting information about the training data is a necessary part of the learning process.

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Upcoming Events of this Seminar

Thursday, 14.08.25: Jonathan Siegel (title to be announced)
Thursday, 02.10.25: Marcello Carioni (title to be announced)