Optimization Aspects that Improve IID and OOD Generalization in Deep Learning

Devansh Arpit (Salesforce)

Live Stream

Abstract

This talk will discuss two aspects of optimization that improve generalization in deep learning: i) the implicit regularization effect of SGD, and ii) model averaging.

The first part of the talk will describe a mechanism that captures the implicit *Regularization Effect of SGD*. We show that it depends on the Fisher Information Matrix and replicate SGD’s implicit regularization effect with an explicit form.
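As a rough illustration of what an explicit Fisher-based regularizer could look like, the sketch below computes the trace of the Fisher Information Matrix for binary logistic regression and adds it to the training loss. The abstract does not state the exact explicit form used in the talk, so the choice of a trace penalty, the toy model, and the names `fisher_trace`, `penalized_loss`, and `alpha` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fisher_trace(w: Sequence[float], xs: Sequence[Sequence[float]]) -> float:
    """Trace of the Fisher Information Matrix for binary logistic regression.

    For p = sigmoid(w . x), the gradient of log p(y|x) w.r.t. w is (y - p) * x.
    Taking the expectation over y ~ Bernoulli(p) gives
    E[||grad||^2] = p * (1 - p) * ||x||^2, averaged here over the inputs.
    """
    total = 0.0
    for x in xs:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total += p * (1.0 - p) * sum(xi * xi for xi in x)
    return total / len(xs)


def penalized_loss(w: Sequence[float], xs: Sequence[Sequence[float]],
                   ys: Sequence[int], alpha: float) -> float:
    """Mean cross-entropy plus alpha * Tr(F).

    The trace penalty is an assumed stand-in for an explicit regularizer;
    alpha is a hypothetical penalty strength.
    """
    nll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        nll -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return nll / len(xs) + alpha * fisher_trace(w, xs)
```

In a deep network the same quantity is usually estimated from per-example gradients with labels sampled from the model's own predictions, since the closed form above is specific to this toy model.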

We will then turn to out-of-distribution (OOD) generalization, where we demonstrate that the gap between the performance of deep models on in-domain validation sets and on distribution-shifted test sets varies chaotically during training with empirical risk minimization (ERM). This leads to poor model selection when in-domain validation data is used for early stopping, and makes the trained model unreliable. We discuss this problem and propose *Model Averaging* as a way to mitigate it, leading to state-of-the-art performance on the DomainBed benchmark. I will conclude with a preview of my future research directions in the broad area of OOD generalization.
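The idea of averaging model parameters along the training trajectory can be sketched as follows. This is a minimal illustration, assuming a uniform running average over flat parameter vectors; the abstract does not specify the averaging schedule used in the talk, and the class name `RunningAverage` is hypothetical.

```python
from typing import List, Optional, Sequence


class RunningAverage:
    """Uniform running average of the parameter vectors passed to update()."""

    def __init__(self) -> None:
        self.mean: Optional[List[float]] = None
        self.count = 0

    def update(self, params: Sequence[float]) -> None:
        self.count += 1
        if self.mean is None:
            # First snapshot: the average is the snapshot itself.
            self.mean = list(params)
            return
        # Incremental mean: m_t = m_{t-1} + (x_t - m_{t-1}) / t
        self.mean = [m + (p - m) / self.count
                     for m, p in zip(self.mean, params)]


# Usage: call update() after each training step (or epoch) and
# evaluate the averaged parameters instead of the latest iterate.
avg = RunningAverage()
for snapshot in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    avg.update(snapshot)
```

Because the averaged parameters change slowly from one step to the next, their evaluation metrics tend to fluctuate less than those of individual iterates, which is the property that matters for model selection here.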

Bio:
I am currently a senior research scientist at Salesforce AI Research. Prior to this, I was a postdoc at Mila with Prof. Yoshua Bengio. I received my PhD and M.S. degrees in Computer Science and Engineering from the University at Buffalo, and a B.S. degree in Electrical Engineering from IIT-BHU, India.
I am interested in the analysis of representation learning and in algorithms for supervised and self-supervised learning, generative modeling, and, more recently, out-of-distribution generalization. My interest also lies at the intersection of optimization and generalization in deep learning, specifically in aspects of deep learning optimization that improve generalization. I have also recently worked on time series analytics, designing deep learning algorithms for forecasting and anomaly detection. I am currently developing a toolbox for causal discovery on tabular and time series data, as well as conducting research in this area.


Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream


Upcoming Events of this Seminar

Thursday, 07.08.25 Efficient compression of neural networks and datasets with Lukas Barth
Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 21.08.25 to be announced with Zhou Fan
Thursday, 28.08.25 to be announced with Randall Balestriero
Thursday, 02.10.25 to be announced with Marcello Carioni
Thursday, 09.10.25 to be announced with Baharan Mirzasoleiman