Abstract for the talk on September 8, 2022 (17:00)
Math Machine Learning seminar MPI MIS + UCLA
Devansh Arpit (Salesforce)
Optimization Aspects that Improve IID and OOD Generalization in Deep Learning
See also the video of this talk.
See also the slides of this talk.
This talk discusses two aspects of optimization geared toward improving generalization in deep learning: i) the regularization effect of SGD, and ii) model averaging.
The first part of the talk describes a mechanism that captures the implicit *Regularization Effect of SGD*. We show that this effect depends on the Fisher information matrix, and we replicate SGD's implicit regularization with an explicit regularizer.
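To make the idea of an explicit Fisher-based regularizer concrete, here is a minimal numpy sketch for logistic regression: the penalty is a sample estimate of the trace of the Fisher information matrix, E[||∇_θ log p(y|x; θ)||²], added to the negative log-likelihood. This is an illustrative stand-in under assumed hyperparameters, not the speaker's exact construction; all names (`fisher_trace_penalty`, `lam`, etc.) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    """Negative log-likelihood of logistic regression."""
    p = sigmoid(X @ theta)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fisher_trace_penalty(theta, X, y):
    """Estimate of tr(F) = E[||grad_theta log p(y|x;theta)||^2].

    For logistic regression the per-sample score is (p_i - y_i) * x_i.
    """
    p = sigmoid(X @ theta)
    g = (p - y)[:, None] * X          # per-sample gradients, shape (n, d)
    return np.mean(np.sum(g * g, axis=1))

def num_grad(f, theta, h=1e-5):
    """Central finite-difference gradient (fine for this tiny sketch)."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

# Toy data from a known logistic model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (sigmoid(X @ true_w) > rng.random(200)).astype(float)

# Full-batch gradient descent on NLL + lam * Fisher-trace penalty.
lam = 0.1
theta = np.zeros(3)
for _ in range(300):
    obj = lambda t: nll(t, X, y) + lam * fisher_trace_penalty(t, X, y)
    theta -= 0.5 * num_grad(obj, theta)
```

The penalty discourages parameter regions where per-sample score vectors are large, mimicking in explicit form the kind of Fisher-dependent bias the talk attributes to SGD.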
We then delve into OOD generalization, where we demonstrate that the gap between deep models' performance on in-domain validation sets and on distribution-shifted test sets varies chaotically during ERM training. This leads to poor model selection when in-domain validation data is used for early stopping, and it makes the trained model unreliable. We discuss this problem and propose *Model Averaging* as a mitigation, leading to state-of-the-art performance on the DomainBed benchmark. I will conclude with a preview of my future research directions in the broad area of OOD generalization.
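The model-averaging idea can be illustrated with a generic weight-averaging sketch: instead of keeping only the final (possibly chaotic) SGD iterate, maintain a running average of the parameters over later training steps, in the spirit of Polyak/tail averaging. This toy numpy example on noisy linear regression is an assumption-laden illustration, not the method evaluated on DomainBed; the warm-up length and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 2
X = rng.normal(size=(n, d))
w_star = np.array([2.0, -1.0])
y = X @ w_star + 0.5 * rng.normal(size=n)

def mse(w):
    r = X @ w - y
    return np.mean(r * r)

w = np.zeros(d)       # current SGD iterate (bounces around the optimum)
w_avg = np.zeros(d)   # running average of iterates after warm-up
n_avg = 0
lr, batch = 0.1, 10

for step in range(2000):
    idx = rng.integers(0, n, size=batch)
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
    w -= lr * grad
    if step >= 500:                   # start averaging after a warm-up phase
        n_avg += 1
        w_avg += (w - w_avg) / n_avg  # incremental running mean
```

Because the constant-step SGD iterate keeps fluctuating around the minimizer, the averaged weights `w_avg` smooth out that noise; the same intuition motivates averaging checkpoints to stabilize the chaotic in-domain/OOD performance gap described above.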
I am currently a senior research scientist at Salesforce AI Research. Prior to this, I was a postdoc at Mila with Prof. Yoshua Bengio. I received my PhD and M.S. degrees in Computer Science and Engineering from the University at Buffalo, and my B.S. degree in Electrical Engineering from IIT-BHU, India.
I am interested in representation learning analysis and algorithms for supervised/self-supervised learning, generative modeling, and, more recently, out-of-distribution generalization. My interests also lie at the intersection of optimization and generalization in deep learning, specifically the aspects of deep learning optimization that improve generalization. I have recently also worked on time series analytics, designing deep learning algorithms for forecasting and anomaly detection. I am currently developing a toolbox for causal discovery for tabular and time series data, as well as conducting research in this area.