What Are Bayesian Neural Network Posteriors Really Like?

Pavel Izmailov (New York University)

Live Stream

Abstract

The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) BMA performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; we demonstrate, explain and provide remedies for this effect; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods can provide good generalization, they provide distinct predictive distributions from HMC. Notably, deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.

References

I will talk about two of our recent papers:

[1] What Are Bayesian Neural Network Posteriors Really Like? Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson

[2] Dangers of Bayesian Model Averaging under Covariate Shift: Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson

Other useful references:

[3] Bayesian Deep Learning and a Probabilistic Perspective of Generalization: Andrew Gordon Wilson, Pavel Izmailov

[4] How Good is the Bayes Posterior in Deep Neural Networks Really? Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

[5] Bayesian Neural Network Priors Revisited: Vincent Fortuin, Adrià Garriga-Alonso, Florian Wenzel, Gunnar Rätsch, Richard Turner, Mark van der Wilk, Laurence Aitchison

Links

seminar

07.08.25 09.10.25

Math Machine Learning seminar MPI MIS + UCLA Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

See Details

Upcoming Events of this Seminar

Thursday, 07.08.25 Efficient compression of neural networks and datasets with Lukas Barth
Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 21.08.25 to be announced with Zhou Fan
Thursday, 28.08.25 to be announced with Randall Balestriero
Thursday, 02.10.25 to be announced with Marcello Carioni
Thursday, 09.10.25 to be announced with Baharan Mirzasoleiman