All you need is relative information

Shaowei Lin (working in a stealth startup)

Live Stream

Abstract

Relative information (relative entropy, KL divergence) and variational inference are powerful tools for deriving learning algorithms and their asymptotic properties, for both static systems and dynamic systems. The goal of this talk is to motivate a general online stochastic learning algorithm for stochastic processes with latent variables or memory, that provably converges under some regularity conditions. Please visit bit.ly/3kmovql for details.

In the first half of the talk, we study static systems, viewing maximum likelihood and Bayesian inference through the lens of relative information. In particular, their generalization errors may be derived by resolving the singularities of relative information. We then frame the two learning algorithms as special cases of variational inference with different computational constraints.

In the second half of the talk, we study dynamic systems, extending this variational inference method and computational perspective to stochastic processes and online learning. In particular, the training objective function will be a form of relative information which can be optimized iteratively in a way similar to expectation-maximization. The relative information objective provides a precise way to discuss the trade-off between exploration and exploitation during training.

Links

seminar

03.07.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

See Details

Upcoming Events of this Seminar

Thursday, 03.07.25 On the Power of Context-Enhanced Learning in LLMs with Xingyu Zhu
Thursday, 10.07.25 The effect of low rank and stochasticity on Gradient Descent at the Edge of Stability with Avrajit Ghosh a.o.
Thursday, 14.08.25 to be announced with Jonathan Siegel
Thursday, 02.10.25 to be announced with Marcello Carioni