
Local Learning Coefficients in Transformers: A Singular‑Learning‑Theory Lens on Generalization in Neural Networks

  • Jiayi Li (MPI-CBG Dresden)
G3 10 (Lecture hall)

Abstract

Why do heavily over‑parameterized neural networks often generalize far better than classical statistical theory predicts, even after they have enough capacity to memorize the training data? This question sits at the heart of modern machine learning theory. Singular Learning Theory (SLT) tackles the puzzle through an algebraic‑geometric measure: the local learning coefficient, equivalently the real log‑canonical threshold (RLCT). The RLCT quantifies the parameter‑space singularities where identifiability fails and the Fisher information degenerates. It dictates the leading $n^{-1}$ term in the asymptotic expansion of the Bayesian free energy and expected generalization error, thereby extending classical information criteria to today's degenerate, high‑capacity models, such as neural networks.
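For reference, the expansions alluded to here are Watanabe's standard SLT asymptotics; the notation below ($L_n$ the empirical loss, $w_0$ an optimal parameter, $\lambda$ the RLCT, $m$ its multiplicity) is supplied for orientation and is not part of the abstract:

$$
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log \log n + O_p(1),
\qquad
\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(n^{-1}\right).
$$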

In this talk, I will present recent developments of SLT for transformer architectures. In particular, we show that the local learning coefficient predicts the generalization behavior of neural networks and can be applied to detect and control the well‑known 'grokking' phenomenon.
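To make the quantity concrete, here is a minimal sketch of how the local learning coefficient is commonly estimated in practice: SGLD sampling at inverse temperature $\beta^* = 1/\log n$, localized around a trained optimum $w^*$ (in the spirit of the estimator of Lau et al., as implemented by packages such as devinterp). The toy model, data, and hyperparameters (eps, gamma, steps) are illustrative assumptions, not the setup used in the talk:

```python
import torch
import torch.nn as nn

# --- Toy setup (illustrative): a tiny MLP on synthetic regression data. ---
torch.manual_seed(0)
n = 1024
X = torch.randn(n, 2)
y = X[:, :1] * X[:, 1:] + 0.1 * torch.randn(n, 1)

model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()

# Train to a local optimum w*; the LLC is a *local* quantity at this point.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

w_star = [p.detach().clone() for p in model.parameters()]
with torch.no_grad():
    L_star = loss_fn(model(X), y).item()  # L_n(w*)

# --- SGLD chain at inverse temperature beta* = 1/log n, localized at w*. ---
beta = 1.0 / torch.log(torch.tensor(float(n)))
eps, gamma = 1e-5, 100.0          # step size and localization strength (assumed)
steps, burn_in, batch = 2000, 500, 256
chain_losses = []
for t in range(steps):
    idx = torch.randint(0, n, (batch,))
    loss = loss_fn(model(X[idx]), y[idx])  # minibatch estimate of L_n
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), w_star):
            # Langevin drift: tempered log-likelihood gradient + pull toward w*.
            drift = -(eps / 2) * (n * beta * p.grad + gamma * (p - p0))
            p.add_(drift + eps ** 0.5 * torch.randn_like(p))
        if t >= burn_in:
            chain_losses.append(loss_fn(model(X), y).item())

# LLC estimate: lambda_hat = n * beta* * (E_posterior[L_n] - L_n(w*)).
lambda_hat = n * beta.item() * (sum(chain_losses) / len(chain_losses) - L_star)
print(f"estimated local learning coefficient: {lambda_hat:.3f}")
```

The estimate is the gap between the posterior‑averaged loss along the chain and the loss at the optimum, rescaled by $n\beta^*$; for a transformer one would swap in the actual model and a data loader in place of the toy problem.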
