Occam’s razors in neural networks: frequency principle in training and embedding principle of loss landscape

  • Zhi-Qin John Xu (Shanghai Jiao Tong University)

Abstract

I will demonstrate that a neural network (NN) learns the training data in as simple a manner as it can, resembling an implicit Occam's razor, from two viewpoints. First, the NN output often follows a frequency principle: it learns the data from low to high frequency. Second, we prove an embedding principle: the loss landscape of an NN "contains" all the critical points of all narrower NNs. The embedding principle provides a basis for the condensation phenomenon, in which the NN weights condense on isolated directions when initialized small. The effective NN size is then much smaller than its actual size, i.e., the network finds a simple representation of the training data.
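The frequency principle can be illustrated with a minimal, hypothetical sketch. In a linearized (NTK-style) regime, gradient descent decouples across kernel eigenmodes, and the residual of each mode shrinks geometrically at a rate set by its eigenvalue. Assuming eigenvalues that decay with frequency (the decay law `1/k^2` below is an illustrative choice, not taken from the talk), low-frequency error vanishes much earlier than high-frequency error:

```python
import numpy as np

# Illustrative sketch of the frequency principle in a linearized regime:
# per-mode residual after t gradient-descent steps is (1 - lr * lam_k)^t.
freqs = np.array([1, 5, 25])       # frequencies of three target modes (assumed)
lam = 1.0 / freqs.astype(float)**2 # assumed spectral decay of the kernel
lr = 0.5                           # learning rate
steps = 50

# Residual fraction remaining in each frequency mode after `steps` steps:
# low frequencies (large lam) decay fast, high frequencies decay slowly.
residual = (1 - lr * lam) ** steps
print(residual)
```

Running this, the residual of the lowest-frequency mode is essentially zero after 50 steps, while the highest-frequency mode is still largely unlearned, mirroring the low-to-high-frequency learning order described in the abstract.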

Bio:

Zhi-Qin John Xu is an associate professor at Shanghai Jiao Tong University (SJTU). Zhi-Qin obtained a B.S. in Physics (2012) and a Ph.D. in Mathematics (2016) from SJTU. Before joining SJTU, he worked as a postdoc at NYUAD and the Courant Institute from 2016 to 2019.

5/2/24 5/16/24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Katharina Matschke (MPI for Mathematics in the Sciences)