Occam’s razors in neural networks: frequency principle in training and embedding principle of loss landscape

  • Zhi-Qin John Xu (Shanghai Jiao Tong University)

Abstract

I will demonstrate that a neural network (NN) learns the training data in as simple a manner as it can, resembling an implicit Occam's razor, from two viewpoints. First, the NN output often follows a frequency principle: it learns the data from low to high frequency. Second, we prove an embedding principle: the loss landscape of an NN "contains" all the critical points of all narrower NNs. The embedding principle provides a basis for the condensation phenomenon, in which the NN weights condense on isolated directions when initialized small. The effective NN size is then much smaller than its actual size, i.e., the network finds a simple representation of the training data.
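The frequency principle can be illustrated with a minimal, hypothetical sketch. In a linearized (NTK-style) regime, gradient descent decouples across kernel eigenmodes, and the residual of each mode shrinks geometrically at a rate set by its eigenvalue. Assuming eigenvalues that decay with frequency (the decay law `1/k^2` below is an illustrative choice, not taken from the talk), low-frequency error vanishes much earlier than high-frequency error:

```python
import numpy as np

# Illustrative sketch of the frequency principle in a linearized regime:
# per-mode residual after t gradient-descent steps is (1 - lr * lam_k)^t.
freqs = np.array([1, 5, 25])       # frequencies of three target modes (assumed)
lam = 1.0 / freqs.astype(float)**2 # assumed spectral decay of the kernel
lr = 0.5                           # learning rate
steps = 50

# Residual fraction remaining in each frequency mode after `steps` steps:
# low frequencies (large lam) decay fast, high frequencies decay slowly.
residual = (1 - lr * lam) ** steps
print(residual)
```

Running this, the residual of the lowest-frequency mode is essentially zero after 50 steps, while the highest-frequency mode is still largely unlearned, mirroring the low-to-high-frequency learning order described in the abstract.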

Bio:

Zhi-Qin John Xu is an associate professor at Shanghai Jiao Tong University (SJTU). Zhi-Qin obtained a B.S. in Physics (2012) and a Ph.D. in Mathematics (2016) from SJTU. Before joining SJTU, he worked as a postdoc at NYUAD and the Courant Institute from 2016 to 2019.

5/2/24 5/16/24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Katharina Matschke (MPI for Mathematics in the Sciences)