Abstract for the talk on 16.12.2021 (17:00)
Math Machine Learning seminar MPI MIS + UCLA
Zhi-Qin John Xu (Shanghai Jiao Tong University)
Occam’s razors in neural networks: frequency principle in training and embedding principle of loss landscape
I will demonstrate that a neural network (NN) learns training data as simply as it can, resembling an implicit Occam’s razor, from the following two viewpoints. First, the NN output often follows a frequency principle, i.e., it learns data from low to high frequency. Second, we prove an embedding principle: the loss landscape of an NN "contains" all the critical points of all narrower NNs. The embedding principle provides a basis for the condensation phenomenon, in which the NN weights condense on isolated directions when initialized small. Condensation means that the effective NN size is much smaller than its actual size, i.e., the NN forms a simple representation of the training data.
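To make the embedding principle concrete, here is a minimal sketch under assumptions that the abstract does not state: a one-hidden-layer tanh network with scalar input, a mean squared error loss, and a neuron-splitting map as the embedding. All names (`forward`, `loss_grad`, `split_neuron`, `alpha`) are hypothetical and chosen only for illustration. Splitting one neuron into two copies that share its input weight and bias, with output weights `alpha * a` and `(1 - alpha) * a`, leaves the network output unchanged, and the gradient of the wider network is a rescaled copy of the gradient of the narrower one. A point with zero gradient in the narrow network therefore maps to a point with zero gradient in the wide one.

```python
import math

def forward(params, x):
    """One-hidden-layer tanh network: f(x) = sum_k a_k * tanh(w_k * x + b_k)."""
    return sum(a * math.tanh(w * x + b) for (a, w, b) in params)

def loss_grad(params, data):
    """Gradient of 0.5 * mean((f(x) - y)^2) with respect to each neuron's (a, w, b)."""
    n = len(data)
    grads = []
    for (a, w, b) in params:
        ga = gw = gb = 0.0
        for x, y in data:
            r = forward(params, x) - y          # residual on this sample
            t = math.tanh(w * x + b)            # hidden activation
            ga += r * t / n
            gw += r * a * (1 - t * t) * x / n
            gb += r * a * (1 - t * t) / n
        grads.append((ga, gw, gb))
    return grads

def split_neuron(params, k, alpha):
    """Embed into a network one neuron wider by splitting neuron k (assumed embedding)."""
    a, w, b = params[k]
    return params[:k] + [(alpha * a, w, b), ((1 - alpha) * a, w, b)] + params[k + 1:]

# Arbitrary illustrative parameters and data (not from the talk).
narrow = [(0.7, -1.2, 0.3), (-0.4, 0.8, -0.5)]
data = [(-1.0, 0.2), (0.0, -0.1), (0.5, 0.4), (1.5, 0.9)]
wide = split_neuron(narrow, 0, alpha=0.3)

g_narrow = loss_grad(narrow, data)
g_wide = loss_grad(wide, data)

# The output is preserved on every sample.
max_diff = max(abs(forward(narrow, x) - forward(wide, x)) for x, _ in data)
print("max output difference:", max_diff)
# Input-weight gradients of the two copies are alpha and (1 - alpha) times the original.
print("narrow dL/dw_0:", g_narrow[0][1])
print("wide   dL/dw_0, dL/dw_1:", g_wide[0][1], g_wide[1][1])
```

The check is run at an arbitrary point rather than at a critical point, since the gradient relation holds everywhere; the zero-gradient case follows as a special case.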
Zhi-Qin John Xu is an associate professor at Shanghai Jiao Tong University (SJTU). Zhi-Qin obtained a B.S. in Physics (2012) and a Ph.D. in Mathematics (2016) from SJTU. Before joining SJTU, Zhi-Qin worked as a postdoc at NYUAD and the Courant Institute from 2016 to 2019.