Understanding Training and Adaptation in Feature Learning: From Two-Layer Networks to Foundation Models
- Zhenmei Shi (MongoDB + Voyage AI)
Abstract
Deep neural networks excel at learning feature representations, setting them apart from traditional machine learning methods. This talk explores how feature learning emerges during neural network training and the role it plays in foundation models’ adaptation. We provide theoretical insights into how networks efficiently learn class-relevant patterns early in training, exploiting structure in the data beyond the reach of kernel methods. Extending our analysis to Transformers, we examine Fourier features and the relationship between model scale and in-context learning. Building on these insights, we propose practical improvements, including nuclear norm regularization for domain generalization, a novel contrastive learning regularization, looped Transformers for multi-step gradient descent, and GemFilter for accelerating LLM inference. Our findings deepen the understanding, and improve the efficiency, of modern machine learning systems.
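As a rough illustration of one technique the abstract mentions, nuclear norm regularization penalizes the sum of singular values of a batch feature matrix, encouraging low-rank (and hence more domain-invariant) representations. The sketch below is a minimal NumPy version; the function name, the penalty weight, and where the penalty is applied are illustrative assumptions, not the talk's actual implementation:

```python
import numpy as np

def nuclear_norm_penalty(features: np.ndarray, weight: float = 0.01) -> float:
    """Nuclear norm (sum of singular values) of a (batch, dim) feature matrix.

    Adding `weight * nuclear_norm` to a training loss pushes the learned
    features toward low rank. Hypothetical helper for illustration only.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    return weight * float(singular_values.sum())
```

For matrices of comparable scale, a rank-1 feature matrix incurs a much smaller penalty than a full-rank one, which is the sense in which the regularizer favors compact shared structure across domains.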