Talk

Understanding Training and Adaptation in Feature Learning: From Two-Layer Networks to Foundation Models

  • Zhenmei Shi (MongoDB + Voyage AI)
Live Stream

Abstract

Deep neural networks excel at learning feature representations, setting them apart from traditional machine learning methods. This talk explores how feature learning emerges during neural network training and its role in the adaptation of foundation models. We provide theoretical insights into how networks efficiently learn class-relevant patterns early in training, exploiting structure in the data in ways that kernel methods cannot. Extending our analysis to Transformers, we examine Fourier features and the relationship between model scale and in-context learning. Building on these insights, we propose practical improvements, including nuclear norm regularization for domain generalization, a novel regularization for contrastive learning, looped Transformers for multi-step gradient descent, and GemFilter for accelerating LLM inference. Our findings improve both the understanding and the efficiency of modern machine learning systems.
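
As a rough illustration of one of the proposed improvements, the sketch below adds a nuclear-norm penalty on the batch feature matrix to a standard classification loss, which is one common way to instantiate "nuclear norm regularization for domain generalization." The model, hyperparameters, and data are placeholders for illustration only, not the speaker's implementation.

```python
# Minimal sketch (not the authors' code): ERM loss plus a nuclear-norm
# penalty on the batch feature matrix to bias learned features toward low rank.
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Featurizer followed by a linear classifier; names are illustrative."""
    def __init__(self, in_dim=32, feat_dim=64, num_classes=10):
        super().__init__()
        self.featurizer = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        z = self.featurizer(x)          # batch feature matrix Z (n x d)
        return self.classifier(z), z

def training_step(model, x, y, optimizer, lam=0.01):
    logits, z = model(x)
    task_loss = nn.functional.cross_entropy(logits, y)
    # Nuclear norm of the feature matrix = sum of its singular values.
    nuc = torch.linalg.matrix_norm(z, ord="nuc")
    loss = task_loss + lam * nuc
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with random data, just to show the call pattern.
model = TwoHeadNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
print(training_step(model, x, y, opt))
```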

Links

seminar
05.06.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences
