Understanding Training and Adaptation in Feature Learning: From Two-Layer Networks to Foundation Models
- Zhenmei Shi (MongoDB + Voyage AI)
Abstract
Deep neural networks excel at learning feature representations, setting them apart from traditional machine learning methods. This talk explores how feature learning emerges during neural network training and the role it plays in foundation models’ adaptation. We provide theoretical insights into how networks efficiently learn class-relevant patterns early in training, exploiting structure in the data beyond the reach of kernel methods. Extending our analysis to Transformers, we examine Fourier features and the relationship between model scale and in-context learning. Building on these insights, we propose practical improvements, including nuclear norm regularization for domain generalization, a novel contrastive learning regularization, looped Transformers for multi-step gradient descent, and GemFilter for accelerating LLM inference. Our findings deepen the understanding, and improve the efficiency, of modern machine learning systems.
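As a rough illustration of one technique the abstract mentions, nuclear norm regularization penalizes the sum of singular values of a batch feature matrix, encouraging low-rank (and hence more domain-invariant) representations. The sketch below is a minimal NumPy version; the function name, the penalty weight, and where the penalty is applied are illustrative assumptions, not the talk's actual implementation:

```python
import numpy as np

def nuclear_norm_penalty(features: np.ndarray, weight: float = 0.01) -> float:
    """Nuclear norm (sum of singular values) of a (batch, dim) feature matrix.

    Adding `weight * nuclear_norm` to a training loss pushes the learned
    features toward low rank. Hypothetical helper for illustration only.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    return weight * float(singular_values.sum())
```

For matrices of comparable scale, a rank-1 feature matrix incurs a much smaller penalty than a full-rank one, which is the sense in which the regularizer favors compact shared structure across domains.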