Gradients and Groups: How Neural Networks Learn Features in Algebraic Tasks
- Daniel Kunin (Berkeley)
Abstract
Understanding how neural networks learn structured representations from data remains a central challenge in deep learning theory. I will present recent work introducing Alternating Gradient Flows (AGF), an analytic framework for understanding feature learning in two-layer networks. In small-initialization regimes, gradient flow exhibits characteristic staircase dynamics: neurons first align with useful directions, then rapidly grow in norm. AGF models this process as alternating maximization and minimization phases, unifying prior saddle-to-saddle analyses and provably matching gradient flow in diagonal linear networks. Applied to quadratic networks trained on modular addition, the framework yields a complete characterization of the training dynamics, showing that Fourier features emerge sequentially, in decreasing order of importance.
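
As a concrete illustration of the modular-addition setting, below is a minimal NumPy sketch of a two-layer quadratic network trained from a small initialization. The architecture, squared loss, and all hyperparameters (modulus, width, initialization scale, learning rate, step count) are illustrative assumptions rather than the paper's exact setup; the Fourier diagnostic simply tracks which frequencies dominate the first-layer weights as training proceeds.

```python
# Minimal sketch (illustrative assumptions throughout: architecture, loss,
# and hyperparameters are mine, not the paper's exact setup). A two-layer
# quadratic network f_c(a, b) = sum_i V[c, i] * (W[i] . x_ab)^2 is trained
# on modular addition mod p from a small initialization, where x_ab is the
# concatenated one-hot encoding of the input pair (a, b).
import numpy as np

rng = np.random.default_rng(0)
p = 13          # modulus
width = 64      # hidden width
scale = 1e-2    # small-initialization scale
lr = 0.5        # full-batch gradient descent step size

# Full dataset: all p^2 pairs, one-hot targets for (a + b) mod p.
A, B = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
a, b = A.ravel(), B.ravel()
X = np.zeros((p * p, 2 * p))
X[np.arange(p * p), a] = 1.0
X[np.arange(p * p), p + b] = 1.0
Y = np.eye(p)[(a + b) % p]

W = scale * rng.standard_normal((width, 2 * p))  # first layer
V = scale * rng.standard_normal((p, width))      # readout

for step in range(10001):
    H = X @ W.T                 # pre-activations, shape (n, width)
    F = (H ** 2) @ V.T          # quadratic activation, then readout
    R = F - Y                   # residual of the squared loss
    gV = R.T @ (H ** 2) / len(X)
    gW = ((R @ V) * (2 * H)).T @ X / len(X)
    V -= lr * gV
    W -= lr * gW
    if step % 1000 == 0:
        # Fourier power of the first-layer weights on the "a" half of the
        # input; under AGF, frequencies should light up one at a time.
        power = np.abs(np.fft.rfft(W[:, :p], axis=1)).sum(axis=0)
        top = np.argsort(power[1:])[::-1][:3] + 1
        print(f"step {step:5d}  loss {0.5 * np.mean(R ** 2):.5f}  "
              f"top frequencies {top}")
```

Under these assumptions, the printed frequency ranking shifts in discrete steps as the loss drops, which is the staircase behavior AGF is meant to capture; the exact schedule is sensitive to the initialization scale and learning rate.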
I will then discuss recent work extending this analysis to sequential group composition, in which a network must compose a sequence of elements from a finite group. This task provably requires nonlinear architectures, admits tractable feature learning, and reveals an interpretable benefit of depth. Using AGF, we show that two-layer networks acquire the group structure one irreducible representation at a time but require width exponential in the sequence length, whereas deep networks can identify efficient solutions by exploiting associativity to compose intermediate representations. Together, these results provide a tractable bridge between optimization dynamics, representation theory, and sequence learning in neural networks.
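
To make the task concrete, here is a small sketch of sequential group composition and of the associativity point; the choice of group (S_3 as permutation tuples), the left-to-right product, and the balanced tree reduction are my illustrative assumptions, not the paper's construction.

```python
# Hypothetical sketch of the sequential group composition task: given a
# sequence (g_1, ..., g_L) of elements of a finite group G, predict the
# product g_1 * g_2 * ... * g_L. Associativity lets the product be computed
# by a balanced pairwise reduction of depth O(log L).
import itertools
import numpy as np

# G = S_3, the symmetric group on 3 letters, as permutation tuples.
G = list(itertools.permutations(range(3)))

def compose(g, h):
    """Composition in S_3: (g * h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))

def product(seq):
    """Left-to-right product of a sequence of group elements."""
    out = seq[0]
    for g in seq[1:]:
        out = compose(out, g)
    return out

def tree_product(seq):
    """Balanced pairwise reduction: O(log L) composition layers, mirroring
    how a deep network can reuse one learned pairwise composition map."""
    while len(seq) > 1:
        seq = [compose(seq[i], seq[i + 1]) if i + 1 < len(seq) else seq[i]
               for i in range(0, len(seq), 2)]
    return seq[0]

rng = np.random.default_rng(0)
L = 8
seq = [G[i] for i in rng.integers(len(G), size=L)]
assert product(seq) == tree_product(seq)  # associativity: both agree
print(product(seq))
```

The tree reduction is the shape a deep network can exploit: one pairwise composition map reused across O(log L) layers, in contrast to the two-layer networks above, which need width exponential in the sequence length.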
This talk is based on the following papers: