Learning Dynamics of Pretraining and Finetuning for Linear Models
- Ziqing Xu (University of Pennsylvania)
Abstract
In this talk, we study the learning dynamics and convergence rates of gradient-based methods for two-layer linear models. We first analyze the dynamics of gradient descent during pretraining and show that the optimization problem satisfies a local Polyak-Łojasiewicz (PL) condition and a local Descent Lemma, which together yield a linear convergence rate for a suitable choice of step sizes. Compared to prior work, our results require no restrictive assumptions on width, initialization, or step sizes, and achieve faster convergence rates. Next, we examine the finetuning stage through the lens of low-rank adaptation for matrix factorization. We show that gradient flow converges to a neighborhood of the optimal solution and that smaller initializations yield lower final errors. Our analysis reveals how the final error depends on the misalignment between the singular spaces of the pretrained model and the target matrix, and it highlights how reducing the initialization scale can improve alignment and hence performance.
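As a companion to the pretraining part, here is a minimal sketch (in Python/NumPy) of plain gradient descent on a two-layer linear model for matrix factorization, minimizing 0.5 * ||W2 W1 - M||_F^2. The dimensions, initialization scale, and constant step size are illustrative assumptions rather than the settings analyzed in the talk; the sketch only shows the dynamics whose convergence a local PL condition and a local descent lemma would characterize.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a rank-h target M factored as W2 @ W1.
m, n, h = 20, 15, 10
M = rng.standard_normal((m, h)) @ rng.standard_normal((h, n)) / np.sqrt(h)

# Small random initialization of both factors (scale chosen for the demo).
W1 = 0.3 * rng.standard_normal((h, n))
W2 = 0.3 * rng.standard_normal((m, h))

eta = 0.02  # constant step size for the demo, not the schedule from the talk
losses = []
for t in range(5000):
    E = W2 @ W1 - M              # residual
    gW1 = W2.T @ E               # gradient of 0.5*||E||_F^2 w.r.t. W1
    gW2 = E @ W1.T               # gradient w.r.t. W2
    W1 -= eta * gW1
    W2 -= eta * gW2
    losses.append(0.5 * np.linalg.norm(E) ** 2)

# After an initial plateau near the small initialization, the loss decreases
# geometrically, the behavior a local PL condition plus a local descent
# lemma would predict.
print(f"loss at t=1000: {losses[1000]:.3e}, final loss: {losses[-1]:.3e}")
```

For the finetuning part, a similarly minimal sketch of a LoRA-style update for matrix factorization: a frozen pretrained matrix W_pre is adapted toward a target M_ft through a rank-r product B A, trained by gradient descent with a small step size as a stand-in for gradient flow, starting from a small initialization. The additive parametrization W_pre + B A and all sizes and scales here are assumptions for illustration and may differ from the exact setup studied in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: the finetuning target differs from the pretrained matrix
# by a rank-r correction, which the LoRA factors B @ A must recover.
m, n, r = 20, 15, 2
W_pre = rng.standard_normal((m, n))
M_ft = W_pre + rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / np.sqrt(m)

alpha = 1e-3                        # small initialization scale for the adapters
B = alpha * rng.standard_normal((m, r))
A = alpha * rng.standard_normal((r, n))

eta = 0.05                          # small step size approximating gradient flow
for _ in range(4000):
    E = W_pre + B @ A - M_ft        # residual of the adapted model
    gB = E @ A.T                    # gradient of 0.5*||E||_F^2 w.r.t. B
    gA = B.T @ E                    # gradient w.r.t. A
    B -= eta * gB
    A -= eta * gA

# The talk relates the final error to the misalignment between the singular
# spaces of the pretrained model and the target, and argues that smaller
# initialization scales improve alignment; this run only shows the mechanics.
print("final finetuning error:", 0.5 * np.linalg.norm(W_pre + B @ A - M_ft) ** 2)
```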