Talk
Reimagining Gradient Descent: Large Stepsize, Oscillation, and Acceleration
- Jingfeng Wu (UC Berkeley)
Abstract
Gradient descent (GD) and its variants are pivotal in machine learning. Conventional wisdom suggests smaller stepsizes for stability, yet in practice, larger stepsizes often yield faster convergence, despite initial instability. In this talk, I will explain how large stepsizes provably accelerate GD for logistic regression on separable data in three settings.
- We start with logistic regression, a convex but non-strongly-convex problem. We show that GD with a large stepsize attains an ε-risk in Õ(1/√ε) steps. This matches the accelerated step complexity of Nesterov momentum and improves the classical Θ(1/ε) step complexity of GD with a small stepsize.
- We then consider ℓ₂-regularized logistic regression with regularization strength λ, a strongly convex problem with condition number κ = 1/λ. We show that GD with a large stepsize attains an ε-excess risk in Õ(√κ log(1/ε)) steps. This, again, matches Nesterov momentum and improves the Θ(κ log(1/ε)) step complexity of GD with a small stepsize.
- Finally, we consider the task of finding a linear separator of a linearly separable dataset with margin γ. We show that, with large and adaptive stepsizes, GD solves this task in Õ(1/γ²) steps by minimizing the logistic risk. We further show that this step complexity is minimax optimal among all first-order methods, and cannot be achieved by GD with small stepsizes.
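The phenomenon in the first bullet can be seen on a toy problem. The sketch below (my own illustration, not the speaker's experiments; the dataset, stepsizes, and step count are all assumptions chosen for the example) runs plain GD on the logistic risk over a two-point separable dataset and compares a small and a large constant stepsize after the same number of steps:

```python
# Illustrative sketch only: GD on logistic regression over a tiny
# separable dataset, with a small vs. a large constant stepsize.
# All numbers are assumptions for the example, not from the talk.
import numpy as np

# z_i = y_i * x_i; both points have margin 1 along the first axis.
Z = np.array([[1.0, 0.5],
              [1.0, -0.5]])

def logistic_risk(w):
    # L(w) = mean_i log(1 + exp(-z_i . w))
    return np.mean(np.log1p(np.exp(-Z @ w)))

def gd(stepsize, steps):
    w = np.zeros(2)
    for _ in range(steps):
        margins = Z @ w
        # gradient of the logistic risk:
        # mean_i of -z_i / (1 + exp(z_i . w))
        grad = -(Z * (1.0 / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w = w - stepsize * grad
    return logistic_risk(w)

risk_small = gd(stepsize=0.5, steps=10)
risk_large = gd(stepsize=8.0, steps=10)
print(risk_small, risk_large)
```

On this symmetric dataset the large stepsize simply makes faster progress toward the minimizer at infinity, so it reaches a strictly lower risk in the same budget; on less well-conditioned data the risk under a large stepsize can oscillate before it drops, which is the regime the talk analyzes.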