Thinking Fast with Transformers: Algorithmic Reasoning via Shortcuts

Bingbin Liu (Carnegie Mellon University)

Live Stream

Abstract

Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are these shallow and non-recurrent models finding? In this talk, we will formalize reasoning in the setting of automata, and show that the computation of an automaton on an input sequence of length T can be replicated exactly by Transformers with o(T) layers, which we call "shortcuts". We provide two constructions, with O(log T) layers for all automata and O(1) layers for solvable automata. Empirically, our results from synthetic experiments show that shallow solutions can also be found in practice.

Links

seminar

14.08.25 02.10.25

Math Machine Learning seminar MPI MIS + UCLA Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences Live Stream

Details anzeigen

Upcoming Events of this Seminar

Donnerstag, 14.08.25 to be announced with Jonathan Siegel
Donnerstag, 02.10.25 to be announced with Marcello Carioni