Talk
ReLU transformers and piecewise polynomials
- Zehua Lai (UT Austin)
Abstract
We highlight a perhaps important but hitherto unobserved insight: the attention module in a ReLU-transformer is a cubic spline. Viewed in this manner, this mysterious but critical component of a transformer becomes a natural development of an old notion deeply entrenched in classical approximation theory. Conversely, if we assume the Pierce–Birkhoff conjecture, then every spline is also an encoder. This gives a satisfying description of the mathematical structure of ReLU-transformers.
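The central claim can be checked numerically in a small case: if the softmax in attention is replaced by ReLU, then each output coordinate of the attention map, restricted to any region where the ReLU activation pattern is fixed, is a degree-3 polynomial in the input (a product of three affine maps), so its fourth finite differences along a line vanish. Below is a minimal sketch of this sanity check; the weight matrices, dimensions, and step size are illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                       # sequence length, embedding dimension
WQ, WK, WV = (rng.standard_normal((d, d)) for _ in range(3))
X0 = rng.standard_normal((n, d))  # base input
D = rng.standard_normal((n, d))   # perturbation direction

def relu_attention(X):
    """Attention with softmax replaced by ReLU (unscaled, for clarity)."""
    Q, K, V = X @ WQ, X @ WK, X @ WV
    return np.maximum(Q @ K.T, 0.0) @ V

# Along the line t -> X0 + t*D, wherever the ReLU activation pattern is
# fixed, each output entry is a product of three affine functions of t,
# i.e. a cubic polynomial. The 4th finite difference of a cubic is zero.
h = 1e-5                          # small enough that no ReLU kink is crossed
vals = np.array([relu_attention(X0 + k * h * D)[0, 0] for k in range(5)])
d4 = vals[0] - 4 * vals[1] + 6 * vals[2] - 4 * vals[3] + vals[4]
print(abs(d4))                    # ~ 0, up to floating-point noise
```

A genuine softmax-attention module would not pass this test: softmax is analytic but not polynomial, which is why the ReLU variant is the one admitting the spline description.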