Talk

A review of LLMs from the perspective of memory and compute

  • Yoni Dukler (AWS AI)
Live Stream

Abstract

Scaling machine learning models has significantly improved their capabilities, especially on self-supervised tasks. To scale efficiently, one must harmonize modeling efforts with good utilization of the available computational resources. In this talk I will review a selection of important LLM architectural choices from the perspective of efficiency. I will first walk through the high-level mechanics of GPU computation, along with GPUs' capabilities and constraints. I will then dive deeper into a few aspects of the hardware and identify a few basic principles that guide hardware efficiency. From there, I will motivate recent innovations in LLM execution and architecture through the lens of these hardware principles, considering both training and inference efficiency.
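The abstract does not spell out which hardware principles the talk covers; one commonly cited principle in this area is arithmetic intensity (FLOPs per byte of memory traffic), which in the roofline model determines whether a kernel is compute-bound or memory-bandwidth-bound. The sketch below is illustrative only and not from the talk; the GPU peak numbers are assumed A100-like figures.

```python
# Illustrative sketch of the roofline/arithmetic-intensity principle.
# Assumed, A100-like hardware numbers (not from the talk):
PEAK_FLOPS = 312e12   # assumed peak fp16 throughput, FLOP/s
PEAK_BW = 2.0e12      # assumed memory bandwidth, byte/s
RIDGE = PEAK_FLOPS / PEAK_BW  # intensity above which a kernel is compute-bound

def matmul_intensity(m, n, k, bytes_per_el=2):
    """Arithmetic intensity (FLOP/byte) of an (m,k) @ (k,n) matmul in fp16."""
    flops = 2 * m * n * k                           # multiply-accumulate count
    bytes_moved = bytes_per_el * (m*k + k*n + m*n)  # read A and B, write C
    return flops / bytes_moved

# A large training-style matmul has high intensity -> compute-bound.
train = matmul_intensity(4096, 4096, 4096)
# A batch-1 decode step (matrix-vector product) has low intensity -> memory-bound.
decode = matmul_intensity(1, 4096, 4096)

print(f"ridge point:          {RIDGE:8.1f} FLOP/byte")
print(f"4096^3 matmul:        {train:8.1f} FLOP/byte (compute-bound)")
print(f"1x4096x4096 decode:   {decode:8.2f} FLOP/byte (memory-bound)")
```

Under these assumed numbers, the big matmul sits far above the ridge point while the single-token decode sits near 1 FLOP/byte, which is one way to see why LLM inference is often limited by memory bandwidth rather than raw compute.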

Seminar
05.12.24, 19.12.24

Math Machine Learning seminar MPI MIS + UCLA

MPI for Mathematics in the Sciences, Live Stream

Contact: Katharina Matschke, MPI for Mathematics in the Sciences
