Abstract for the talk on 22.10.2020 (17:00 h)
Math Machine Learning seminar MPI MIS + UCLA
Boris Hanin (Princeton University)
Neural Networks at Finite Width and Large Depth
Deep neural networks are often considered to be complicated "black boxes," for which a full systematic analysis is not only out of reach in practice but impossible in principle. In this talk, based on ongoing joint work with Sho Yaida and Daniel Adam Roberts, I will make the opposite claim: namely, that deep neural networks with random weights and biases are perturbatively solvable models. Our approach applies to networks at finite width n and large depth L, the regime in which they are used in practice. A key point will be the emergence of a notion of "criticality," which involves a fine-tuning of model parameters (the weight and bias variances). At criticality, neural networks are particularly well behaved but still exhibit a tension between large values of n and L: larger values of n make neural networks more like Gaussian processes, while larger values of L amplify higher cumulants. Our analysis at initialization also has many consequences for networks during and after training, which I will discuss if time permits.
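As a rough numerical illustration of this tension (not part of the talk itself), the sketch below estimates the excess kurtosis of a last-layer preactivation over many random initializations of a ReLU network and compares a small to a large depth-to-width ratio L/n. It assumes a plain fully connected ReLU network with He initialization (weight variance 2/fan_in, zero biases), which is the critical tuning for ReLU, and a fixed unit-norm input; the function names are hypothetical, chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def preactivation_samples(n, L, num_nets=2000):
    """Sample the first preactivation of the final layer of a random
    ReLU network of width n and depth L, drawn once per random
    initialization, for a fixed unit-norm input.

    Critical tuning for ReLU: weight variance 2/fan_in (He
    initialization) and zero bias variance."""
    x0 = np.ones(n) / np.sqrt(n)  # fixed unit-norm input
    samples = np.empty(num_nets)
    for i in range(num_nets):
        h = x0
        for _ in range(L):
            W = rng.normal(0.0, np.sqrt(2.0 / h.size), size=(n, h.size))
            z = W @ h             # preactivation of the current layer
            h = np.maximum(z, 0)  # ReLU activation
        samples[i] = z[0]         # first preactivation of the last layer
    return samples

def excess_kurtosis(x):
    """Fourth cumulant divided by variance squared; zero for a Gaussian."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

# A larger depth-to-width ratio L/n amplifies the non-Gaussianity
# of the preactivation distribution across initializations.
for n, L in [(64, 4), (64, 32)]:
    k = excess_kurtosis(preactivation_samples(n, L))
    print(f"n={n:3d}, L={L:3d}, L/n={L/n:.2f}: excess kurtosis ~ {k:+.2f}")
```

At fixed width, the deeper network should show a markedly larger excess kurtosis, consistent with the picture sketched in the abstract: larger n pushes the network toward a Gaussian process, while larger L amplifies the higher cumulants.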