Outline of the seminar, together with some key references

Recent Developments Towards a New Theory of Generalisation

Nihat Ay
(The seminar has been prepared in collaboration with Juan Pablo Vigneaux.)

I. Introduction: Classical theory of learning and generalisation

A. Statistical learning theory
B. Capacity measures in SLT: VC-dimension, Rademacher complexity, etc.
  • For A and B, see Bousquet, O., Boucheron, S. and Lugosi, G., 2003. Introduction to statistical learning theory. In Summer School on Machine Learning (pp. 169-207). Springer, Berlin, Heidelberg.
C. Optimization: gradient descent and stochastic gradient descent (SGD)
D. VC-dimension of neural networks
  • Bartlett, P.L. and Maass, W., 2003. Vapnik-Chervonenkis dimension of neural nets. The Handbook of Brain Theory and Neural Networks, pp. 1188-1192.
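As a concrete companion to item C, the following minimal sketch contrasts full-batch gradient descent with stochastic gradient descent on a one-dimensional least-squares objective. It is purely illustrative and not taken from any of the references above; all function names, learning rates, and step counts are our own choices.

```python
import random

def make_data(n=100, w_true=3.0, seed=0):
    """Generate a noiseless 1-D regression problem y = w_true * x."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x for x in xs]
    return xs, ys

def gradient_descent(xs, ys, lr=0.5, steps=200):
    """Full-batch gradient descent: one gradient over all n samples per step."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of f(w) = (1/2n) * sum_i (w*x_i - y_i)^2
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def sgd(xs, ys, lr=0.1, steps=2000, seed=1):
    """Stochastic gradient descent: one randomly sampled example per step."""
    rng = random.Random(seed)
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        i = rng.randrange(n)
        # Single-sample gradient: an unbiased estimate of the full gradient
        grad = (w * xs[i] - ys[i]) * xs[i]
        w -= lr * grad
    return w

xs, ys = make_data()
w_gd = gradient_descent(xs, ys)   # converges to w_true = 3.0
w_sgd = sgd(xs, ys)               # also converges, via noisy single-sample steps
```

Since the data are noiseless, both methods recover the true parameter; SGD trades exact gradients for cheap per-step updates, which is the regime studied in the SGD-related references of sections II and III.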

II. Puzzles and challenges posed by recent case studies

  • Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O., 2016. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.
  • Gunasekar, S., Woodworth, B.E., Bhojanapalli, S., Neyshabur, B. and Srebro, N., 2017. Implicit regularization in matrix factorization. In Advances in Neural Information Processing Systems (pp. 6151-6159).
  • Belkin, M., Hsu, D., Ma, S. and Mandal, S., 2019. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences, 116(32), pp.15849-15854.
Complementary references:
  • Zhang, C., Liao, Q., Rakhlin, A., Miranda, B., Golowich, N. and Poggio, T., 2018. Theory of deep learning IIb: Optimization properties of SGD. arXiv preprint arXiv:1801.02254.
  • Poggio, T., Kawaguchi, K., Liao, Q., Miranda, B., Rosasco, L., Boix, X., Hidary, J. and Mhaskar, H., 2017. Theory of deep learning III: explaining the non-overfitting puzzle. arXiv preprint arXiv:1801.00173.
And also the talks by:

III. Theoretical perspectives and developments

  • Bartlett, P.L., 1998. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), pp.525-536.
  • Bartlett, P.L., Long, P.M., Lugosi, G. and Tsigler, A., 2019. Benign overfitting in linear regression. arXiv preprint arXiv:1906.11300.
  • Gunasekar, S., Lee, J.D., Soudry, D. and Srebro, N., 2018. Implicit bias of gradient descent on linear convolutional networks. In Advances in Neural Information Processing Systems (pp. 9461-9471).
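The implicit-bias theme of section III can be seen in its simplest form in linear regression: on an underdetermined least-squares problem, gradient descent started at zero converges to the minimum-norm interpolating solution, without any explicit regularizer. The sketch below (illustrative only; problem sizes, learning rate, and step count are our own choices, and it uses NumPy rather than any code from the papers) demonstrates this.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20          # fewer samples than parameters: infinitely many interpolants
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)       # starting at the origin is essential for this result
lr = 0.01
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)   # gradient of (1/2) * ||X w - y||^2

# Closed-form minimum-norm interpolant via the Moore-Penrose pseudoinverse
w_min_norm = np.linalg.pinv(X) @ y

# Every iterate stays in the row space of X, so the limit of gradient
# descent coincides with w_min_norm up to numerical error.
```

Although all solutions with X w = y have zero training loss, the optimization algorithm itself selects a particular one; this is the kind of implicit regularization that the Gunasekar et al. papers analyze for matrix factorization and linear convolutional networks.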