# Recent Developments Towards a New Theory of Generalisation

*Nihat Ay*

(The seminar has been prepared in collaboration with Juan Pablo Vigneaux.)

### I. Introduction: Classical theory of learning and generalisation

##### A. Statistical learning theory

##### B. Capacity measures in SLT: VC-dimension, Rademacher complexity, etc.

- For A and B, see Bousquet, O., Boucheron, S. and Lugosi, G., 2003, February. Introduction to statistical learning theory. In
*Summer School on Machine Learning* (pp. 169-207). Springer, Berlin, Heidelberg.

##### C. Optimization: Gradient descent and stochastic gradient descent

##### D. VC-dimension of neural networks

- Bartlett, P.L. and Maass, W., 2003. Vapnik-Chervonenkis dimension of neural nets.
*The handbook of brain theory and neural networks*, pp.1188-1192.

### II. Puzzles and challenges posed by recent case studies

##### References:

- Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O., 2016. Understanding deep learning requires rethinking generalization.
*arXiv preprint arXiv:1611.03530*.
- Gunasekar, S., Woodworth, B.E., Bhojanapalli, S., Neyshabur, B. and Srebro, N., 2017. Implicit regularization in matrix factorization. In
*Advances in Neural Information Processing Systems* (pp. 6151-6159).
- Belkin, M., Hsu, D., Ma, S. and Mandal, S., 2019. Reconciling modern machine-learning practice and the classical bias–variance trade-off.
*Proceedings of the National Academy of Sciences*, 116(32), pp.15849-15854.

##### Complementary references:

- Zhang, C., Liao, Q., Rakhlin, A., Miranda, B., Golowich, N. and Poggio, T., 2018. Theory of deep learning IIb: Optimization properties of SGD.
*arXiv preprint arXiv:1801.02254*.
- Poggio, T., Kawaguchi, K., Liao, Q., Miranda, B., Rosasco, L., Boix, X., Hidary, J. and Mhaskar, H., 2017. Theory of deep learning III: explaining the non-overfitting puzzle.
*arXiv preprint arXiv:1801.00173*.

##### And also the talks by:

### III. Theoretical perspectives and developments

- Bartlett, P.L., 1998. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network.
*IEEE Transactions on Information Theory*, 44(2), pp.525-536.
- Bartlett, P.L., Long, P.M., Lugosi, G. and Tsigler, A., 2019. Benign overfitting in linear regression.
*arXiv preprint arXiv:1906.11300*.
- Gunasekar, S., Lee, J.D., Soudry, D. and Srebro, N., 2018. Implicit bias of gradient descent on linear convolutional networks. In
*Advances in Neural Information Processing Systems* (pp. 9461-9471).
