Abstract for the talk on 04.09.2020 (11:00 h)Deep Learning Theory Group Seminar
Ido Nachum (École Polytechnique Fédérale de Lausanne)
On the Information Complexity of Learning
04.09.2020, 11:00 h, only video broadcast
Machine learning and information theory tasks are in some sense equivalent since both involve identifying patterns and regularities in data. To recognize an elephant, a child (or a neural network) observes the repeating pattern of big ears, a trunk, and grey skin. To compress a book, a compression algorithm searches for highly repeating letters or words. So the high-level question that guided this research is:
When is learning equivalent to compression?
We use the quantity I(input; output), the mutual information between the training data and the output hypothesis of the learning algorithm, to measure the compression of the algorithm. Under this information-theoretic setting, these two notions are indeed equivalent.
a) Compression implies learning. We will show that learning algorithms that retain a small amount of information from their input sample generalize.
b) Learning implies compression. We will show that under an average-case analysis, every hypothesis class of finite VC dimension (a characterization of learnable classes) has empirical risk minimizers (ERM) that do not reveal a lot of information. If time permits, we will discuss a worst-case lower bound we proved by presenting a simple concept class for which every empirical risk minimizer (also randomized) must reveal a lot of information and also observe connections with well-studied notions such as sample compression schemes, Occam’s razor, PAC-Bayes, and differential privacy.