Coresets for Integral Probability Metrics

  • Jeff Phillips (University of Utah)
E2 10 (Leon-Lichtenstein)


Integral probability metrics (IPMs) are a family of distances between distributions that return the largest deviation under a specified function class. Many of the central problems in data analysis reduce to comparing distributions, and in most cases an IPM is used.
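For reference, the standard textbook form of this definition can be written as follows. The symbols (P and Q for the two distributions, \mathcal{F} for the function class) are notation assumed here and do not come from the announcement itself.

```latex
d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}}
  \left| \, \mathbb{E}_{x \sim P}[f(x)] \;-\; \mathbb{E}_{y \sim Q}[f(y)] \, \right|
```

For example, taking \mathcal{F} to be the indicator functions of the half-lines (-\infty, t] on the real line gives the Kolmogorov-Smirnov distance.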

This talk focuses on coresets: small, discrete representations of a large data set or a continuous distribution that can be used as a proxy for it, with a guarantee that certain measurements do not deviate too far. The interesting coreset questions ask, for a given error tolerance, how small the discrete coreset can be made. Such coresets are one of the main tools for scaling machine learning to massive data sets with guaranteed approximation results.

In this talk, we develop coresets for IPMs when the function class is geometrically defined. This has algorithmic applications to many topics including linear classification, kernel density estimates, Kolmogorov-Smirnov distances, and spatial scan statistics. We will conclude with a deeper dive into how to apply these coresets to spatial scan statistics, and how they provide highly scalable algorithms without sacrificing statistical power.
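To make the coreset guarantee concrete, here is a minimal one-dimensional sketch for the Kolmogorov-Smirnov distance, which is the IPM whose function class is the indicators of half-lines (-inf, t]. It is an illustration only, not the construction presented in the talk: the helper names (empirical_cdf, ks_distance, quantile_coreset) and the evenly spaced quantile selection are assumptions made for this example. Selecting m evenly spaced order statistics keeps the KS distance to the full data set at most 1/(2m), so a tolerance eps needs roughly 1/(2 eps) points regardless of the data size.

```python
import bisect
import random


def empirical_cdf(sorted_pts, t):
    """Fraction of points in sorted_pts that are <= t."""
    return bisect.bisect_right(sorted_pts, t) / len(sorted_pts)


def ks_distance(p, q):
    """Kolmogorov-Smirnov distance between two finite point sets.

    Both empirical CDFs are right-continuous step functions, so the
    supremum over all thresholds is attained at one of the input points.
    """
    p_sorted, q_sorted = sorted(p), sorted(q)
    return max(
        abs(empirical_cdf(p_sorted, t) - empirical_cdf(q_sorted, t))
        for t in p_sorted + q_sorted
    )


def quantile_coreset(p, m):
    """Hypothetical helper: pick m evenly spaced order statistics of p.

    The i-th selected rank is floor((i + 0.5) * n / m), computed with
    integer arithmetic to avoid floating-point rounding of the index.
    """
    s = sorted(p)
    n = len(s)
    if not 1 <= m <= n:
        raise ValueError("coreset size must satisfy 1 <= m <= len(p)")
    return [s[((2 * i + 1) * n) // (2 * m)] for i in range(m)]


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]

eps = 0.01                      # error tolerance (assumed for illustration)
m = int(1 / (2 * eps))          # 50 points suffice for this tolerance
coreset = quantile_coreset(data, m)
err = ks_distance(data, coreset)

print(f"coreset size = {len(coreset)}, KS error = {err:.5f}, bound = {1 / (2 * m):.5f}")
```

The size of this coreset depends only on the tolerance, not on the number of input points, which is the kind of trade-off the coreset questions above ask about. Geometric function classes in higher dimensions require different constructions.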

