Data Science & Deep Learning: Discrepancy, Coresets, and Sketches in Machine Learning

Edo Liberty
Monday, 18.11.2019, 12:30
Taub 301 Taub Bld.

This paper defines the notion of class discrepancy for families of functions. It shows that low discrepancy classes admit small offline and streaming coresets. We provide general techniques for bounding the class discrepancy of machine learning problems. As corollaries of the general technique we bound the discrepancy (and therefore coreset complexity) of logistic regression, sigmoid activation loss, matrix covariance, kernel density and any analytic function of the dot product or the squared distance. Our results prove the existence of epsilon-approximation O(sqrt{d}/epsilon) sized coresets for the above problems. This resolves the long-standing open problem regarding the coreset complexity of Gaussian kernel density estimation. We provide two more related but independent results. First, an exponential improvement of the widely used merge-and-reduce trick which gives improved streaming sketches for any low discrepancy problem. Second, an extremely simple deterministic algorithm for finding low discrepancy sequences (and therefore coresets) for any positive semi-definite kernel. This paper establishes some explicit connections between class discrepancy, coreset complexity, learnability, and streaming algorithms.
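
To make the idea of a low-discrepancy signing for a positive semi-definite kernel concrete, here is a minimal sketch in Python. It is not necessarily the algorithm presented in the talk. It is one simple greedy rule, given as an assumption for illustration: each point receives the sign that opposes its kernel correlation with the running signed sum. The names gaussian_kernel, greedy_signs, signed_kernel_norm_sq and halve are hypothetical. For a kernel with K(x, x) = 1 this rule only guarantees that the squared norm of the signed sum is at most n, so the discrepancy on any query is at most sqrt(n). That bound is weaker than the bounds stated in the abstract.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2)); K(x, x) = 1."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def greedy_signs(points, kernel):
    """Assign +1/-1 to each point, one at a time (illustrative greedy rule).

    Point i gets the sign opposing sum_{j<i} s_j * K(x_i, x_j), so the cross
    term in the squared norm of the signed sum is never positive.
    """
    signs = []
    for x in points:
        corr = sum(s * kernel(x, points[j]) for j, s in enumerate(signs))
        signs.append(-1 if corr > 0 else 1)
    return signs


def signed_kernel_norm_sq(points, signs, kernel):
    """Squared RKHS norm of sum_i s_i * phi(x_i), i.e. sum_{i,j} s_i s_j K(x_i, x_j)."""
    return sum(
        si * sj * kernel(xi, xj)
        for si, xi in zip(signs, points)
        for sj, xj in zip(signs, points)
    )


def halve(points, kernel):
    """Keep only the points signed +1; each kept point then counts twice."""
    signs = greedy_signs(points, kernel)
    return [x for x, s in zip(points, signs) if s > 0]


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    kept = halve(pts, gaussian_kernel)
    query = (0.0, 0.0)
    full = sum(gaussian_kernel(x, query) for x in pts) / len(pts)
    approx = 2 * sum(gaussian_kernel(x, query) for x in kept) / len(pts)
    print(len(kept), full, approx)
```

Keeping the +1 points and doubling their weight changes the kernel sum at any query by exactly the signed sum at that query. This is why a low-discrepancy signing yields a coreset, and why repeating the halving step gives progressively smaller summaries.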

Short Bio:
Edo is the founder of a stealth-mode company in the machine learning and cloud computing space. Until April 2019 he was a Director of Research at AWS and the manager of Amazon AI Labs. The Lab built cutting-edge machine learning algorithms, systems, and services for AWS customers. They built parts of SageMaker, Kinesis, QuickSight, Amazon Elastic Search, Glue, Rekognition, DeepRacer, Personalize, Forecast, and other yet-to-be-released services from AWS.
