Geometric Methods for Analyzing and Monitoring Large Distributed Data

Guy Sagy, Ph.D. Thesis Seminar
Wednesday, 8.2.2012, 13:00
Taub 601
Prof. Assaf Schuster and Prof. Daniel Keren

A basic requirement in many distributed systems is the ability to detect objects whose score, according to a given function, exceeds some threshold. Since an object's data can be partitioned over various nodes, computing its global score requires collecting its data over the network. A main challenge is to perform threshold queries or monitoring with minimum network communication, i.e., without collecting the data from the nodes to a central location. In this talk I will present an application of geometric ideas for performing threshold queries and top-k queries over distributed databases with minimum communication. The proposed method can handle general functions by representing them as a difference of monotonic functions, and imposing thresholds on the latter. In addition, I will present a novel scheme for communication reduction in distributed monitoring using local constraints. Communication is required only in the event that the local constraints are violated by the incoming data. As opposed to previous work on geometric monitoring, here the local constraints are tailored to fit the local data distribution at each stream, and satisfy an optimality criterion. The result is a substantial decrease in the required volume of communication compared to previous state of the art, up to two orders of magnitude in experiments with real-life data. Theoretical analysis suggests that the reduction can sometimes be unbounded, and that it typically improves with the dimensionality of the data, which is borne out in the experiments.

Back to the index of events