Visible to the public Randomized Composable Coresets for Matching and Vertex Cover

TitleRandomized Composable Coresets for Matching and Vertex Cover
Publication TypeConference Paper
Year of Publication2017
AuthorsAssadi, Sepehr, Khanna, Sanjeev
Conference NameProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures
PublisherACM
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-4593-4
Keywordscomposability, compositionality, Computing Theory, distributed computing, mapreduce algorithms, matching, pubcrawl, randomized composable coresets, simultaneous communication complexity, theoretical cryptography, vertex cover
Abstract

A common approach for designing scalable algorithms for massive data sets is to distribute the computation across, say k, machines and process the data using limited communication between them. A particularly appealing framework here is the simultaneous communication model whereby each machine constructs a small representative summary of its own data and one obtains an approximate/exact solution from the union of the representative summaries. If the representative summaries needed for a problem are small, then this results in a communication-efficient and $\backslash$emph\round-optimal\ (requiring essentially no interaction between the machines) protocol. Some well-known examples of techniques for creating summaries include sampling, linear sketching, and composable coresets. These techniques have been successfully used to design communication efficient solutions for many fundamental graph problems. However, two prominent problems are notably absent from the list of successes, namely, the maximum matching problem and the minimum vertex cover problem. Indeed, it was shown recently that for both these problems, even achieving a modest approximation factor of $\backslash$polylog\(n)\ requires using representative summaries of size $\backslash$widetilde\$\backslash$Omega\(ntextasciicircum2) i.e. essentially no better summary exists than each machine simply sending its entire input graph. The main insight of our work is that the intractability of matching and vertex cover in the simultaneous communication model is inherently connected to an adversarial partitioning of the underlying graph across machines. We show that when the underlying graph is randomly partitioned across machines, both these problems admit $\backslash$emph\randomized composable coresets\ of size $\backslash$widetildeO\(n) that yield an $\backslash$widetildeO\(1)-approximate solution$\backslash$footnote\Here and throughout the paper, we use $\backslash$Ot($\backslash$cdot) notation to suppress $\backslash$polylog\(n)\ factors, where n is the number of vertices in the graph. In other words, a small subgraph of the input graph at each machine can be identified as its representative summary and the final answer then is obtained by simply running any maximum matching or minimum vertex cover algorithm on these combined subgraphs. This results in an O(1)-approximation simultaneous protocol for these problems with O(nk) total communication when the input is randomly partitioned across k machines. We also prove our results are optimal in a very strong sense: we not only rule out existence of smaller randomized composable coresets for these problems but in fact show that our $\backslash$Ot(nk) bound for total communication is optimal for em any simultaneous communication protocol (i.e. not only for randomized coresets) for these two problems. Finally, by a standard application of composable coresets, our results also imply MapReduce algorithms with the same approximation guarantee in one or two rounds of communication, improving the previous best known round complexity for these problems.\vphantom\

URLhttps://dl.acm.org/citation.cfm?doid=3087556.3087581
DOI10.1145/3087556.3087581
Citation Keyassadi_randomized_2017