Visible to the public Biblio

Filters: Keyword is vertex cover  [Clear All Filters]
2018-04-11
Assadi, Sepehr, Khanna, Sanjeev.  2017.  Randomized Composable Coresets for Matching and Vertex Cover. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. :3–12.

A common approach for designing scalable algorithms for massive data sets is to distribute the computation across, say k, machines and process the data using limited communication between them. A particularly appealing framework here is the simultaneous communication model whereby each machine constructs a small representative summary of its own data and one obtains an approximate/exact solution from the union of the representative summaries. If the representative summaries needed for a problem are small, then this results in a communication-efficient and $\backslash$emph\round-optimal\ (requiring essentially no interaction between the machines) protocol. Some well-known examples of techniques for creating summaries include sampling, linear sketching, and composable coresets. These techniques have been successfully used to design communication efficient solutions for many fundamental graph problems. However, two prominent problems are notably absent from the list of successes, namely, the maximum matching problem and the minimum vertex cover problem. Indeed, it was shown recently that for both these problems, even achieving a modest approximation factor of $\backslash$polylog\(n)\ requires using representative summaries of size $\backslash$widetilde\$\backslash$Omega\(ntextasciicircum2) i.e. essentially no better summary exists than each machine simply sending its entire input graph. The main insight of our work is that the intractability of matching and vertex cover in the simultaneous communication model is inherently connected to an adversarial partitioning of the underlying graph across machines. We show that when the underlying graph is randomly partitioned across machines, both these problems admit $\backslash$emph\randomized composable coresets\ of size $\backslash$widetildeØ\(n) that yield an $\backslash$widetildeØ\(1)-approximate solution$\backslash$footnote\Here and throughout the paper, we use $\backslash$Ot($\backslash$cdot) notation to suppress $\backslash$polylog\(n)\ factors, where n is the number of vertices in the graph. In other words, a small subgraph of the input graph at each machine can be identified as its representative summary and the final answer then is obtained by simply running any maximum matching or minimum vertex cover algorithm on these combined subgraphs. This results in an Õ(1)-approximation simultaneous protocol for these problems with Õ(nk) total communication when the input is randomly partitioned across k machines. We also prove our results are optimal in a very strong sense: we not only rule out existence of smaller randomized composable coresets for these problems but in fact show that our $\backslash$Ot(nk) bound for total communication is optimal for em any simultaneous communication protocol (i.e. not only for randomized coresets) for these two problems. Finally, by a standard application of composable coresets, our results also imply MapReduce algorithms with the same approximation guarantee in one or two rounds of communication, improving the previous best known round complexity for these problems.\vphantom\

2017-03-07
Koehler, Henning, Link, Sebastian.  2016.  Qualitative Cleaning of Uncertain Data. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2269–2274.

We propose a new view on data cleaning: Not data itself but the degrees of uncertainty attributed to data are dirty. Applying possibility theory, tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Classical data cleaning modifies some minimal set of tuples. Instead, we marginally reduce their degrees of possibility. This reduction leads to a new qualitative version of the vertex cover problem. Qualitative vertex cover can be mapped to a linear-weighted constraint satisfaction problem. However, any off-the-shelf solver cannot solve the problem more efficiently than classical vertex cover. Instead, we utilize the degrees of possibility and certainty to develop a dedicated algorithm that is fixed parameter tractable in the size of the qualitative vertex cover. Experiments show that our algorithm is faster than solvers for the classical vertex cover problem by several orders of magnitude, and performance improves with higher numbers of uncertainty degrees.