Distributed Balanced Partitioning via Linear Embedding

Submitted by K_Hooper on Fri, 10/13/2017 - 10:29am

Title	Distributed Balanced Partitioning via Linear Embedding
Publication Type	Conference Paper
Year of Publication	2016
Authors	Aydin, Kevin, Bateni, MohammadHossein, Mirrokni, Vahab
Conference Name	Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-3716-8
Keywords	composability, compositionality, Computing Theory, Computing Theory and Composabilty, cut minimization, embedding to line, imbalance, local improvement, MapReduce, maps, partitioning, pubcrawl, social networks
Abstract	Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems: in some cases, a big graph is chopped into pieces that fit on one machine to be processed independently before stitching the results together, leading to certain suboptimality from the interaction among different pieces. In other cases, links between different parts may show up in the running time and/or network communications cost, hence the desire to have small cut size. We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces, minimizing the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks, e.g., MapReduce. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques such as local swaps, minimum cuts on partition boundaries, as well as contraction and dynamic programming. Our empirical study compares the above techniques with each other, and to previous work in distributed algorithms, e.g., a label propagation method, FENNEL and Spinner. We report our results both on a private map graph and several public social networks, and show that our results beat previous distributed algorithms: we notice, e.g., 15-25% reduction in cut size over [UB13]. We also observe that our algorithms allow for scalable distributed implementation for any number of partitions. Finally, we apply our techniques for the Google Maps Driving Directions to minimize the number of multi-shard queries with the goal of saving in CPU usage. During live experiments, we observe an 40% drop in the number of multi-shard queries when comparing our method with a standard geography-based method.
URL	http://doi.acm.org/10.1145/2835776.2835829
DOI	10.1145/2835776.2835829
Citation Key	aydin_distributed_2016

Groups:

Science of Security VO