Visible to the public Distributed Balanced Partitioning via Linear Embedding

TitleDistributed Balanced Partitioning via Linear Embedding
Publication TypeConference Paper
Year of Publication2016
AuthorsAydin, Kevin, Bateni, MohammadHossein, Mirrokni, Vahab
Conference NameProceedings of the Ninth ACM International Conference on Web Search and Data Mining
PublisherACM
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-3716-8
Keywordscomposability, compositionality, Computing Theory, Computing Theory and Composabilty, cut minimization, embedding to line, imbalance, local improvement, MapReduce, maps, partitioning, pubcrawl, social networks
Abstract

Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems: in some cases, a big graph is chopped into pieces that fit on one machine to be processed independently before stitching the results together, leading to certain suboptimality from the interaction among different pieces. In other cases, links between different parts may show up in the running time and/or network communications cost, hence the desire to have small cut size. We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces, minimizing the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks, e.g., MapReduce. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques such as local swaps, minimum cuts on partition boundaries, as well as contraction and dynamic programming. Our empirical study compares the above techniques with each other, and to previous work in distributed algorithms, e.g., a label propagation method, FENNEL and Spinner. We report our results both on a private map graph and several public social networks, and show that our results beat previous distributed algorithms: we notice, e.g., 15-25% reduction in cut size over [UB13]. We also observe that our algorithms allow for scalable distributed implementation for any number of partitions. Finally, we apply our techniques for the Google Maps Driving Directions to minimize the number of multi-shard queries with the goal of saving in CPU usage. During live experiments, we observe an 40% drop in the number of multi-shard queries when comparing our method with a standard geography-based method.

URLhttp://doi.acm.org/10.1145/2835776.2835829
DOI10.1145/2835776.2835829
Citation Keyaydin_distributed_2016