Visible to the public Biblio

Filters: Keyword is Approximation algorithms  [Clear All Filters]
2020-05-22
Sheth, Utsav, Dutta, Sanghamitra, Chaudhari, Malhar, Jeong, Haewon, Yang, Yaoqing, Kohonen, Jukka, Roos, Teemu, Grover, Pulkit.  2018.  An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation. 2018 IEEE International Conference on Big Data (Big Data). :1113—1120.
We propose a novel application of coded computing to the problem of the nearest neighbor estimation using MatDot Codes (Fahim et al., Allerton'17) that are known to be optimal for matrix multiplication in terms of recovery threshold under storage constraints. In approximate nearest neighbor algorithms, it is common to construct efficient in-memory indexes to improve query response time. One such strategy is Multiple Random Projection Trees (MRPT), which reduces the set of candidate points over which Euclidean distance calculations are performed. However, this may result in a high memory footprint and possibly paging penalties for large or high-dimensional data. Here we propose two techniques to parallelize MRPT that exploit data and model parallelism respectively by dividing both the data storage and the computation efforts among different nodes in a distributed computing cluster. This is especially critical when a single compute node cannot hold the complete dataset in memory. We also propose a novel coded computation strategy based on MatDot codes for the model-parallel architecture that, in a straggler-prone environment, achieves the storage-optimal recovery threshold, i.e., the number of nodes that are required to serve a query. We experimentally demonstrate that, in the absence of straggling, our distributed approaches require less query time than execution on a single processing node, providing near-linear speedups with respect to the number of worker nodes. Our experiments on real systems with simulated straggling, we also show that in a straggler-prone environment, our strategy achieves a faster query execution than the uncoded strategy.
Kang, Hyunjoong, Hong, Sanghyun, Lee, Kookjin, Park, Noseong, Kwon, Soonhyun.  2018.  On Integrating Knowledge Graph Embedding into SPARQL Query Processing. 2018 IEEE International Conference on Web Services (ICWS). :371—374.
SPARQL is a standard query language for knowledge graphs (KGs). However, it is hard to find correct answer if KGs are incomplete or incorrect. Knowledge graph embedding (KGE) enables answering queries on such KGs by inferring unknown knowledge and removing incorrect knowledge. Hence, our long-term goal in this line of research is to propose a new framework that integrates KGE and SPARQL, which opens various research problems to be addressed. In this paper, we solve one of the most critical problems, that is, optimizing the performance of nearest neighbor (NN) search. In our evaluations, we demonstrate that the search time of state-of-the-art NN search algorithms is improved by 40% without sacrificing answer accuracy.
2020-03-30
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly modern data science platforms today have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for the users to compose queries and utilize this valuable information just using standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight about the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than state-of-the-art. Second, we propose a graph summarization operator that combines similar segments together to query prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
2020-02-10
Chen, Siyuan, Liu, Wei, Liu, Jiamou, Soo, Khí-Uí, Chen, Wu.  2019.  Maximizing Social Welfare in Fractional Hedonic Games using Shapley Value. 2019 IEEE International Conference on Agents (ICA). :21–26.
Fractional hedonic games (FHGs) are extensively studied in game theory and explain the formation of coalitions among individuals in a group. This paper investigates the coalition generation problem, namely, finding a coalition structure whose social welfare, i.e., the sum of the players' payoffs, is maximized. We focus on agent-based methods which set the decision rules for each player in the game. Through repeated interactions the players arrive at a coalition structure. In particular, we propose CFSV, namely, coalition formation with Shapley value-based welfare distribution scheme. To evaluate CFSV, we theoretically demonstrate that this algorithm achieves optimal coalition structure over certain standard graph classes and empirically compare the algorithm against other existing benchmarks on real-world and synthetic graphs. The results show that CFSV is able to achieve superior performance.
2019-06-10
Debatty, T., Mees, W., Gilon, T..  2018.  Graph-Based APT Detection. 2018 International Conference on Military Communications and Information Systems (ICMCIS). :1-8.

In this paper we propose a new algorithm to detect Advanced Persistent Threats (APT's) that relies on a graph model of HTTP traffic. We also implement a complete detection system with a web interface that allows to interactively analyze the data. We perform a complete parameter study and experimental evaluation using data collected on a real network. The results show that the performance of our system is comparable to currently available antiviruses, although antiviruses use signatures to detect known malwares while our algorithm solely uses behavior analysis to detect new undocumented attacks.

2019-04-05
Nan, Z., Zhai, L., Zhai, L., Liu, H..  2018.  Botnet Homology Method Based on Symbolic Approximation Algorithm of Communication Characteristic Curve. 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). :1-6.

The IRC botnet is the earliest and most significant botnet group that has a significant impact. Its characteristic is to control multiple zombies hosts through the IRC protocol and constructing command control channels. Relevant research analyzes the large amount of network traffic generated by command interaction between the botnet client and the C&C server. Packet capture traffic monitoring on the network is currently a more effective detection method, but this information does not reflect the essential characteristics of the IRC botnet. The increase in the amount of erroneous judgments has often occurred. To identify whether the botnet control server is a homogenous botnet, dynamic network communication characteristic curves are extracted. For unequal time series, dynamic time warping distance clustering is used to identify the homologous botnets by category, and in order to improve detection. Speed, experiments will use SAX to reduce the dimension of the extracted curve, reducing the time cost without reducing the accuracy.

2019-01-21
Fahrbach, M., Miller, G. L., Peng, R., Sawlani, S., Wang, J., Xu, S. C..  2018.  Graph Sketching against Adaptive Adversaries Applied to the Minimum Degree Algorithm. 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS). :101–112.

Motivated by the study of matrix elimination orderings in combinatorial scientific computing, we utilize graph sketching and local sampling to give a data structure that provides access to approximate fill degrees of a matrix undergoing elimination in polylogarithmic time per elimination and query. We then study the problem of using this data structure in the minimum degree algorithm, which is a widely-used heuristic for producing elimination orderings for sparse matrices by repeatedly eliminating the vertex with (approximate) minimum fill degree. This leads to a nearly-linear time algorithm for generating approximate greedy minimum degree orderings. Despite extensive studies of algorithms for elimination orderings in combinatorial scientific computing, our result is the first rigorous incorporation of randomized tools in this setting, as well as the first nearly-linear time algorithm for producing elimination orderings with provable approximation guarantees. While our sketching data structure readily works in the oblivious adversary model, by repeatedly querying and greedily updating itself, it enters the adaptive adversarial model where the underlying sketches become prone to failure due to dependency issues with their internal randomness. We show how to use an additional sampling procedure to circumvent this problem and to create an independent access sequence. Our technique for decorrelating interleaved queries and updates to this randomized data structure may be of independent interest.

2019-01-16
Rodríguez, R. J., Martín-Pérez, M., Abadía, I..  2018.  A tool to compute approximation matching between windows processes. 2018 6th International Symposium on Digital Forensic and Security (ISDFS). :1–6.
Finding identical digital objects (or artifacts) during a forensic analysis is commonly achieved by means of cryptographic hashing functions, such as MD5, SHA1, or SHA-256, to name a few. However, these functions suffer from the avalanche effect property, which guarantees that if an input is changed slightly the output changes significantly. Hence, these functions are unsuitable for typical digital forensics scenarios where a forensics memory image from a likely compromised machine shall be analyzed. This memory image file contains a snapshot of processes (instances of executable files) which were up on execution when the dumping process was done. However, processes are relocated at memory and contain dynamic data that depend on the current execution and environmental conditions. Therefore, the comparison of cryptographic hash values of different processes from the same executable file will be negative. Bytewise approximation matching algorithms may help in these scenarios, since they provide a similarity measurement in the range [0,1] between similar inputs instead of a yes/no answer (in the range 0,1). In this paper, we introduce ProcessFuzzyHash, a Volatility plugin that enables us to compute approximation hash values of processes contained in a Windows memory dump.
2018-03-26
Naor, Assaf, Young, Robert.  2017.  The Integrality Gap of the Goemans-Linial SDP Relaxation for Sparsest Cut Is at Least a Constant Multiple of $\surd$Log N. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. :564–575.

We prove that the integrality gap of the Goemans–Linial semidefinite programming relaxation for the Sparsest Cut Problem is Î\textcopyright(√logn) on inputs with n vertices, thus matching the previously best known upper bound (logn)1/2+o(1) up to lower-order factors. This statement is a consequence of the following new isoperimetric-type inequality. Consider the 8-regular graph whose vertex set is the 5-dimensional integer grid â„\textcurrency5 and where each vertex (a,b,c,d,e)∈ â„\textcurrency5 is connected to the 8 vertices (aÂ$\pm$ 1,b,c,d,e), (a,bÂ$\pm$ 1,c,d,e), (a,b,cÂ$\pm$ 1,d,eÂ$\pm$ a), (a,b,c,dÂ$\pm$ 1,eÂ$\pm$ b). This graph is known as the Cayley graph of the 5-dimensional discrete Heisenberg group. Given Î\textcopyright⊂ â„\textcurrency5, denote the size of its edge boundary in this graph (a.k.a. the horizontal perimeter of Î\textcopyright) by textbar∂hÎ\textcopyrighttextbar. For t∈ ℕ, denote by textbar∂vtÎ\textcopyrighttextbar the number of (a,b,c,d,e)∈ â„\textcurrency5 such that exactly one of the two vectors (a,b,c,d,e),(a,b,c,d,e+t) is in Î\textcopyright. The vertical perimeter of Î\textcopyright is defined to be textbar∂vÎ\textcopyrighttextbar= √∑t=1∞textbar∂vtÎ\textcopyrighttextbar2/t2. We show that every subset Î\textcopyright⊂ â„\textcurrency5 satisfies textbar∂vÎ\textcopyrighttextbar=O(textbar∂hÎ\textcopyrighttextbar). This vertical-versus-horizontal isoperimetric inequality yields the above-stated integrality gap for Sparsest Cut and answers several geometric and analytic questions of independent interest. The theorem stated above is the culmination of a program whose aim is to understand the performance of the Goemans–Linial semidefinite program through the embeddability properties of Heisenberg groups. These investigations have mathematical significance even beyond their established relevance to approximation algorithms and combinatorial optimization. In particular they contribute to a range of mathematical disciplines including functional analysis, geometric group theory, harmonic analysis, sub-Riemannian geometry, geometric measure theory, ergodic theory, group representations, and metric differentiation. This article builds on the above cited works, with the “twist” that while those works were equally valid for any finite dimensional Heisenberg group, our result holds for the Heisenberg group of dimension 5 (or higher) but fails for the 3-dimensional Heisenberg group. This insight leads to our core contribution, which is a deduction of an endpoint L1-boundedness of a certain singular integral on ℝ5 from the (local) L2-boundedness of the corresponding singular integral on ℝ3. To do this, we devise a corona-type decomposition of subsets of a Heisenberg group, in the spirit of the construction that David and Semmes performed in ℝn, but with two main conceptual differences (in addition to more technical differences that arise from the peculiarities of the geometry of Heisenberg group). Firstly, the“atoms” of our decomposition are perturbations of intrinsic Lipschitz graphs in the sense of Franchi, Serapioni, and Serra Cassano (plus the requisite “wild” regions that satisfy a Carleson packing condition). Secondly, we control the local overlap of our corona decomposition by using quantitative monotonicity rather than Jones-type Î$^2$-numbers.

2018-01-16
Arita, S., Kozaki, S..  2017.  A Homomorphic Signature Scheme for Quadratic Polynomials. 2017 IEEE International Conference on Smart Computing (SMARTCOMP). :1–6.

Homomorphic signatures can provide a credential of a result which is indeed computed with a given function on a data set by an untrusted third party like a cloud server, when the input data are stored with the signatures beforehand. Boneh and Freeman in EUROCRYPT2011 proposed a homomorphic signature scheme for polynomial functions of any degree, however the scheme is not based on the normal short integer solution (SIS) problems as its security assumption. In this paper, we show a homomorphic signature scheme for quadratic polynomial functions those security assumption is based on the normal SIS problems. Our scheme constructs the signatures of multiplication as tensor products of the original signature vectors of input data so that homomorphism holds. Moreover, security of our scheme is reduced to the hardness of the SIS problems respect to the moduli such that one modulus is the power of the other modulus. We show the reduction by constructing solvers of the SIS problems respect to either of the moduli from any forger of our scheme.

2017-09-15
Dhulipala, Laxman, Kabiljo, Igor, Karrer, Brian, Ottaviano, Giuseppe, Pupyrev, Sergey, Shalita, Alon.  2016.  Compressing Graphs and Indexes with Recursive Graph Bisection. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1535–1544.

Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graph and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.

2017-06-05
Korupolu, Madhukar, Rajaraman, Rajmohan.  2016.  Robust and Probabilistic Failure-Aware Placement. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :213–224.

Motivated by the growing complexity and heterogeneity of modern data centers, and the prevalence of commodity component failures, this paper studies the failure-aware placement problem of placing tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversarial and probabilistic. In the adversarial model, each node has a weight (higher weight implying higher reliability) and the adversary can remove any subset of nodes of total weight at most a given bound W and our goal is to find a placement that incurs the least disruption against such an adversary. In the probabilistic model, each node has a probability of failure and we need to find a placement that maximizes the probability that at least K out of N tasks survive at any time. For adversarial failures, we first show that (i) the problems are in Σ2, the second level of the polynomial hierarchy, (ii) a basic variant, that we call RobustFAP, is co-NP-hard, and (iii) an all-or-nothing version of RobustFAP is Σ2-complete. We then give a PTAS for RobustFAP, a key ingredient of which is a solution that we design for a fractional version of RobustFAP. We then study fractional RobustFAP over hierarchies, denoted HierRobustFAP, and introduce a notion of hierarchical max-min fairness/ and a novel Generalized Spreading/ algorithm which is simultaneously optimal for all W. These generalize the classical notion of max-min fairness to work with nodes of differing capacities, differing reliability weights and hierarchical structures. Using randomized rounding, we extend this to give an algorithm for integral HierRobustFAP. For the probabilistic version, we first give an algorithm that achieves an additive ε approximation in the failure probability for the single level version, called ProbFAP, while giving up a (1 + ε) multiplicative factor in the number of failures. We then extend the result to the hierarchical version, HierProbFAP, achieving an ε additive approximation in failure probability while giving up an (L + ε) multiplicative factor in the number of failures, where \$L\$ is the number of levels in the hierarchy.

2017-03-08
Degenbaeva, C., Klusch, M..  2015.  Critical Node Detection Problem Solving on GPU and in the Cloud. 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded S. :52–57.

The Critical Node Detection Problem (CNDP) is a well-known NP-complete, graph-theoretical problem with many real-world applications in various fields such as social network analysis, supply-chain network analysis, transport engineering, network immunization, and military strategic planning. We present the first parallel algorithms for CNDP solving in general, and for fast, approximated CND on GPU and in the cloud in particular. Finally, we discuss results of our experimental performance analysis of these solutions.

Mao, Y., Yang, J., Zhu, B., Yang, Y..  2015.  A new mesh simplification algorithm based on quadric error metric. 2015 IEEE 5th International Conference on Consumer Electronics - Berlin (ICCE-Berlin). :463–466.

This paper proposes an improved mesh simplification algorithm based on quadric error metrics (QEM) to efficiently processing the huge data in 3D image processing. This method fully uses geometric information around vertices to avoid model edge from being simplified and to keep details. Meanwhile, the differences between simplified triangular meshes and equilateral triangles are added as weights of errors to decrease the possibilities of narrow triangle and then to avoid the visual mutation. Experiments show that our algorithm has obvious advantages over the time cost, and can better save the visual characteristics of model, which is suitable for solving most image processing, that is, "Real-time interactive" problem.

2017-02-21
S. R. Islam, S. P. Maity, A. K. Ray.  2015.  "On compressed sensing image reconstruction using linear prediction in adaptive filtering". 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :2317-2323.

Compressed sensing (CS) or compressive sampling deals with reconstruction of signals from limited observations/ measurements far below the Nyquist rate requirement. This is essential in many practical imaging system as sampling at Nyquist rate may not always be possible due to limited storage facility, slow sampling rate or the measurements are extremely expensive e.g. magnetic resonance imaging (MRI). Mathematically, CS addresses the problem for finding out the root of an unknown distribution comprises of unknown as well as known observations. Robbins-Monro (RM) stochastic approximation, a non-parametric approach, is explored here as a solution to CS reconstruction problem. A distance based linear prediction using the observed measurements is done to obtain the unobserved samples followed by random noise addition to act as residual (prediction error). A spatial domain adaptive Wiener filter is then used to diminish the noise and to reveal the new features from the degraded observations. Extensive simulation results highlight the relative performance gain over the existing work.