Biblio
In this paper, the correctness of the routing algorithm for the distributed key-value store based on order preserving linear hashing and Skip Graph is proved. In this system, data are divided by linear hashing and Skip Graph is used for overlay network. The routing table of this system is very uniform. Then, short detours can exist in the route of forwarding. By using these detours, the number of hops for the query forwarding is reduced.
The serializability of transactions is the most important property that ensure correct processing to transactions. In case of concurrent access to the same data by several transactions, or in case of dependency relationships between running sub transactions. But some transactions has been marked as malicious and they compromise the serialization of running system. For that purpose, we propose an intrusion tolerant scheme to ensure the continuity of the running transactions. A transaction dependency graph is also used by the CDC to make decisions concerning the set of data and transactions that are threatened by a malicious activity. We will give explanations about how to use the proposed scheme to illustrate its behavior and efficiency against a compromised transaction-based in a cloud of databases environment. Several issues should be considered when dealing with the processing of a set of interleaved transactions in a transaction based environment. In most cases, these issues are due to the concurrent access to the same data by several transactions or the dependency relationship between running transactions. The serializability may be affected if a transaction that belongs to the processing node is compromised.
This paper proposes a context-aware, graph-based approach for identifying anomalous user activities via user profile analysis, which obtains a group of users maximally similar among themselves as well as to the query during test time. The main challenges for the anomaly detection task are: (1) rare occurrences of anomalies making it difficult for exhaustive identification with reasonable false-alarm rate, and (2) continuously evolving new context-dependent anomaly types making it difficult to synthesize the activities apriori. Our proposed query-adaptive graph-based optimization approach, solvable using maximum flow algorithm, is designed to fully utilize both mutual similarities among the user models and their respective similarities with the query to shortlist the user profiles for a more reliable aggregated detection. Each user activity is represented using inputs from several multi-modal resources, which helps to localize anomalies from time-dependent data efficiently. Experiments on public datasets of insider threats and gesture recognition show impressive results.
As increasingly more enterprises are deploying cloud file-sharing services, this adds a new channel for potential insider threats to company data and IPs. In this paper, we introduce a two-stage machine learning system to detect anomalies. In the first stage, we project the access logs of cloud file-sharing services onto relationship graphs and use three complementary graph-based unsupervised learning methods: OddBall, PageRank and Local Outlier Factor (LOF) to generate outlier indicators. In the second stage, we ensemble the outlier indicators and introduce the discrete wavelet transform (DWT) method, and propose a procedure to use wavelet coefficients with the Haar wavelet function to identify outliers for insider threat. The proposed system has been deployed in a real business environment, and demonstrated effectiveness by selected case studies.
While most organizations continue to invest in traditional network defences, a formidable security challenge has been brewing within their own boundaries. Malicious insiders with privileged access in the guise of a trusted source have carried out many attacks causing far reaching damage to financial stability, national security and brand reputation for both public and private sector organizations. Growing exposure and impact of the whistleblower community and concerns about job security with changing organizational dynamics has further aggravated this situation. The unpredictability of malicious attackers, as well as the complexity of malicious actions, necessitates the careful analysis of network, system and user parameters correlated with insider threat problem. Thus it creates a high dimensional, heterogeneous data analysis problem in isolating suspicious users. This research work proposes an insider threat detection framework, which utilizes the attributed graph clustering techniques and outlier ranking mechanism for enterprise users. Empirical results also confirm the effectiveness of the method by achieving the best area under curve value of 0.7648 for the receiver operating characteristic curve.
Semantic Web has brought forth the idea of computing with knowledge, hence, attributing the ability of thinking to machines. Knowledge Graphs represent a major advancement in the construction of the Web of Data where machines are context-aware when answering users' queries. The English Knowledge Graph was a milestone realized by Google in 2012. Even though it is a useful source of information for English users and applications, it does not offer much for the Arabic users and applications. In this paper, we investigated the different challenges and opportunities prone to the life-cycle of the construction of the Arabic Knowledge Graph (AKG) while following some best practices and techniques. Additionally, this work suggests some potential solutions to these challenges. The proprietary factor of data creates a major problem in the way of harvesting this latter. Moreover, when the Arabic data is openly available, it is generally in an unstructured form which requires further processing. The complexity of the Arabic language itself creates a further problem for any automatic or semi-automatic extraction processes. Therefore, the usage of NLP techniques is a feasible solution. Some preliminary results are presented later in this paper. The AKG has very promising outcomes for the Semantic Web in general and the Arabic community in particular. The goal of the Arabic Knowledge Graph is mainly the integration of the different isolated datasets available on the Web. Later, it can be used in both the academic (by providing a large dataset for many different research fields and enhance discovery) and commercial sectors (by improving search engines, providing metadata, interlinking businesses).
Botnets have long been used for malicious purposes with huge economic costs to the society. With the proliferation of cheap but non-secure Internet-of-Things (IoT) devices generating large amounts of data, the potential for damage from botnets has increased manifold. There are several approaches to detect bots or botnets, though many traditional techniques are becoming less effective as botnets with centralized command & control structure are being replaced by peer-to-peer (P2P) botnets which are harder to detect. Several algorithms have been proposed in literature that use graph analysis or machine learning techniques to detect the overlay structure of P2P networks in communication graphs. Many of these algorithms however, depend on the availability of a universal communication graph or a communication graph aggregated from several ISPs, which is not likely to be available in reality. In real world deployments, significant gaps in communication graphs are expected and any solution proposed should be able to work with partial information. In this paper, we analyze the effectiveness of some community detection algorithms in detecting P2P botnets, especially with partial information. We show that the approach can work with only about half of the nodes reporting their communication graphs, with only small increase in detection errors.
The Internet of Things(IoT) has become a popular technology, and various middleware has been proposed and developed for IoT systems. However, there have been few studies on the data management of IoT systems. In this paper, we consider graph database models for the data management of IoT systems because these models can specify relationships in a straightforward manner among entities such as devices, users, and information that constructs IoT systems. However, applying a graph database to the data management of IoT systems raises issues regarding distribution and security. For the former issue, we propose graph database operations integrated with REST APIs. For the latter, we extend a graph edge property by adding access protocol permissions and checking permissions using the APIs with authentication. We present the requirements for a use case scenario in addition to the features of a distributed graph database for IoT data management to solve the aforementioned issues, and implement a prototype of the graph database.
Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graph and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.
Tree structures such as breadth-first search (BFS) trees and minimum spanning trees (MST) are among the most fundamental graph structures in distributed network algorithms. However, by definition, these structures are not robust against failures and even a single edge's removal can disrupt their functionality. A well-studied concept which attempts to circumvent this issue is Fault-Tolerant Tree Structures, where the tree gets augmented with additional edges from the network so that the functionality of the structure is maintained even when an edge fails. These structures, or other equivalent formulations, have been studied extensively from a centralized viewpoint. However, despite the fact that the main motivations come from distributed networks, their distributed construction has not been addressed before. In this paper, we present distributed algorithms for constructing fault tolerant BFS and MST structures. The time complexity of our algorithms are nearly optimal in the following strong sense: they almost match even the lower bounds of constructing (basic) BFS and MST trees.
Let G be a finite connected graph, and let ρ be the spectral radius of its universal cover. For example, if G is k-regular then ρ=2√k−1. We show that for every r, there is an r-covering (a.k.a. an r-lift) of G where all the new eigenvalues are bounded from above by ρ. It follows that a bipartite Ramanujan graph has a Ramanujan r-covering for every r. This generalizes the r=2 case due to Marcus, Spielman and Srivastava (2013). Every r-covering of G corresponds to a labeling of the edges of G by elements of the symmetric group Sr. We generalize this notion to labeling the edges by elements of various groups and present a broader scenario where Ramanujan coverings are guaranteed to exist. In particular, this shows the existence of richer families of bipartite Ramanujan graphs than was known before. Inspired by Marcus-Spielman-Srivastava, a crucial component of our proof is the existence of interlacing families of polynomials for complex reflection groups. The core argument of this component is taken from Marcus-Spielman-Srivastava (2015). Another important ingredient of our proof is a new generalization of the matching polynomial of a graph. We define the r-th matching polynomial of G to be the average matching polynomial of all r-coverings of G. We show this polynomial shares many properties with the original matching polynomial. For example, it is real rooted with all its roots inside [−ρ,ρ].
In this paper, we propose a new risk analysis framework that enables to supervise risks in complex and distributed systems. Our contribution is twofold. First, we provide the Risk Assessment Graphs (RAGs) as a model of risk analysis. This graph-based model is adaptable to the system changes over the time. We also introduce the potentiality and the accessibility functions which, during each time slot, evaluate respectively the chance of exploiting the RAG's nodes, and the connection time between these nodes. In addition, we provide a worst-case risk evaluation approach, based on the assumption that the intruder threats usually aim at maximising their benefits by inflicting the maximum damage to the target system (i.e. choosing the most likely paths in the RAG). We then introduce three security metrics: the propagated risk, the node risk and the global risk. We illustrate the use of our framework through the simple example of an enterprise email service. Our framework achieves both flexibility and generality requirements, it can be used to assess the external threats as well as the insider ones, and it applies to a wide set of applications.
In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheeger-like inequality, which relates the conductance of a digraph and the smallest non-zero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of real-world networks and obtain encouraging results.
This paper proposes a novel measure for edge significance considering quantity propagation in a graph. Our method utilizes a pseudo propagation process brought by solving a problem of a load balancing on nodes. Edge significance is defined as a difference of propagation in a graph with an edge to without it. The simulation compares our proposed method with the traditional betweenness centrality in order to obtain differences of our measure to a type of centrality, which considers propagation process in a graph.
Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file dropping relationship and the topology of the file distribution network. The integration offers a unique ability of structuring the end-to-end distribution relationship. However, it brings large heterogeneous graphs to analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source codes nor inspect the dynamic behaviours of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has a linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with a high accuracy.
In this paper, we propose a new risk analysis framework that enables to supervise risks in complex and distributed systems. Our contribution is twofold. First, we provide the Risk Assessment Graphs (RAGs) as a model of risk analysis. This graph-based model is adaptable to the system changes over the time. We also introduce the potentiality and the accessibility functions which, during each time slot, evaluate respectively the chance of exploiting the RAG's nodes, and the connection time between these nodes. In addition, we provide a worst-case risk evaluation approach, based on the assumption that the intruder threats usually aim at maximising their benefits by inflicting the maximum damage to the target system (i.e. choosing the most likely paths in the RAG). We then introduce three security metrics: the propagated risk, the node risk and the global risk. We illustrate the use of our framework through the simple example of an enterprise email service. Our framework achieves both flexibility and generality requirements, it can be used to assess the external threats as well as the insider ones, and it applies to a wide set of applications.
In this paper, we propose a new risk analysis framework that enables to supervise risks in complex and distributed systems. Our contribution is twofold. First, we provide the Risk Assessment Graphs (RAGs) as a model of risk analysis. This graph-based model is adaptable to the system changes over the time. We also introduce the potentiality and the accessibility functions which, during each time slot, evaluate respectively the chance of exploiting the RAG's nodes, and the connection time between these nodes. In addition, we provide a worst-case risk evaluation approach, based on the assumption that the intruder threats usually aim at maximising their benefits by inflicting the maximum damage to the target system (i.e. choosing the most likely paths in the RAG). We then introduce three security metrics: the propagated risk, the node risk and the global risk. We illustrate the use of our framework through the simple example of an enterprise email service. Our framework achieves both flexibility and generality requirements, it can be used to assess the external threats as well as the insider ones, and it applies to a wide set of applications.
We explicitly construct an extractor for two independent sources on n bits, each with polylogarithmic min-entropy. Our extractor outputs one bit and has polynomially small error. The best previous extractor, by Bourgain, required each source to have min-entropy .499n. A key ingredient in our construction is an explicit construction of a monotone, almost-balanced Boolean functions that are resilient to coalitions. In fact, our construction is stronger in that it gives an explicit extractor for a generalization of non-oblivious bit-fixing sources on n bits, where some unknown n-q bits are chosen almost polylogarithmic-wise independently, and the remaining q bits are chosen by an adversary as an arbitrary function of the n-q bits. The best previous construction, by Viola, achieved q quadratically smaller than our result. Our explicit two-source extractor directly implies improved constructions of a K-Ramsey graph over N vertices, improving bounds obtained by Barak et al. and matching independent work by Cohen.
Moving target defense (MTD) is an emerging paradigm in which system defenses dynamically mutate in order to decrease the overall system attack surface. Though the concept is promising, implementations have not been widely adopted. The field has been actively researched for over ten years, and has only produced a small amount of extensively adopted defenses, most notably, address space layout randomization (ASLR). This is despite the fact that there currently exist a variety of moving target implementations and proofs-of-concept. We suspect that this results from the moving target controls breaking critical system dependencies from the perspectives of users and administrators, as well as making things more difficult for attackers. As a result, the impact of the controls on overall system security is not sufficient to overcome the inconvenience imposed on legitimate system users. In this paper, we analyze a successful MTD approach. We study the control's dependency graphs, showing how we use graph theoretic and network properties to predict the effectiveness of the selected control.
Go is a programming language developed at Google, with channel-based concurrent features based on CSP. Go can detect global communication deadlocks at runtime when all threads of execution are blocked, but deadlocks in other paths of execution could be undetected. We present a new static analyser for concurrent Go code to find potential communication errors such as communication mismatch and deadlocks at compile time. Our tool extracts the communication operations as session types, which are then converted into Communicating Finite State Machines (CFSMs). Finally, we apply a recent theoretical result on choreography synthesis to generate a global graph representing the overall communication pattern of a concurrent program. If the synthesis is successful, then the program is free from communication errors. We have implemented the technique in a tool, and applied it to analyse common Go concurrency patterns and an open source application with over 700 lines of code.
The American National Standards Institute (ANSI) has standardized an access control approach, Next Generation Access Control (NGAC), that enables simultaneous instantiation of multiple access control policies. For large complex enterprises this is critical to limiting the authorized access of insiders. However, the specifications describe the required access control capabilities but not the related algorithms. While appropriate, this leave open the important question as to whether or not NGAC is scalable. Existing cubic reference implementations indicate that it does not. For example, the primary NGAC reference implementation took several minutes to simply display the set of files accessible to a user on a moderately sized system. To solve this problem we provide an efficient access control decision algorithm, reducing the overall complexity from cubic to linear. Our other major contribution is to provide a novel mechanism for administrators and users to review allowed access rights. We provide an interface that appears to be a simple file directory hierarchy but in reality is an automatically generated structure abstracted from the underlying access control graph that works with any set of simultaneously instantiated access control policies. Our work thus provides the first efficient implementation of NGAC while enabling user privilege review through a novel visualization approach. These capabilities help limit insider access to information (and thereby limit information leakage) by enabling the efficient simultaneous instantiation of multiple access control policies.
The Critical Node Detection Problem (CNDP) is a well-known NP-complete, graph-theoretical problem with many real-world applications in various fields such as social network analysis, supply-chain network analysis, transport engineering, network immunization, and military strategic planning. We present the first parallel algorithms for CNDP solving in general, and for fast, approximated CND on GPU and in the cloud in particular. Finally, we discuss results of our experimental performance analysis of these solutions.
Almost all commodity IT devices include firmware and software components from non-US suppliers, potentially introducing grave vulnerabilities to homeland security by enabling cyber-attacks via flaws injected into these devices through the supply chain. However, determining that a given device is free of any and all implementation flaws is computationally infeasible in the general case; hence a critical part of any vetting process is prioritizing what kinds of flaws are likely to enable potential adversary goals. We present Theseus, a four-year research project sponsored by the DARPA VET program. Theseus will provide technology to automatically map and explore the firmware/software (FW/SW) architecture of a commodity IT device and then generate attack scenarios for the device. From these device attack scenarios, Theseus then creates a prioritized checklist of FW/SW components to check for potential vulnerabilities. Theseus combines static program analysis, attack graph generation algorithms, and a Boolean satisfiability solver to automate the checklist generation workflow. We describe how Theseus exploits analogies between the commodity IT device problem and attack graph generation for networks. We also present a novel approach called Component Interaction Mapping to recover a formal model of a device's FW/SW architecture from which attack scenarios can be generated.
The globalization of trade is due to the transportation possibilities and the standardization (containerization of freight). The dependency of the economy to the sea and to the merchant navy has increase this last decade. This process forms a worldwide maritime network between the different locations of production and consumption. This network, representing between 80 % and 90% of world traffic is a major economic concern, including freight distribution, raw materials or energy. Rodrigue demonstrates[1] the economic dependency of energy is increasing in the industrialized countries (North America, Europe, East Asia). The inter-regional trade of oil was 31 million bbl/day in 2002 and is expected to grow up to 57 bbl/day in 2030 [2]. Most of the international traffic use a maritime way, where may occur disruptions. For example, the Suez crisis (1956-1957) caused a closure of the canal, reducing the throughput capacity of transportation. This disruption cost a 2 millions of barrels lost per day. This article focuses on vulnerability of the energy supply, and proposes a methodology to formalize and assess the vulnerability of the network by taking into account the spatial structure of maritime territories.
The TRECVID report of 2010 [14] evaluated video shot boundary detectors as achieving "excellent performance on [hard] cuts and gradual transitions." Unfortunately, while re-evaluating the state of the art of the shot boundary detection, we found that they need to be improved because the characteristics of consumer-produced videos have changed significantly since the introduction of mobile gadgets, such as smartphones, tablets and outdoor activity purposed cameras, and video editing software has been evolving rapidly. In this paper, we evaluate the best-known approach on a contemporary, publicly accessible corpus, and present a method that achieves better performance, particularly on soft transitions. Our method combines color histograms with key point feature matching to extract comprehensive frame information. Two similarity metrics, one for individual frames and one for sets of frames, are defined based on graph cuts. These metrics are formed into temporal feature vectors on which a SVM is trained to perform the final segmentation. The evaluation on said "modern" corpus of relatively short videos yields a performance of 92% recall (at 89% precision) overall, compared to 69% (91%) of the best-known method.