Biblio
Control of a large engineered swarm can be achieved by influencing key agents within the swarm. The swarm can rely on its communication network to spread the external perturbation and transition to a new state when all agents reach a consensus. Maximising this consensus speed is a vital design parameter when fast response is desirable. The systems analysed consist of N interacting agents, each with the same number of outward (observing) connections following k-nearest-neighbour (k-NNR) rules, and are represented by a directed graph Laplacian. The spectral properties of this graph are exploited to identify leaders with a newly presented semi-analytical approach referred to as the Leaders of Influence (LoI) method. The method is demonstrated on k-NNR graphs for a set number of leaders and is compared with a genetic algorithm, showing it to be efficient and effective at leader identification. A focus of this work is the effect of leadership style on consensus speed, where an autocratic approach (leaders that are not influenced by other nodes in the graph) is shown to always produce faster consensus than a democratic leadership model.
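A minimal numeric sketch of the spectral idea (the topology helper and parameters below are illustrative assumptions, not the paper's LoI method): under consensus dynamics x' = -Lx, the slowest-decaying disagreement mode, and hence the consensus speed, is set by the smallest non-zero real part of the Laplacian eigenvalues.

```python
import numpy as np

def knnr_laplacian(n, k):
    """Graph Laplacian for n agents on a ring, each observing its k
    nearest neighbours (k/2 per side). Illustrative topology only."""
    A = np.zeros((n, n))
    for i in range(n):
        for d in range(1, k // 2 + 1):
            A[i, (i + d) % n] = 1   # outward, observing connections
            A[i, (i - d) % n] = 1
    return np.diag(A.sum(axis=1)) - A

L = knnr_laplacian(50, 6)
# Under x' = -L x, disagreement decays at a rate set by the smallest
# non-zero real part of the spectrum: larger means faster consensus.
rates = sorted(ev.real for ev in np.linalg.eigvals(L))
print(rates[1])
```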
Many applications of mobile computing require the computation of the dot product of two vectors. For example, the dot product of an individual's genome data and the gene biomarkers of a health center can help detect diseases in m-Health, and that of the interests of two persons can facilitate friend discovery in mobile social networks. Nevertheless, exposing the inputs of dot-product computation discloses sensitive information about the two participants, leading to severe privacy violations. In this paper, we tackle the problem of privacy-preserving dot-product computation targeting mobile computing applications in which secure channels are hardly established and computational efficiency is highly desirable. We first propose two basic schemes and then present the corresponding advanced versions to improve efficiency and enhance privacy-protection strength. Furthermore, we theoretically prove that our proposed schemes can simultaneously achieve privacy preservation, non-repudiation, and accountability. Our numerical results verify the performance of the proposed schemes in terms of communication and computational overheads.
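As a hedged illustration of the underlying problem (a textbook Beaver-triple construction, not the schemes proposed in this paper): two parties can compute a dot product without revealing their vectors, given correlated randomness from a dealer. The modulus and dealer phase are assumptions of the sketch.

```python
import secrets

P = 2**61 - 1  # prime modulus for additive secret sharing

def share(v):
    """Split v into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (v - r) % P

x = [3, 1, 4, 1, 5]   # Alice's private vector
y = [9, 2, 6, 5, 3]   # Bob's private vector
n = len(x)

# Dealer phase: random a, b and c = a . b, all handed out as shares.
a = [secrets.randbelow(P) for _ in range(n)]
b = [secrets.randbelow(P) for _ in range(n)]
a0, a1 = zip(*map(share, a))
b0, b1 = zip(*map(share, b))
c0, c1 = share(sum(ai * bi for ai, bi in zip(a, b)) % P)

# Online phase: inputs are shared, then the masked differences
# d_i = x_i - a_i and e_i = y_i - b_i are opened (they leak nothing).
x0, x1 = zip(*map(share, x))
y0, y1 = zip(*map(share, y))
d = [(x0[i] + x1[i] - a0[i] - a1[i]) % P for i in range(n)]
e = [(y0[i] + y1[i] - b0[i] - b1[i]) % P for i in range(n)]

# Each party computes its share of x . y = c + sum(d_i*b_i + e_i*a_i + d_i*e_i).
z0 = (c0 + sum(d[i] * b0[i] + e[i] * a0[i] + d[i] * e[i] for i in range(n))) % P
z1 = (c1 + sum(d[i] * b1[i] + e[i] * a1[i] for i in range(n))) % P

assert (z0 + z1) % P == sum(xi * yi for xi, yi in zip(x, y)) % P
print((z0 + z1) % P)  # 73, the true dot product
```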
We present an automatic method to build a layered vector graphics structure ready for animation from a clean-line vector drawing of an organic, smooth shape. Inspired by 3D segmentation methods, we introduce a new metric, computed on the medial axis of a region, to identify and quantify the visual salience of a sub-region relative to the rest. This enables us to recursively separate each region into two closed sub-regions at the location of the most salient junction. The resulting structure, layered in depth, can be used to pose and animate the drawing using a regular 2D skeleton.
Recently, chaotic public-key cryptography has attracted much attention from researchers due to the attractive properties of chaotic maps. Leveraging the security and computational-efficiency advantages of chaotic maps over other cryptosystems, this paper proposes a novel identity-based signcryption scheme using extended chaotic maps. The hardness of the chaos-based discrete logarithm (CDL) problem forms the foundation of the security of the proposed ECM-IBSC scheme.
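The CDL assumption rests on the semigroup property of Chebyshev polynomials, T_r(T_s(x)) = T_{rs}(x): given x and T_n(x), recovering the integer n is believed hard. A minimal numeric check using the cos/arccos form on [-1, 1] (extended chaotic maps enlarge the domain to all reals):

```python
import numpy as np

def T(n, x):
    """Chebyshev polynomial T_n(x) via the trigonometric form, |x| <= 1."""
    return np.cos(n * np.arccos(x))

x, r, s = 0.37, 11, 7
# Semigroup property: T_r(T_s(x)) == T_{r*s}(x), so two parties can reach
# a shared value Diffie-Hellman-style; recovering n from (x, T_n(x)) is
# the chaos-based discrete logarithm (CDL) problem.
print(np.isclose(T(r, T(s, x)), T(r * s, x)))  # True
```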
Particle Swarm Optimization (PSO) has been shown to perform very well on a wide range of optimization problems. One of the drawbacks to PSO is that the base algorithm assumes continuous variables. In this paper, we present a version of PSO that is able to optimize over discrete variables. This new PSO algorithm, which we call Integer and Categorical PSO (ICPSO), incorporates ideas from Estimation of Distribution Algorithms (EDAs) in that particles represent probability distributions rather than solution values, and the PSO update modifies the probability distributions. In this paper, we describe our new algorithm and compare its performance against other discrete PSO algorithms. In our experiments, we demonstrate that our algorithm outperforms comparable methods on both discrete benchmark functions and NK landscapes, a mathematical framework that generates tunable fitness landscapes for evaluating evolutionary algorithms (EAs).
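A simplified sketch of the distribution-as-particle idea (the toy objective, constants, and update rule below are illustrative; the published ICPSO update also carries velocity terms omitted here): each particle stores one categorical distribution per variable, samples candidate solutions from it, and shifts probability mass toward the categories of its personal and global best samples.

```python
import random

random.seed(1)
VALUES = ["a", "b", "c"]
N_VARS, N_PARTICLES, EPS, ITERS = 8, 10, 0.75, 60

def fitness(sol):                      # toy objective: count of "a"s
    return sum(v == "a" for v in sol)

def sample(dist):
    """Draw one concrete solution from a particle's distributions."""
    return [random.choices(VALUES, weights=[d[v] for v in VALUES])[0]
            for d in dist]

def nudge(dist, best_sol, rate):
    """Shift probability mass toward best_sol's category per variable."""
    for d, v in zip(dist, best_sol):
        for k in d:
            d[k] *= (1 - rate)
        d[v] += rate

particles = [[{v: 1 / len(VALUES) for v in VALUES} for _ in range(N_VARS)]
             for _ in range(N_PARTICLES)]
pbest = [sample(p) for p in particles]
gbest = max(pbest, key=fitness)

for _ in range(ITERS):
    for i, p in enumerate(particles):
        nudge(p, pbest[i], EPS * random.random() * 0.1)
        nudge(p, gbest, EPS * random.random() * 0.1)
        cand = sample(p)
        if fitness(cand) > fitness(pbest[i]):
            pbest[i] = cand
            if fitness(cand) > fitness(gbest):
                gbest = cand

print(gbest, fitness(gbest))
```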
Dealing with increasing amounts of data creates the need to deal with redundant, inconsistent and/or complementary repositories which may differ in their data models and/or in their schemas. Current data cleaning techniques developed to tackle data quality problems are only suitable for scenarios where all repositories share the same model and schema. Recently, an ontology-based methodology was proposed to overcome this limitation. In this paper, this methodology is briefly described and applied to a real scenario in the health domain with data quality problems.
One way of evaluating the reusability of a test collection is to determine whether removing the unique contributions of some system would alter the preference order between that system and others. Rank correlation measures such as Kendall's tau are often used for this purpose. Rank correlation measures are appropriate for ordinal measures in which only preference order is important, but many evaluation measures produce system scores in which both the preference order and the magnitude of the score difference are important. Such measures are referred to as interval. Pearson's rho offers one way in which correlation can be computed over results from an interval measure such that smaller errors in the gap size are preferred. When seeking to improve over existing systems, we care most about comparisons among the best systems. For that purpose we prefer head-weighted measures such as tau_AP, which is designed for ordinal data. No existing head-weighted measure fully leverages the information present in interval effectiveness measures. This paper introduces such a measure, referred to as Pearson Rank.
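For context, a sketch of the ordinal, head-weighted tau_AP (Yilmaz et al.) that Pearson Rank generalizes to interval data; the paper's own Pearson Rank formula is not reproduced here. Inputs are score vectors in which higher scores rank earlier:

```python
import numpy as np

def tau_ap(truth, estimate):
    """AP rank correlation: for each position i >= 2 of the evaluated
    ranking, take the fraction of higher-ranked items that the ground
    truth also ranks higher, average, and rescale to [-1, 1]. Errors
    near the head of the ranking are weighted more heavily."""
    truth = np.asarray(truth)
    order = np.argsort(-np.asarray(estimate))  # evaluated ranking, best first
    n = len(truth)
    total = 0.0
    for i in range(1, n):
        correct = np.sum(truth[order[:i]] > truth[order[i]])
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0

print(tau_ap([5, 4, 3, 2, 1], [5, 4, 3, 2, 1]))   # 1.0: perfect agreement
print(tau_ap([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]))   # -1.0: reversed order
```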
Contactless communications have become omnipresent in our daily lives, from simple access cards to electronic passports. Such systems are particularly vulnerable to relay attacks, in which an adversary relays the messages from a prover to a verifier. Distance-bounding protocols were introduced to counter such attacks. Lately, there has been a very active research trend on improving the security of these protocols, but also on ensuring strong privacy properties with respect to active adversaries and malicious verifiers. In particular, a difficult threat to address is the terrorist fraud, in which a far-away prover cooperates with a nearby accomplice to fool a verifier. The usual defence against this attack is to make it impossible for the accomplice to succeed unless the prover provides him with enough information to recover his secret key and impersonate him later on. However, the mere existence of a long-term secret key is problematic with respect to privacy. In this paper, we propose a novel approach in which the prover leaks not his secret key but a reusable session key along with a group signature on it. This allows the adversary to impersonate him even without knowing his signature key. Based on this approach, we give the first distance-bounding protocol, called SPADE, integrating anonymity, revocability and provable resistance to standard threat models.
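To make the timing intuition concrete, a toy rapid-bit-exchange round for generic distance bounding (not SPADE itself; the bound and register setup are assumptions): the verifier times a one-bit challenge/response, and a relaying adversary necessarily inflates the round-trip time.

```python
import secrets, time

C = 3e8  # speed of light in m/s

def rapid_bit_round(respond, max_distance_m=10.0):
    """One timed round: random 1-bit challenge, timed response, accept
    only if the round trip fits the distance bound. (Software timing is
    far too coarse for light-speed bounds; real verifiers use dedicated
    hardware. This is purely illustrative.)"""
    challenge = secrets.randbits(1)
    t0 = time.perf_counter()
    response = respond(challenge)         # possibly relayed by an adversary
    rtt = time.perf_counter() - t0
    return response, rtt <= 2 * max_distance_m / C

# An honest prover answers from precommitted registers r0, r1.
r0, r1 = secrets.randbits(1), secrets.randbits(1)
resp, within_bound = rapid_bit_round(lambda c: r1 if c else r0)
print(resp, within_bound)
```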
The ever increasing interest in semantic technologies and the availability of several open knowledge sources have fueled recent progress in the field of recommender systems. In this paper we feed recommender systems with features coming from the Linked Open Data (LOD) cloud - a huge amount of machine-readable knowledge encoded as RDF statements - with the aim of improving recommender systems effectiveness. In order to exploit the natural graph-based structure of RDF data, we study the impact of the knowledge coming from the LOD cloud on the overall performance of a graph-based recommendation algorithm. In more detail, we investigate whether the integration of LOD-based features improves the effectiveness of the algorithm and to what extent the choice of different feature selection techniques influences its performance in terms of accuracy and diversity. The experimental evaluation on two state-of-the-art datasets shows a clear correlation between the feature selection technique and the ability of the algorithm to maximize a specific evaluation metric. Moreover, the graph-based algorithm leveraging LOD-based features is able to outperform several state-of-the-art baselines, such as collaborative filtering and matrix factorization, thus confirming the effectiveness of the proposed approach.
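A minimal sketch of why LOD features help a graph-based recommender (graph, node names, and algorithm choice are invented for illustration; this is not the paper's exact method): personalized PageRank can reach items connected to a user's history only through shared RDF features.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("u1", "MovieA"), ("u1", "MovieB"), ("u2", "MovieB")])
G.add_edges_from([("MovieA", "dbo:SciFi"), ("MovieC", "dbo:SciFi")])  # LOD feature links

# Personalized PageRank seeded at the target user ranks unseen items.
scores = nx.pagerank(G, personalization={"u1": 1.0})
recs = sorted((n for n in G if n.startswith("Movie") and not G.has_edge("u1", n)),
              key=scores.get, reverse=True)
print(recs)  # ["MovieC"]: reachable from u1 only through the shared feature
```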
The validation of simulation models (e.g., of electronic control units for vehicles) in industry is becoming increasingly challenging due to their growing complexity. To systematically assess the quality of such models, software metrics seem to be promising. In this paper we explore the use of software metrics and outlier analysis as a means to assess the quality of model-based software. More specifically, we investigate how results from regression analysis applied to measurement data received from size and complexity metrics can be mapped to software quality. Using the moving-averages approach, models were fit to data from over 65,000 software revisions of 71 simulation models that represent different electronic control units of real premium vehicles. Subsequent investigations using studentized deleted residuals and Cook's distance revealed outliers among the measurements. From these outliers we identified a subset that provides meaningful information (anomalies) by comparing outlier scores with expert opinions. Eight engineers were interviewed separately about the impact of the outliers on software quality. Findings were validated in follow-up workshops. The results show correlations between outliers and their impact on four of the considered quality characteristics. They also demonstrate the applicability of this approach in industry.
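A hedged sketch of the outlier-flagging step on synthetic data (the metric, thresholds, and injected anomaly are illustrative assumptions): fit a trend to a size metric across revisions, then screen observations with studentized deleted residuals and Cook's distance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
revisions = np.arange(200.0)                           # hypothetical revision index
metric = 50 + 0.3 * revisions + rng.normal(0, 2, 200)  # hypothetical size metric
metric[120] += 25                                      # injected anomaly

fit = sm.OLS(metric, sm.add_constant(revisions)).fit()
infl = fit.get_influence()

sdr = infl.resid_studentized_external   # studentized deleted residuals
cooks_d, _ = infl.cooks_distance        # Cook's distance per observation

# Conventional screening thresholds; flagged points go to expert review.
flagged = np.where((np.abs(sdr) > 3) | (cooks_d > 4 / len(metric)))[0]
print(flagged)   # includes revision 120
```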
As the Internet becomes an important part of the infrastructure our society depends on, it is crucial to construct networks that are able to work even when part of the network is compromised. This paper presents the first practical intrusion-tolerant network service, targeting high-value applications such as monitoring and control of global clouds and management of critical infrastructure for the power grid. We use an overlay approach to leverage the existing IP infrastructure while providing the required resiliency and timeliness. Our solution overcomes malicious attacks and compromises in both the underlying network infrastructure and in the overlay itself. We deploy and evaluate the intrusion-tolerant overlay implementation on a global cloud spanning East Asia, North America, and Europe, and make it publicly available.
Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen "queries" about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question. First, we give upper bounds on the number of samples n needed to answer statistical queries; these bounds improve and simplify the work of Dwork et al. (STOC, 2015) and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). Second, we prove the first upper bounds on the number of samples required to answer more general families of queries, including arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.
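The flavour of the upper-bound mechanism can be sketched with a textbook Laplace mechanism for statistical queries (parameters illustrative): each query is answered with noise calibrated to its sensitivity, and the stability conferred by differential privacy keeps answers accurate for P even when queries are chosen adaptively.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_statistical_query(sample, q, eps):
    """Answer a statistical query q: X -> [0, 1] with the Laplace
    mechanism. The empirical mean over n samples has sensitivity 1/n,
    so Laplace noise of scale 1/(eps*n) is eps-differentially private."""
    n = len(sample)
    return float(np.mean(q(sample))) + rng.laplace(scale=1.0 / (eps * n))

sample = rng.normal(size=1000)
print(dp_statistical_query(sample, lambda v: v > 0, eps=0.1))  # close to 0.5
```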
Parallel and distributed systems rely on intricate protocols to manage shared resources and synchronize, i.e., to manage how many processes are in a particular state. Effective verification of such systems requires universal quantification to reason about parameterized state and cardinality reasoning to track sets of processes, messages, and failures, in order to adequately capture protocol logic. In this paper we present Tool, an automatic invariant synthesis method that integrates cardinality-based reasoning and universal quantification. The resulting increase in expressiveness allows Tool to verify, for the first time, a representative collection of intricate parameterized protocols.
It is common practice for data scientists to acquire and integrate disparate data sources to achieve higher quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete and (2) what is the impact of any unknown (i.e., unobserved) data on query results? In this work, we develop and analyze techniques to estimate the impact of the unknown data (a.k.a., unknown unknowns) on simple aggregate queries. The key idea is that the overlap between different data sources enables us to estimate the number and values of the missing data items. Our main techniques are parameter-free and do not assume prior knowledge about the distribution. Through a series of experiments, we show that estimating the impact of unknown unknowns is invaluable to better assess the results of aggregate queries over integrated data sources.
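One concrete way to exploit source overlap is species estimation; the Chao1-style sketch below is a textbook estimator in that spirit, not necessarily the paper's exact technique. Items seen in exactly one or two sources calibrate how many items no source observed.

```python
from collections import Counter

def estimate_unseen(sources):
    """Chao1-style estimate of items missing from the union of sources:
    unseen ~ f1^2 / (2*f2), where f1/f2 count the items observed in
    exactly one/two sources."""
    seen = Counter()
    for src in sources:
        for item in set(src):
            seen[item] += 1
    f1 = sum(1 for c in seen.values() if c == 1)
    f2 = sum(1 for c in seen.values() if c == 2)
    return f1 * f1 / (2 * f2) if f2 else f1 * (f1 - 1) / 2  # bias-corrected fallback

sources = [{"acme", "globex", "initech"}, {"globex", "umbrella"}, {"initech", "globex"}]
print(estimate_unseen(sources))  # 2.0: rough count of items no source lists
```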
Systematic implementation of System-on-Chip (SoC) security policies typically involves smart wrappers extracting local security-critical events of interest from Intellectual Property (IP) blocks, together with a control engine that communicates with the wrappers to analyze the events for policy adherence. However, developing customized wrappers at each IP for security requirements may incur significant overhead in area and hardware resources. In this paper, we address this problem by exploiting the extensive design-for-debug (DfD) instrumentation already available on-chip. In addition to reducing the overall hardware overhead, the approach also adds flexibility to the security architecture itself, e.g., permitting use of on-field DfD instrumentation, survivability, and control hooks to patch security policy implementation in response to bugs and attacks found post-silicon or to security requirements that change in the field. We show how to design a scalable interface between the security and debug architectures that provides the benefits of flexibility to security policy implementation without interfering with existing debug and survivability use cases, at minimal additional cost in energy and design complexity.
SDSM is a novel approach for providing secure directory-based distributed shared memory to CPUs that are connected via an untrusted medium. SDSM scales efficiently to thousands of CPUs with less than 0.8% performance reduction, compared to a system with no security.
It is expected that DRAM memory will be augmented, and perhaps eventually replaced, by one of several up-and-coming memory technologies. These are all non-volatile, in that they retain their contents without power. This allows primary memory to be used as a fast disk replacement. It also enables more aggressive programming models that directly leverage persistence of primary memory. However, it is challenging to maintain consistency of memory in such an environment. There is no consensus on the right programming model for doing so, and subtle differences can have large, and sometimes surprising, effects on the implementation and its performance. The existing literature describes multiple programming systems that provide point solutions for selective persistence of user data structures. Real progress in this area requires a choice of programming model, which we cannot reasonably make without a real understanding of the design space; point solutions are insufficient. We systematically explore what we consider to be the most promising part of the space, precisely defining semantics and identifying implementation costs. This allows us to be much more explicit and precise about semantic and implementation trade-offs that were usually glossed over in prior work. It also exposes some promising new design alternatives.
We study a sensor network setting in which samples are encrypted individually using different keys and maintained on a cloud storage. For large systems, e.g. those that generate several millions of samples per day, fine-grained sharing of encrypted samples is challenging. Existing solutions, such as Attribute-Based Encryption (ABE) and Key Aggregation Cryptosystem (KAC), can be utilized to address the challenge, but only to a certain extent. They are often computationally expensive and thus unlikely to operate at scale. We propose an algorithmic enhancement and two heuristics to improve KAC's key reconstruction cost, while preserving its provable security. The improvement is particularly significant for range and down-sampling queries, accelerating the reconstruction cost from quadratic to linear running time. Our experimental study shows that for queries of size 32k samples, the proposed fast reconstruction techniques speed up the original KAC by at least 90 times on range and down-sampling queries, and by eight times on general (arbitrary) queries. It also shows that at the expense of splitting the query into 16 sub-queries and correspondingly issuing that number of different aggregated keys, reconstruction time can be reduced by 19 times. As such, the proposed techniques make KAC more applicable in practical scenarios such as sensor networks or the Internet of Things.
We present the first complexity-theoretic secure steganographic protocol which, for any communication channel, is provably secure, reliable, and has nearly optimal bandwidth. Our system is unconditionally secure, i.e., our proof does not rely on any unproven complexity-theoretic assumption, such as the existence of one-way functions. This disproves the claim that the existence of one-way functions and access to a communication channel oracle are both necessary and sufficient conditions for the existence of secure steganography, in the sense that secure and reliable steganography exists independently of the existence of one-way functions.
Semantic Big Data is about the creation of new applications exploiting the richness and flexibility of declarative semantics combined with scalable and highly distributed data management systems. In this work, we present an application scenario in which a domain ontology, Open Refine, and the Okkam Entity Name System enable a frictionless and scalable data integration process leading to a knowledge base for tax assessment. Further, we introduce the concept of Entiton as a flexible and efficient data model suitable for large-scale data inference and analytic tasks. We successfully tested our data processing pipeline on a real-world dataset, supporting ACI Informatica in the investigation of Vehicle Excise Duty (VED) evasion in the Aosta Valley region (Italy). Besides useful business intelligence indicators, we implemented a distributed temporal inference engine to unveil VED evasion and circulation-ban violations. The results of the integration are presented to the tax agents in a powerful Siren Solution KiBi dashboard, enabling seamless data exploration and business intelligence.
There is a tendency to move production environments from corporate-owned data centers to cloud-based services. Users who do not maintain a private production environment might not wish to maintain a private performance test environment either. The application of performance engineering methods to the development and delivery of software systems is complicated when the form and/or parameters of the target deployment environment cannot be controlled or determined. The difficulty of diagnosing the causes of performance issues during testing or production may be increased by the presence of highly variable workloads on the target platform that compete with the application of interest for resources in ways that might be hard to determine. In particular, performance tests might be conducted in virtualized environments that introduce factors influencing customer-affecting metrics (such as transaction response time) and observed resource usage. Observed resource usage metrics in virtualized environments can have different meanings from those in a native environment. Virtual machines may suffer delays in execution. We explore factors that exacerbate these complications. We argue that these complexities reinforce the case for rigorously using software performance engineering methods rather than diminishing it. We also explore possible performance testing methods for mitigating the risk associated with these complexities.
The necessity of deploying a wireless mesh network (WMN) is determined by real-world application requirements. WMNs do not fit some applications well due to latency issues and capacity-related problems on paths with more than two hops. With promising IEEE 802.11ac-based devices, better fairness for multi-hop communications is expected to support broadband applications; the rate usually varies with link quality and the network environment. Careful network planning can effectively improve the throughput and delay of the overall network. We model the placement of router nodes as an optimization process to improve performance. Our aim is to propose a WMN planning model based on multiobjective constraints such as coverage, reliability, and deployment cost. A bit-rate guarantee is therefore necessary to limit the number of stations connected to each access point; to take the delay and fairness of the network into account, user behaviours are derived. We use a metaheuristic based on a multiobjective evolutionary algorithm to evaluate the performance of our proposed placement algorithm.
SDN’s logically centralized control provides an insertion point for programming the network. While it is generally agreed that higher-level abstractions are needed to make that programming easy, there is little consensus on what are the “right” abstractions. Indeed, as SDN moves beyond its initial specialized deployments to broader use cases, it is likely that network control applications will require diverse abstractions that evolve over time. To this end, we champion a perspective that SDN control fundamentally revolves around data representation. We discard any application-specific structure that might be outgrown by new demands. Instead, we adopt a plain data representation of the entire network — network topology, forwarding, and control applications — and seek a universal data language that allows application programmers to transform the primitive representation into any high-level representations presented to applications or network operators. Driven by this insight, we present a system, Ravel, that implements an entire SDN network control infrastructure within a standard SQL database. In Ravel, network abstractions take the form of user-defined SQL views expressed by SQL queries that can be added on the fly. A key challenge in realizing this approach is to orchestrate multiple simultaneous abstractions that collectively affect the same underlying data. To achieve this, Ravel enhances the database with novel data integration mechanisms that merge the multiple views into a coherent forwarding behavior. Moreover, Ravel is exposed to applications through one simple, familiar, and highly interoperable SQL interface. While this is an ambitious long-term goal, our prototype built on the PostgreSQL database exhibits promising performance even for large scale networks.
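A toy taste of the views-as-abstractions idea (schema, data, and view are invented for illustration; Ravel's actual schema differs): a base table holds the primitive network representation, and an application-facing abstraction is just a SQL view over it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Primitive representation: one row per directional link (invented schema).
CREATE TABLE tp (sid INTEGER, nid INTEGER, active INTEGER);
INSERT INTO tp VALUES (1, 2, 1), (2, 3, 1), (1, 3, 0);

-- A user-defined abstraction: applications see only live links.
CREATE VIEW live_links AS SELECT sid, nid FROM tp WHERE active = 1;
""")
print(db.execute("SELECT * FROM live_links").fetchall())  # [(1, 2), (2, 3)]
```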