Biblio
The ever increasing interest in semantic technologies and the availability of several open knowledge sources have fueled recent progress in the field of recommender systems. In this paper we feed recommender systems with features coming from the Linked Open Data (LOD) cloud - a huge amount of machine-readable knowledge encoded as RDF statements - with the aim of improving recommender systems effectiveness. In order to exploit the natural graph-based structure of RDF data, we study the impact of the knowledge coming from the LOD cloud on the overall performance of a graph-based recommendation algorithm. In more detail, we investigate whether the integration of LOD-based features improves the effectiveness of the algorithm and to what extent the choice of different feature selection techniques influences its performance in terms of accuracy and diversity. The experimental evaluation on two state of the art datasets shows a clear correlation between the feature selection technique and the ability of the algorithm to maximize a specific evaluation metric. Moreover, the graph-based algorithm leveraging LOD-based features is able to overcome several state of the art baselines, such as collaborative filtering and matrix factorization, thus confirming the effectiveness of the proposed approach.
Social animals as found in fish schools, bird flocks, bee hives, and ant colonies are able to solve highly complex problems in nature. This includes foraging for food, constructing astonishingly complex nests, and evading or defending against predators. Remarkably, these animals in many cases use very simple, decentralized communication mechanisms that do not require a single leader. This makes the animals perform surprisingly well, even in dynamically changing environments. The collective intelligence of such animals is known as swarm intelligence and it has inspired popular and very powerful optimization paradigms, including ant colony optimization (ACO) and particle swarm optimization (PSO). The reasons behind their success are often elusive. We are just beginning to understand when and why swarm intelligence algorithms perform well, and how to use swarm intelligence most effectively. Understanding the fundamental working principles that determine their efficiency is a major challenge. This tutorial will give a comprehensive overview of recent theoretical results on swarm intelligence algorithms, with an emphasis on their efficiency (runtime/computational complexity). In particular, the tutorial will show how techniques for the analysis of evolutionary algorithms can be used to analyze swarm intelligence algorithms and how the performance of swarm intelligence algorithms compares to that of evolutionary algorithms. The results shed light on the working principles of swarm intelligence algorithms, identify the impact of parameters and other design choices on performance, and thus help to use swarm intelligence more effectively. The tutorial will be divided into a first, larger part on ACO and a second, smaller part on PSO. For ACO we will consider simple variants of the MAX-MIN ant system. Investigations of example functions in pseudo-Boolean optimization demonstrate that the choices of the pheromone update strategy and the evaporation rate have a drastic impact on the running time. We further consider the performance of ACO on illustrative problems from combinatorial optimization: constructing minimum spanning trees, solving shortest path problems with and without noise, and finding short tours for the TSP. For particle swarm optimization, the tutorial will cover results on PSO for pseudo-Boolean optimization as well as a discussion of theoretical results in continuous spaces.
The validation of simulation models (e.g., of electronic control units for vehicles) in industry is becoming increasingly challenging due to their growing complexity. To systematically assess the quality of such models, software metrics seem to be promising. In this paper we explore the use of software metrics and outlier analysis as a means to assess the quality of model-based software. More specifically, we investigate how results from regression analysis applied to measurement data received from size and complexity metrics can be mapped to software quality. Using the moving averages approach, models were fit to data received from over 65,000 software revisions for 71 simulation models that represent different electronic control units of real premium vehicles. Consecutive investigations using studentized deleted residuals and Cook’s Distance revealed outliers among the measurements. From these outliers we identified a subset, which provides meaningful information (anomalies) by comparing outlier scores with expert opinions. Eight engineers were interviewed separately for outlier impact on software quality. Findings were validated in consecutive workshops. The results show correlations between outliers and their impact on four of the considered quality characteristics. They also demonstrate the applicability of this approach in industry.
As the Internet becomes an important part of the infrastructure our society depends on, it is crucial to construct networks that are able to work even when part of the network is compromised. This paper presents the first practical intrusion-tolerant network service, targeting high-value applications such as monitoring and control of global clouds and management of critical infrastructure for the power grid. We use an overlay approach to leverage the existing IP infrastructure while providing the required resiliency and timeliness. Our solution overcomes malicious attacks and compromises in both the underlying network infrastructure and in the overlay itself. We deploy and evaluate the intrusion-tolerant overlay implementation on a global cloud spanning East Asia, North America, and Europe, and make it publicly available.
Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen ``queries'' about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: We give upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.
Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.
Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running in a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, (b) automatic verification of the application's machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime's internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
Information Systems curricula require on-going and frequent review [2] [11]. Furthermore, such curricula must be flexible because of the fast-paced, dynamic nature of the workplace. Such flexibility can be maintained through modernizing course content or, inclusively, exchanging hardware or software for newer versions. Alternatively, flexibility can arise from incorporating new information into curricula from other disciplines. One field where the pace of change is extremely high is cybersecurity [3]. Students are left with outdated skills when curricula lag behind the pace of change in industry. For example, cryptography is a required learning objective in the DHS/NSA Center of Academic Excellence (CAE) knowledge criteria [1]. However, the overarching curriculum associated with basic ciphers has gone unchanged for decades. Indeed, a general problem in cybersecurity education is that students lack fundamental knowledge in areas such as ciphers [5]. In response, researchers have developed a variety of interactive classroom visualization tools [5] [8] [9]. Such tools visualize the standard approach to frequency analysis of simple substitution ciphers that includes review of most common, single letters in ciphertext. While fundamental ciphers such as the monoalphabetic substitution cipher have not been updated (these are historical ciphers), collective understanding of how humans interact with language has changed. Updated understanding in both English language pedagogy [10] [12] and automated cryptanalysis of substitution ciphers [4] potentially renders the interactive classroom visualization tools incomplete or outdated. Classroom visualization tools are powerful teaching aids, particularly for abstract concepts. Existing research has established that such tools promote an active learning environment that translates to not only effective learning conditions but also higher student retention rates [7]. However, visualization tools require extensive planning and design when used to actively engage students with detailed, specific knowledge units such as ciphers [7] [8]. Accordingly, we propose a heatmap-based frequency analysis visualization solution that (a) incorporates digraph and trigraph language processing norms; (b) and enhances the active learning pedagogy inherent in visualization tools. Preliminary results indicate that study participants take approximately 15% longer to learn the heatmap-based frequency analysis technique compared to traditional frequency analysis but demonstrate a 50% increase in efficacy when tasked with solving simple substitution ciphers. Further, a heatmap-based solution contributes positively to the field insofar as educators have an additional tool to use in the classroom. As well, the heatmap visualization tool may allow researchers to comparatively examine efficacy of visualization tools in the cryptanalysis of mono-alphabetic substitution ciphers.
SDSM is a novel approach for providing secure directory-based distributed shared memory to CPUs that are connected via an untrusted medium. SDSM scales efficiently to thousands of CPUs with less than 0.8% performance reduction, compared to a system with no security.
Garbage is an endemic problem in developing cities due to the continual influx of migrants from rural areas coupled with deficient municipal capacity planning. In cities like Dhaka, open waste dumps contribute to the prevalence of disease, environmental contamination, catastrophic flooding, and deadly fires. Recent interest in the garbage problem has prompted cursory proposals to introduce technology solutions for mapping and fundraising. Yet, the role of technology and its potential benefits are unexplored in this large-scale problem. In this paper, we contribute to the understanding of the waste ecology in Dhaka and how the various actors acquire, perform, negotiate, and coordinate their roles. Within this context, we explore design opportunities for using computing technologies to support collaboration between waste pickers and residents of these communities. We find opportunities in the presence of technology and the absence of mechanisms to facilitate coordination of community funding and crowd work.
Various mobile applications require different QoS requirements, thus there is a need to resolve the application requirement into the underlying mesh network to support them. Existing approach to coordinate the application traffic requirement to underlying network has been applied in wired domains. However, it is complex in the wireless domain due to the mobility and diversity of mobile applications. Much interest is focused on resolving application QoS and match request to mesh network link availability. We propose a testbed architecture which allows dynamic configuration of mesh networks and coordination of each flow to support application-aware QoS. Our prototype testbed shows adaptive change in mesh network routing configuration depending on application requests.
We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data. Falcon does not rely on the existence of a set of pre-defined data quality rules. On the contrary, it encourages users to explore the data, identify possible problems, and make updates to fix them. Bootstrapped by one user update, Falcon guesses a set of possible sql update queries that can be used to repair the data. The main technical challenge addressed in this paper consists in finding a set of sql update queries that is minimal in size and at the same time fixes the largest number of errors in the data. We formalize this problem as a search in a lattice-shaped space. To guarantee that the chosen updates are semantically correct, Falcon navigates the lattice by interacting with users to gradually validate the set of sql update queries. Besides using traditional one-hop based traverse algorithms (e.g., BFS or DFS), we describe novel multi-hop search algorithms such that Falcon can dive over the lattice and conduct the search efficiently. Our novel search strategy is coupled with a number of optimization techniques to further prune the search space and efficiently maintain the lattice. We have conducted extensive experiments using both real-world and synthetic datasets to show that Falcon can effectively communicate with users in data repairing.
The cloud has become an established and widespread paradigm. This success is due to the gain of flexibility and savings provided by this technology. However, the main obstacle to full cloud adoption is security. The cloud, as many other systems taking advantage of the Internet, is also facing threats that compromise data confidentiality and availability. In addition, new cloud-specific attacks have emerged and current intrusion detection and prevention mechanisms are not enough to protect the complex infrastructure of the cloud from these vulnerabilities. Furthermore, one of the promises of the cloud is the Quality of Service (QoS) by continuous delivery, which must be ensured even in case of intrusion. This work presents an overview of the main cloud vulnerabilities, along with the solutions proposed in the context of the H2020 CLARUS project in terms of monitoring techniques for intrusion detection and prevention, including attack-tolerance mechanisms.
Convolution serves as the basic computational primitive for various associative computing tasks ranging from edge detection to image matching. CMOS implementation of such computations entails significant bottlenecks in area and energy consumption due to the large number of multiplication and addition operations involved. In this paper, we propose an ultra-low power and compact hybrid spintronic-CMOS design for the convolution computing unit. Low-voltage operation of domain-wall motion based magneto-metallic "Spin-Memristor"s interfaced with CMOS circuits is able to perform the convolution operation with reasonable accuracy. Simulation results of Gabor filtering for edge detection reveal \textasciitilde 2.5× lower energy consumption compared to a baseline 45nm-CMOS implementation.
Ubiquitous WiFi infrastructure and smart phones offer a great opportunity to study physical activities. In this paper, we present MobiCamp, a large-scale testbed for studying mobility-related activities of residents on a campus. MobiCamp consists of \textasciitilde2,700 APs, \textasciitilde95,000 smart phones, and an App with \textasciitilde2,300 opt-in volunteer users. More specifically, we capture how mobile users interact with different types of buildings, with other users, and with classroom courses, etc. To achieve this goal, we first obtain a relatively complete coverage of the users' mobility traces by utilizing four types of information from SNMP and by relaxing the location granularity to roughly at the room level. Then the popular App provides user attributes (grade, gender, etc.) and fine-grained behavior information (phone usages, course timetables, etc.) of the sampled population. These detailed mobile data is then correlated with the mobility traces from the SNMP to estimate the entire campus population's physical activities. We use two applications to show the power of MobiCamp.
Multimedia security and copyright protection has been a popular topic for research and application, due to the explosion of data exchange over the internet and the widespread use of digital media. Watermarking is a process of hiding the digital information inside a digital media. Information hiding as digital watermarks in multimedia enables protection mechanism in decrypted contents. This paper presents a comparative study of existing technique used for digital watermarking an image using Genetic Algorithm and Bacterial Foraging Algorithm (BFO) based optimization technique with proposed one which consists of Genetic Algorithm and Honey Bee based optimization technique. The results obtained after experiment conclude that, new method has indeed outperformed then the conventional technique. The implementation is done over the MATLAB.
Security is playing a very important and crucial role in the field of network communication system and Internet. Kerberos Authentication Protocol is designed and developed by Massachusetts Institute of Technology (MIT) and it provides authentication by encrypting information and allow clients to access servers in a secure manner. This paper describes the design and implementation of Kerberos using Data Encryption Standard (DES). Data encryption standard (DES) is a private key cryptography system that provides the security in the communication system. Java Development Tool Kit as the front end and ms access as the back end are used for implementation.
Heterogeneous Chip Multiprocessors have been shown to provide significant performance and energy efficiency gains over homogeneous designs. Recent research has expanded the dimensions of heterogeneity to include diverse Instruction Set Architectures, called Heterogeneous-ISA Chip Multiprocessors. This work leverages such an architecture to realize substantial new security benefits, and in particular, to thwart Return-Oriented Programming. This paper proposes a novel security defense called HIPStR – Heterogeneous-ISA Program State Relocation – that performs dynamic randomization of run-time program state, both within and across ISAs. This technique outperforms the state-of-the-art just-in-time code reuse (JIT-ROP) defense by an average of 15.6%, while simultaneously providing greater security guarantees against classic return-into-libc, ROP, JOP, brute force, JIT-ROP, and several evasive variants.
Over transactional database systems MultiVersion concurrency control is maintained for secure, fast and efficient access to the shared data file implementation scenario. An effective coordination is supposed to be set up between owners and users also the developers & system operators, to maintain inter-cloud & intra-cloud communication Most of the services & application offered in cloud world are real-time, which entails optimized compatibility service environment between master and slave clusters. In the paper, offered methodology supports replication and triggering methods intended for data consistency and dynamicity. Where intercommunication between different clusters is processed through middleware besides slave intra-communication is handled by verification & identification protection. The proposed approach incorporates resistive flow to handle high impact systems that identifies and verifies multiple processes. Results show that the new scheme reduces the overheads from different master and slave servers as they are co-located in clusters which allow increased horizontal and vertical scalability of resources.