Biblio
Computer systems are developed taking into account that they should be easily maintained in the future. It is one of the main requirements for the sound architectural design. The existing approaches to introducing fault tolerance rely on recursive system structuring out of functional components – this typically results in non-optimal fault tolerance. The paper proposes a vision of structuring complex many-core systems by introducing a special component supporting system-wide fault tolerance coordination. The component acts as a central module making decisions about fault tolerance strategies to be implemented by individual system components depending on the performance and energy requirements specified as system operating modes.
One essential requirement for supporting analytics for Big Medical Data systems is the provision of a suitable level of traceability to data or processes ('Items') in large volumes of data. Systems should be designed from the outset to support usage of such Items across the spectrum of medical use and over time in order to promote traceability, to simplify maintenance and to assist analytics. The philosophy proposed in this paper is to design medical data systems using a 'description-driven' approach in which meta-data and the description of medical items are saved alongside the data, simplifying item re-use over time and thereby enabling the traceability of these items over time and their use in analytics. Details are given of a big data system in neuroimaging to demonstrate aspects of provenance data capture, collaborative analysis and longitudinal information traceability. Evidence is presented that the description-driven approach leads to simplicity of design and ease of maintenance following the adoption of a unified approach to Item management.
Provenance for transactional updates is critical for many applications such as auditing and debugging of transactions. Recently, we have introduced MV-semirings, an extension of the semiring provenance model that supports updates and transactions. Furthermore, we have proposed reenactment, a declarative form of replay with provenance capture, as an efficient and non-invasive method for computing this type of provenance. However, this approach is limited to the snapshot isolation (SI) concurrency control protocol while many real world applications apply the read committed version of snapshot isolation (RC-SI) to improve performance at the cost of consistency. We present non trivial extensions of the model and reenactment approach to be able to compute provenance of RC-SI transactions efficiently. In addition, we develop techniques for applying reenactment across multiple RC-SI transactions. Our experiments demonstrate that our implementation in the GProM system supports efficient re-construction and querying of provenance.
We present ReproZip, the recommended packaging tool for the SIGMOD Reproducibility Review. ReproZip was designed to simplify the process of making an existing computational experiment reproducible across platforms, even when the experiment was put together without reproducibility in mind. The tool creates a self-contained package for an experiment by automatically tracking and identifying all its required dependencies. The researcher can share the package with others, who can then use ReproZip to unpack the experiment, reproduce the findings on their favorite operating system, as well as modify the original experiment for reuse in new research, all with little effort. The demo will consist of examples of non-trivial experiments, showing how these can be packed in a Linux machine and reproduced on different machines and operating systems. Demo visitors will also be able to pack and reproduce their own experiments.
Many mobile services consist of two components: a server providing an API, and an application running on smartphones and communicating with the API. An unresolved problem in this design is that it is difficult for the server to authenticate which app is accessing the API. This causes many security problems. For example, the provider of a private network API has to embed secrets in its official app to ensure that only this app can access the API; however, attackers can uncover the secret by reverse-engineering. As another example, malicious apps may send automatic requests to ad servers to commit ad fraud. In this work, we propose a system that allows network API to authenticate the mobile app that sends each request so that the API can make an informed access control decision. Our system, the Mobile Trusted-Origin Policy, consists of two parts: 1) an app provenance mechanism that annotates outgoing HTTP(S) requests with information about which app generated the network traffic, and 2) a code isolation mechanism that separates code within an app that should have different app provenance signatures into mobile origin. As motivation for our work, we present two previously-unknown families of apps that perform click fraud, and examine how the lack of mobile origin information enables the attacks. Based on our observations, we propose Trusted Cross-Origin Requests to handle point (1), which automatically includes mobile origin information in outgoing HTTP requests. Servers may then decide, based on the mobile origin data, whether to process the request or not. We implement a prototype of our system for Android and evaluate its performance, security, and deployability. We find that our system can achieve our security and utility goals with negligible overhead.
Defenders of enterprise networks have a critical need to quickly identify the root causes of malware and data leakage. Increasingly, USB storage devices are the media of choice for data exfiltration, malware propagation, and even cyber-warfare. We observe that a critical aspect of explaining and preventing such attacks is understanding the provenance of data (i.e., the lineage of data from its creation to current state) on USB devices as a means of ensuring their safe usage. Unfortunately, provenance tracking is not offered by even sophisticated modern devices. This work presents ProvUSB, an architecture for fine-grained provenance collection and tracking on smart USB devices. ProvUSB maintains data provenance by recording reads and writes at the block layer and reliably identifying hosts editing those blocks through attestation over the USB channel. Our evaluation finds that ProvUSB imposes a one-time 850 ms overhead during USB enumeration, but approaches nearly-bare-metal runtime performance (90% of throughput) on larger files during normal execution, and less than 0.1% storage overhead for provenance in real-world workloads. ProvUSB thus provides essential new techniques in the defense of computer systems and USB storage devices.
As digital objects become increasingly important in people's lives, people may need to understand the provenance, or lineage and history, of an important digital object, to understand how it was produced. This is particularly important for objects created from large, multi-source collections of personal data. As the metadata describing provenance, Provenance Data, is commonly represented as a labelled directed acyclic graph, the challenge is to create effective interfaces onto such graphs so that people can understand the provenance of key digital objects. This unsolved problem is especially challenging for the case of novice and intermittent users and complex provenance graphs. We tackle this by creating an interface based on a clustering approach. This was designed to enable users to view provenance graphs, and to simplify complex graphs by combining several nodes. Our core contribution is the design of a prototype interface that supports clustering and its analytic evaluation in terms of desirable properties of visualisation interfaces.
Provenance workflows capture movement and transformation of data in complex environments, such as document management in large organizations, content generation and sharing in in social media, scientific computations, etc. Sharing and processing of provenance workflows brings numerous benefits, e.g., improving productivity in an organization, understanding social media interaction patterns, etc. However, directly sharing provenance may also disclose sensitive information such as confidential business practices, or private details about participants in a social network. We propose an algorithm that privately extracts sequential association rules from provenance workflow datasets. Finding such rules has numerous practical applications, such as capacity planning or identifying hot-spots in provenance graphs. Our approach provides good accuracy and strong privacy, by leveraging on the exponential mechanism of differential privacy. We propose an heuristic that identifies promising candidate rules and makes judicious use of the privacy budget. Experimental results show that the our approach is fast and accurate, and clearly outperforms the state-of-the-art. We also identify influential factors in improving accuracy, which helps in choosing promising directions for future improvement.
In this paper, we propose a new approach to diagnosing problems in complex distributed systems. Our approach is based on the insight that many of the trickiest problems are anomalies. For instance, in a network, problems often affect only a small fraction of the traffic (e.g., perhaps a certain subnet), or they only manifest infrequently. Thus, it is quite common for the operator to have “examples” of both working and non-working traffic readily available – perhaps a packet that was misrouted, and a similar packet that was routed correctly. In this case, the cause of the problem is likely to be wherever the two packets were treated differently by the network. We present the design of a debugger that can leverage this information using a novel concept that we call differential provenance. Differential provenance tracks the causal connections between network states and state changes, just like classical provenance, but it can additionally perform root-cause analysis by reasoning about the differences between two provenance trees. We have built a diagnostic tool that is based on differential provenance, and we have used our tool to debug a number of complex, realistic problems in two scenarios: software-defined networks and MapReduce jobs. Our results show that differential provenance can be maintained at relatively low cost, and that it can deliver very precise diagnostic information; in many cases, it can even identify the precise root cause of the problem.
Collecting and processing provenance, i.e., information describing the production process of some end product, is important in various applications, e.g., to assess quality, to ensure reproducibility, or to reinforce trust in the end product. In the past, different types of provenance meta-data have been proposed, each with a different scope. The first part of the proposed tutorial provides an overview and comparison of these different types of provenance. To put provenance to good use, it is essential to be able to interact with and present provenance data in a user-friendly way. Often, users interested in provenance are not necessarily experts in databases or query languages, as they are typically domain experts of the product and production process for which provenance is collected (biologists, journalists, etc.). Furthermore, in some scenarios, it is difficult to use solely queries for analyzing and exploring provenance data. The second part of this tutorial therefore focuses on enabling users to leverage provenance through adapted visualizations. To this end, we will present some fundamental concepts of visualization before we discuss possible visualizations for provenance.
Intrusion detection systems need to be both accurate and fast. Speed is important especially when operating at the network level. Additionally, many intrusion detection systems rely on signature based detection approaches. However, machine learning can also be helpful for intrusion detection. One key challenge when using machine learning, aside from the detection accuracy, is using machine learning algorithms that are fast. In this paper, several processing architectures are considered for use in machine learning based intrusion detection systems. These architectures include standard CPUs, GPUs, and cognitive processors. Results of their processing speeds are compared and discussed.
In recent years, we have seen a number of successful attacks against high-profile targets, some of which have even caused severe physical damage. These examples have shown us that resourceful and determined attackers can penetrate virtually any system, even those that are secured by the "air-gap." Consequently, in order to minimize the impact of stealthy attacks, defenders have to focus not only on strengthening the first lines of defense but also on deploying effective intrusion-detection systems. Intrusion-detection systems can play a key role in protecting sensitive computer systems since they give defenders a chance to detect and mitigate attacks before they could cause substantial losses. However, an over-sensitive intrusion-detection system, which produces a large number of false alarms, imposes prohibitively high operational costs on a defender since alarms need to be manually investigated. Thus, defenders have to strike the right balance between maximizing security and minimizing costs. Optimizing the sensitivity of intrusion detection systems is especially challenging in the case when multiple inter-dependent computer systems have to be defended against a strategic attacker, who can target computer systems in order to maximize losses and minimize the probability of detection. We model this scenario as an attacker-defender security game and study the problem of finding optimal intrusion detection thresholds.
With the developing of Internet, network intrusion has become more and more common. Quickly identifying and preventing network attacks is getting increasingly more important and difficult. Machine learning techniques have already proven to be robust methods in detecting malicious activities and network threats. Ensemble-based and semi-supervised learning methods are some of the areas that receive most attention in machine learning today. However relatively little attention has been given in combining these methods. To overcome such limitations, this paper proposes a novel network anomaly detection method by using a combination of a tri-training approach with Adaboost algorithms. The bootstrap samples of tri-training are replaced by three different Adaboost algorithms to create the diversity. We run 30 iteration for every simulation to obtain the average results. Simulations indicate that our proposed semi-supervised Adaboost algorithm is reproducible and consistent over a different number of runs. It outperforms other state-of-the-art learning algorithms, even with a small part of labeled data in the training phase. Specifically, it has a very short execution time and a good balance between the detection rate as well as the false-alarm rate.
Traffic of Industrial Control System (ICS) between the Human Machine Interface (HMI) and the Programmable Logic Controller (PLC) is highly periodic. However, it is sometimes multiplexed, due to multi-threaded scheduling. In previous work we introduced a Statechart model which includes multiple Deterministic Finite Automata (DFA), one per cyclic pattern. We demonstrated that Statechart-based anomaly detection is highly effective on multiplexed cyclic traffic when the individual cyclic patterns are known. The challenge is to construct the Statechart, by unsupervised learning, from a captured trace of the multiplexed traffic, especially when the same symbols (ICS messages) can appear in multiple cycles, or multiple times in a cycle. Previously we suggested a combinatorial approach for the Statechart construction, based on Euler cycles in the Discrete Time Markov Chain (DTMC) graph of the trace. This combinatorial approach worked well in simple scenarios, but produced a false-alarm rate that was excessive on more complex multiplexed traffic. In this paper we suggest a new Statechart construction method, based on spectral analysis. We use the Fourier transform to identify the dominant periods in the trace. Our algorithm then associates a set of symbols with each dominant period, identifies the order of the symbols within each period, and creates the cyclic DFAs and the Statechart. We evaluated our solution on long traces from two production ICS: one using the Siemens S7-0x72 protocol and the other using Modbus. We also stress-tested our algorithms on a collection of synthetically-generated traces that simulate multiplexed ICS traces with varying levels of symbol uniqueness and time overlap. The resulting Statecharts model the traces with an overall median false-alarm rate as low as 0.16% on the synthetic datasets, and with zero false-alarms on production S7-0x72 traffic. Moreover, the spectral analysis Statecharts consistently out-performed the previous combinatorial Statecharts, exhibiting significantly lower false alarm rates and more compact model sizes.
A major challenge of the existing attack detection approaches is the identification of relevant information to a particular situation, and the use of such information to perform multi-evidence intrusion detection. Addressing such a limitation requires integrating several aspects of context to better predict, avoid and respond to impending attacks. The quality and adequacy of contextual information is important to decrease uncertainty and correctly identify potential cyber-attacks. In this paper, a systematic methodology has been used to identify contextual dimensions that improve the effectiveness of detecting cyber-attacks. This methodology combines graph, probability, and information theories to create several context-based attack prediction models that analyze data at a high- and low-level. An extensive validation of our approach has been performed using a prototype system and several benchmark intrusion detection datasets yielding very promising results.
Radio Frequency Identification (RFID) technology has been applied in many fields, such as tracking product through the supply chains, electronic passport (ePassport), proximity card, etc. Most companies will choose low-cost RFID tags. However, these RFID tags are almost no security mechanism so that criminals can easily clone these tags and get the user permissions. In this paper, we aim at more efficient detection proximity card be cloned and design a real-time intrusion detection system based on one tool of Complex Event Processing (Esper) in the RFID middleware. We will detect the cloned tags through training our system with the user's habits. When detected anomalous behavior which may clone tags have occurred, and then send the notification to user. We discuss the reliability of this intrusion detection system and describes in detail how to work.
In this paper, we propose a hierarchical monitoring intrusion detection system (HAMIDS) for industrial control systems (ICS). The HAMIDS framework detects the anomalies in both level 0 and level 1 of an industrial control plant. In addition, the framework aggregates the cyber-physical process data in one point for further analysis as part of the intrusion detection process. The novelty of this framework is its ability to detect anomalies that have a distributed impact on the cyber-physical process. The performance of the proposed framework evaluated as part of SWaT security showdown (S3) in which six international teams were invited to test the framework in a real industrial control system. The proposed framework outperformed other proposed academic IDS in term of detection of ICS threats during the S3 event, which was held from July 25-29, 2016 at Singapore University of Technology and Design.
Humans can easily find themselves in high cost situations where they must choose between suggestions made by an automated decision aid and a conflicting human decision aid. Previous research indicates that humans often rely on automation or other humans, but not both simultaneously. Expanding on previous work conducted by Lyons and Stokes (2012), the current experiment measures how trust in automated or human decision aids differs along with perceived risk and workload. The simulated task required 126 participants to choose the safest route for a military convoy; they were presented with conflicting information from an automated tool and a human. Results demonstrated that as workload increased, trust in automation decreased. As the perceived risk increased, trust in the human decision aid increased. Individual differences in dispositional trust correlated with an increased trust in both decision aids. These findings can be used to inform training programs for operators who may receive information from human and automated sources. Examples of this context include: air traffic control, aviation, and signals intelligence.
This paper builds on prior psychological studies that identify signals of trustworthiness between two human negotiators. Unlike prior work, the current work tracks such signals automatically and fuses them into computational models that predict trustworthiness. To achieve this goal, we apply automatic trackers to recordings of human dyads negotiating in a multi-issue bargaining task. We identify behavioral indicators in different modalities (facial expressions, gestures, gaze, and conversational features) that are predictive of trustworthiness. We predict both objective trustworthiness (i.e., are they honest) and perceived trustworthiness (i.e., do they seem honest to their interaction partner). Our experiments show that people are poor judges of objective trustworthiness (i.e., objective and perceived trustworthiness are predicted by different indicators), and that multimodal approaches better predict objective trustworthiness, whereas people overly rely on facial expressions when judging the honesty of their partner. Moreover, domain knowledge (from the literature and prior analysis of behaviors) facilitates the model development process.
This study was conducted to determine whether monitoring moderated the impact of trust on the project performance of 57 virtual teams. Two sources of monitoring were examined: internal monitoring done by team members and external monitoring done by someone outside of the team. Two types of trust were also examined: affective-based trust, or trust based on emotion; and cognitive trust, or trust based on competency. Results indicate that when internal monitoring was high, affective trust was associated with increases in performance. However, affective trust was associated with decreases in performance when external monitoring was high. Both types of monitoring reduced the strong positive relationship between cognitive trust and the performance of virtual teams. Results of this study provide new insights about monitoring and trust in virtual teams and inform both theory and design.
The rising prevalence of algorithmic interfaces, such as curated feeds in online news, raises new questions for designers, scholars, and critics of media. This work focuses on how transparent design of algorithmic interfaces can promote awareness and foster trust. A two-stage process of how transparency affects trust was hypothesized drawing on theories of information processing and procedural justice. In an online field experiment, three levels of system transparency were tested in the high-stakes context of peer assessment. Individuals whose expectations were violated (by receiving a lower grade than expected) trusted the system less, unless the grading algorithm was made more transparent through explanation. However, providing too much information eroded this trust. Attitudes of individuals whose expectations were met did not vary with transparency. Results are discussed in terms of a dual process model of attitude change and the depth of justification of perceived inconsistency. Designing for trust requires balanced interface transparency - not too little and not too much.
Reputation systems in current electronic marketplaces can easily be manipulated by malicious sellers in order to appear more reputable than appropriate. We conducted a controlled experiment with 40 UK and 41 German participants on their ability to detect malicious behavior by means of an eBay-like feedback profile versus a novel interface involving an interactive visualization of reputation data. The results show that participants using the new interface could better detect and understand malicious behavior in three out of four attacks (the overall detection accuracy 77% in the new vs. 56% in the old interface). Moreover, with the new interface, only 7% of the users decided to buy from the malicious seller (the options being to buy from one of the available sellers or to abstain from buying), as opposed to 30% in the old interface condition.
Building trust among remote developers is challenging because trust typically grows through close face-to-face interaction. In this paper, we present the preparatory design of an empirical study aimed to assess whether affective trust, established through social communication between developers, is a predictor of successful collaboration in distributed projects. Specifically, we intend to measure affective trust through sentiment analysis of pull-request comments.
Recently, PIN-TRUST, a method to predict future trust relationships between users is proposed. PIN-TRUST out-performs existing trust prediction methods by exploiting all types of interactions between users and the reciprocation of ones. In this paper, we validate whether its consideration on the reciprocation of interactions is really effective in trust prediction. Furthermore, we consider a new concept, the "uncertainty" of untrustworthy users that is devised to reflect the difficulty on modeling the activities of untrustworthy users in PIN-TRUST. Then, we also validate the effectiveness this uncertainty concepts. Through the validation, we reveal that the consideration of the reciprocation of interactions is effective for trust prediction with PIN-TRUST, and it is necessary to regard the uncertainty of untrustworthy users same as that of other users.
Human-machine trust is a critical mitigating factor in many HCI instances. Lack of trust in a system can lead to system disuse whilst over-trust can lead to inappropriate use. Whilst human-machine trust has been examined extensively from within a technico-social framework, few efforts have been made to link the dynamics of trust within a steady-state operator-machine environment to the existing literature of the psychology of learning. We set out to recreate a commonly reported learning phenomenon within a trust acquisition environment: Users learning which algorithms can and cannot be trusted to reduce traffic in a city. We failed to replicate (after repeated efforts) the learning phenomena of "blocking", resulting in a finding that people consistently make a very specific error in trust assignment to cues in conditions of uncertainty. This error can be seen as a cognitive bias and has important implications for HCI.