Biblio
The malware detection arms race involves constant change: malware changes to evade detection and labels change as detection mechanisms react. Recognizing that malware changes over time, prior work has enforced temporally consistent samples by requiring that training binaries predate evaluation binaries. We present temporally consistent labels, requiring that training labels also predate evaluation binaries since training labels collected after evaluation binaries constitute label knowledge from the future. Using a dataset containing 1.1 million binaries from over 2.5 years, we show that enforcing temporal label consistency decreases detection from 91% to 72% at a 0.5% false positive rate compared to temporal samples alone.
The impact of temporal labeling demonstrates the potential of improved labels to increase detection results. Hence, we present a detector capable of selecting binaries for submission to an expert labeler for review. At a 0.5% false positive rate, our detector achieves a 72% true positive rate without an expert, which increases to 77% and 89% with 10 and 80 expert queries daily, respectively. Additionally, we detect 42% of malicious binaries initially undetected by all 32 antivirus vendors from VirusTotal used in our evaluation. For evaluation at scale, we simulate the human expert labeler and show that our approach is robust against expert labeling errors. Our novel contributions include a scalable malware detector integrating manual review with machine learning and the examination of temporal label consistency
Cyber-Physical Systems (CPS) are systems with seamless integration of physical, computational and networking components. These systems can potentially have an impact on the physical components, hence it is critical to safeguard them against a wide range of attacks. In this paper, it is argued that an effective approach to achieve this goal is to systematically identify the potential threats at the design phase of building such systems, commonly achieved via threat modeling. In this context, a tool to perform systematic analysis of threat modeling for CPS is proposed. A real-world wireless railway temperature monitoring system is used as a case study to validate the proposed approach. The threats identified in the system are subsequently mitigated using National Institute of Standards and Technology (NIST) standards.
Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important real-world security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other "neighbouring" targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NP-hard, in contrast with traditional Stackelberg security games which can be solved in polynomial time. On the positive side, we propose a novel column generation based approach—CLASPE—to solve SPEs. CLASPE features the following novelties: 1) a novel mixed-integer linear programming formulation for the slave problem; 2) an extended greedy approach with a constant-factor approximation ratio to speed up the slave problem; and 3) a linear-scale linear programming that efficiently calculates the upper bounds of target-defined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enable us to scale to realistic-sized SPE problem instances.
In this paper, we formulate a three-player three-stage Colonel Blotto game, in which two players fight against a common adversary. We assume that the game is one of complete information, that is, the players have complete and consistent information on the underlying model of the game; further, each player observes the actions taken by all players up to the previous stage. The setting under consideration is similar to the one considered in our recent work [1], but with a different information structure during the second stage of the game; this leads to a significantly different solution.
In the first stage, players can add additional battlefields. In the second stage, the players (except the adversary) are allowed to transfer resources among each other if it improves their expected payoffs, and simultaneously, the adversary decides on the amount of resource it allocates to the battle with each player subject to its resource constraint. At the third stage, the players and the adversary fight against each other with updated resource levels and battlefields. We compute the subgame-perfect Nash equilibrium for this game. Further, we show that when playing according to the equilibrium, there are parameter regions in which (i) there is a net positive transfer, (ii) there is absolutely no transfer, (iii) the adversary fights with only one player, and (iv) adding battlefields is beneficial to a player. In doing so, we also exhibit a counter-intuitive property of Nash equilibrium in games: extra information to a player in the game does not necessarily lead to a better performance for that player. The result finds application in resource allocation problems for securing cyber-physical systems.
We consider a three-step three-player complete information Colonel Blotto game in this paper, in which the first two players fight against a common adversary. Each player is endowed with a certain amount of resources at the beginning of the game, and the number of battlefields on which a player and the adversary fights is specified. The first two players are allowed to form a coalition if it improves their payoffs. In the first stage, the first two players may add battlefields and incur costs. In the second stage, the first two players may transfer resources among each other. The adversary observes this transfer, and decides on the allocation of its resources to the two battles with the players. At the third step, the adversary and the other two players fight on the updated number of battlefields and receive payoffs. We characterize the subgame-perfect Nash equilibrium (SPNE) of the game in various parameter regions. In particular, we show that there are certain parameter regions in which if the players act according to the SPNE strategies, then (i) one of the first two players add battlefields and transfer resources to the other player (a coalition is formed), (ii) there is no addition of battlefields and no transfer of resources (no coalition is formed). We discuss the implications of the results on resource allocation for securing cyberphysical systems.
Abstract—Actor programming languages provide the kind of inherent parallelism that is needed for building applications in the mobile cloud. This is because the Actor model provides encapsulation (isolation of local state), fair scheduling, location transparency, and locality of reference. These properties facilitate building secure, scalable concurrent systems. Not surprisingly, very large-scale applications such as Facebook chat service and Twitter have been written in actor languages. The paper introduces the basics of the actor model and gives a high-level overview of the problem of coordination in actor systems. It then describes several novel methods for reasoning about concurrent systems that are both effective and scalable.
In this lecture, I will focus on an alternate method for addressing the problem of large state spaces. For many purposes, it may not be necessary to consider the global state as a cross-product of the states of individual actors. We take our inspiration from statistical physics where macro properties of a system may be related to the properties of individual molecules using probability distributions on the states of the latter. Consider a simple example. Suppose associated with each state is the amount of energy a node consumes when in that state (such an associated value mapping is called the reward function of the state). Now, if we have a frequency count of the nodes in each state, we can estimate the total energy consumed by the system. This suggests a model where the global state is a vector of probability mass functions (pmfs). In the above example, the size of the vector would be 5, one element for each possible state of a node. Each element of the vector represents the probability that any node is in the particular state corresponding to entry.
This was an invited talk to the 5th International Conference on Algebraic Informatics.
Knowing inputs that cover a specific branch or statement in a program is useful for debugging and regression testing. Symbolic backward execution (SBE) is a natural approach to find such targeted inputs. However, SBE struggles with complicated arithmetic, external method calls, and data-dependent loops that occur in many real-world programs. We propose symcretic execution, a novel combination of SBE and concrete forward execution that can efficiently find targeted inputs despite these challenges. An evaluation of our approach on a range of test cases shows that symcretic execution finds inputs in more cases than concolic testing tools while exploring fewer path segments. Integration of our approach will allow test generation tools to fill coverage gaps and static bug detectors to verify candidate bugs with concrete test cases. This is the full version of an extended abstract that was presented at the 29th IEEE/ACM International Conference on Automated Software Engineering (ASE 2014), September 15–19, 2014, Västerås, Sweden.
Knowing inputs that cover a specific branch or statement in a program is useful for debugging and regression testing. Symbolic backward execution (SBE) is a natural approach to find such targeted inputs. However, SBE struggles with complicated arithmetic, external method calls, and data- dependent loops that occur in many real-world programs. We propose symcretic execution, a novel combination of SBE and concrete forward execution that can efficiently find targeted inputs despite these challenges. An evaluation of our approach on a range of test cases shows that symcretic execution finds inputs in more cases than concolic testing tools while exploring fewer path segments. Integration of our approach will allow test generation tools to fill coverage gaps and static bug detectors to verify candidate bugs with concrete test cases.
Modeling and evaluating the performance of large-scale wireless sensor networks (WSNs) is a challenging problem. The traditional method for representing the global state of a system as a cross product of the states of individual nodes in the system results in a state space whose size is exponential in the number of nodes. We propose an alternative way of representing the global state of a system: namely, as a probability mass function (pmf) which represents the fraction of nodes in different states. A pmf corresponds to a point in a Euclidean space of possible pmf values, and the evolution of the state of a system is represented by trajectories in this Euclidean space. We propose a novel performance evaluation method that examines all pmf trajectories in a dense Euclidean space by exploring only finite relevant portions of the space. We call our method Euclidean model checking. Euclidean model checking is useful both in the design phase—where it can help determine system parameters based on a specification—and in the evaluation phase—where it can help verify performance properties of a system. We illustrate the utility of Euclidean model checking by using it to design a time difference of arrival (TDoA) distance measurement protocol and to evaluate the protocol’s implementation on a 90-node WSN. To facilitate such performance evaluations, we provide a Markov model estimation method based on applying a standard statistical estimation technique to samples resulting from the execution of a system.
Atomic sets are a synchronization mechanism in which the programmer specifies the groups of data that must be ac- cessed as a unit. The compiler can check this specifica- tion for consistency, detect deadlocks, and automatically add the primitives to prevent interleaved access. Atomic sets relieve the programmer from the burden of recognizing and pruning execution paths which lead to interleaved ac- cess, thereby reducing the potential for data races. However, manually converting programs from lock-based synchroniza- tion to atomic sets requires reasoning about the program’s concurrency structure, which can be a challenge even for small programs. Our analysis eliminates the challenge by automating the reasoning. Our implementation of the anal- ysis allowed us to derive the atomic sets for large code bases such as the Java collections framework in a matter of min- utes. The analysis is based on execution traces; assuming all traces reflect intended behavior, our analysis enables safe concurrency by preventing unobserved interleavings which may harbor latent Heisenbugs.
Session types have been proposed as a means of statically verifying implementations of communication protocols. Although prior work has been successful in verifying some classes of protocols, it does not cope well with parameterized, multi-actor scenarios with inherent asynchrony. For example, the sliding window protocol is inexpressible in previously proposed session type systems. This paper describes System-A, a new typing language which overcomes many of the expressiveness limitations of prior work. System-A explicitly supports asynchrony and parallelism, as well as multiple forms of parameterization. We define System-A and show how it can be used for the static verification of a large class of asynchronous communication protocols.
The concept of differential privacy stems from the study of private query of datasets. In this work, we apply this concept to metric spaces to study a mechanism that randomizes a deterministic query by adding mean-zero noise to keep differential privacy.
We address a discrete-time LQG control problem over a fixed performance window and apply a receding-horizon type control strategy, resulting in an exact solution to the problem in terms of semidefinite programming. The systems considered take parameters from a finite set, and switch between them according to an automaton. The controller has a finite preview of future parameters, beyond which only the set of parameters is known. We provide necessary and sufficient convex con- ditions for the existence of a controller which guarantees both exponential stability and finite-horizon performance levels for the system; the performance levels may differ according to the particular parameter sequence within the performance window. A simple, physics-based example is provided to illustrate the main results.
This paper is concerned with mean-square stabilization of single-input Markovian jump linear systems (MJLSs) with logarithmically quantized state feedback. We introduce the concepts and provide explicit constructions of stabilizing mode-dependent logarithmic quantizers together with associated controllers, and a semi-convex way to determine the optimal (coarsest) stabilizing quantization density. An example application is presented as a special case of the developed framework, that of feedback stabilizing a linear time-invariant (LTI) system over a log-quantized erasure channel. A hardware implementation of this application on an inverted pendulum testbed is provided using a finite word-length approximation.
We provide an exact solution to two performance problems—one of disturbance attenuation and one of windowed variance minimization—subject to exponential stability. Considered are switched systems, whose parameters come from a finite set and switch according to a language such as that specified by an automaton. The controllers are path-dependent, having finite memory of past plant parameters and finite foreknowledge of future parameters. Exact, convex synthesis conditions for each performance problem are expressed in terms of nested linear matrix inequalities. The resulting semidefinite programming problem may be solved offline to arrive at a suitable controller. A notion of path-by-path performance is introduced for each performance problem, leading to improved system performance. Non-regular switching languages are considered and the results are extended to these languages. Two simple, physically motivated examples are given to demonstrate the application of these results.
Individuals sharing information can improve the cost or performance of a distributed control system. But, sharing may also violate privacy. We develop a general framework for studying the cost of differential privacy in systems where a collection of agents, with coupled dynamics, communicate for sensing their shared environment while pursuing individ- ual preferences. First, we propose a communication strategy that relies on adding carefully chosen random noise to agent states and show that it preserves differential privacy. Of course, the higher the standard deviation of the noise, the higher the cost of privacy. For linear distributed control systems with quadratic cost functions, the standard deviation becomes independent of the number agents and it decays with the maximum eigenvalue of the dynamics matrix. Furthermore, for stable dynamics, the noise to be added is independent of the number of agents as well as the time horizon up to which privacy is desired.
The iterative consensus problem requires a set of processes or agents with different initial values, to interact and update their states to eventually converge to a common value. Pro- tocols solving iterative consensus serve as building blocks in a variety of systems where distributed coordination is re- quired for load balancing, data aggregation, sensor fusion, filtering, and synchronization. In this paper, we introduce the private iterative consensus problem where agents are re- quired to converge while protecting the privacy of their ini- tial values from honest but curious adversaries. Protecting the initial states, in many applications, suffice to protect all subsequent states of the individual participants.
We adapt the notion of differential privacy in this setting of iterative computation. Next, we present (i) a server-based and (ii) a completely distributed randomized mechanism for solving differentially private iterative consensus with adver- saries who can observe the messages as well as the internal states of the server and a subset of the clients. Our analysis establishes the tradeoff between privacy and the accuracy.
We present a controller synthesis algorithm for a discrete time reach-avoid problem in the presence of adversaries. Our model of the adversary captures typical malicious attacks en- visioned on cyber-physical systems such as sensor spoofing, controller corruption, and actuator intrusion. After formu- lating the problem in a general setting, we present a sound and complete algorithm for the case with linear dynamics and an adversary with a budget on the total L2-norm of its actions. The algorithm relies on a result from linear control theory that enables us to decompose and precisely compute the reachable states of the system in terms of a symbolic simulation of the adversary-free dynamics and the total uncertainty induced by the adversary. With this de- composition, the synthesis problem eliminates the universal quantifier on the adversary’s choices and the symbolic con- troller actions can be effectively solved using an SMT solver. The constraints induced by the adversary are computed by solving second-order cone programmings. The algorithm is later extended to synthesize state-dependent controller and to generate attacks for the adversary. We present prelimi- nary experimental results that show the effectiveness of this approach on several example problems.
Hypervisor activity is designed to be hidden from guest Virtual Machines (VM) as well as external observers. In this paper, we demonstrate that this does not always occur. We present a method by which an external observer can learn sensitive information about hypervisor internals, such as VM scheduling or hypervisor-level monitoring schemes, by observing a VM. We refer to this capability as Hypervisor Introspection (HI).
HI can be viewed as the inverse process of the well-known Virtual Machine Introspection (VMI) technique. VMI is a technique to extract VMs’ internal state from the hypervi- sor, facilitating the implementation of reliability and security monitors[1]. Conversely, HI is a technique that allows VMs to autonomously extract hypervisor information. This capability enables a wide range of attacks, for example, learning a hypervisor’s properties (version, configuration, etc.), defeating hypervisor-level monitoring systems, and compromising the confidentiality of co-resident VMs. This paper focuses on the discovery of a channel to implement HI, and then leveraging that channel for a novel attack against traditional VMI.
In order to perform HI, there must be a method of extracting information from the hypervisor. Since this information is intentionally hidden from a VM, we make use of a side channel. When the hypervisor checks a VM using VMI, VM execution (e.g. network communication between a VM and a remote system) must pause. Therefore, information regarding the hypervisor’s activity can be leaked through this suspension of execution. We call this side channel the VM suspend side channel, illustrated in Fig. 1. As a proof of concept, this paper presents how correlating the results of in-VM micro- benchmarking and out-of-VM reference monitoring can be used to determine when hypervisor-level monitoring tools are vulnerable to attacks.
Care of chronic cardiac patients requires information interchange between patients' homes, clinical environments, and the electronic health record. Standards are emerging to support clinical information collection, exchange and management and to overcome information fragmentation and actors delocalization. Heterogeneity of information sources at patients' homes calls for open solutions to collect and accommodate multidomain information, including environmental data. Based on the experience gained in a European Research Program, this paper presents an integrated and open approach for clinical data interchange in cardiac telemonitoring applications. This interchange is supported by the use of standards following the indications provided by the national authorities of the countries involved. Taking into account the requirements provided by the medical staff involved in the project, the authors designed and implemented a prototypal middleware, based on a service-oriented architecture approach, to give a structured and robust tool to congestive heart failure patients for their personalized telemonitoring. The middleware is represented by a health record management service, whose interface is compliant to the healthcare services specification project Retrieve, Locate and Update Service standard (Level 0), which allows communication between the agents involved through the exchange of Clinical Document Architecture Release 2 documents. Three performance tests were carried out and showed that the prototype completely fulfilled all requirements indicated by the medical staff; however, certain aspects, such as authentication, security and scalability, should be deeply analyzed within a future engineering phase.
In this paper, we propose SAFE (Security Aware FlexRay scheduling Engine), to provide a problem definition and a design framework for FlexRay static segment schedule to address the new challenge on security. From a high level specification of the application, the architecture and communication middleware are synthesized to satisfy security requirements, in addition to extensibility, costs, and end-to-end latencies. The proposed design process is applied to two industrial case studies consisting of a set of active safety functions and an X-by-wire system respectively.
The integration of social networking concepts into the Internet of things has led to the Social Internet of Things (SIoT) paradigm, according to which objects are capable of establishing social relationships in an autonomous way with respect to their owners with the benefits of improving the network scalability in information/service discovery. Within this scenario, we focus on the problem of understanding how the information provided by members of the social IoT has to be processed so as to build a reliable system on the basis of the behavior of the objects. We define two models for trustworthiness management starting from the solutions proposed for P2P and social networks. In the subjective model each node computes the trustworthiness of its friends on the basis of its own experience and on the opinion of the friends in common with the potential service providers. In the objective model, the information about each node is distributed and stored making use of a distributed hash table structure so that any node can make use of the same information. Simulations show how the proposed models can effectively isolate almost any malicious nodes in the network at the expenses of an increase in the network traffic for feedback exchange.
Cyber intrusions to substations of a power grid are a source of vulnerability since most substations are unmanned and with limited protection of the physical security. In the worst case, simultaneous intrusions into multiple substations can lead to severe cascading events, causing catastrophic power outages. In this paper, an integrated Anomaly Detection System (ADS) is proposed which contains host- and network-based anomaly detection systems for the substations, and simultaneous anomaly detection for multiple substations. Potential scenarios of simultaneous intrusions into the substations have been simulated using a substation automation testbed. The host-based anomaly detection considers temporal anomalies in the substation facilities, e.g., user-interfaces, Intelligent Electronic Devices (IEDs) and circuit breakers. The malicious behaviors of substation automation based on multicast messages, e.g., Generic Object Oriented Substation Event (GOOSE) and Sampled Measured Value (SMV), are incorporated in the proposed network-based anomaly detection. The proposed simultaneous intrusion detection method is able to identify the same type of attacks at multiple substations and their locations. The result is a new integrated tool for detection and mitigation of cyber intrusions at a single substation or multiple substations of a power grid.
Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.