Biblio

Filters: Author is Vorobeychik, Yevgeniy
2018-11-28
Li, Bo, Roundy, Kevin, Gates, Chris, Vorobeychik, Yevgeniy.  2017.  Large-Scale Identification of Malicious Singleton Files. Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. :227–238.

We study a dataset of billions of program binary files that appeared on 100 million computers over the course of 12 months, discovering that 94% of these files were present on a single machine. Though malware polymorphism is one cause for the large number of singleton files, additional factors must also contribute, given that benign singleton files outnumber malicious ones 80:1. The huge number of benign singletons makes it challenging to reliably identify the minority of malicious singletons. We present a large-scale study of the properties, characteristics, and distribution of benign and malicious singleton files. We leverage the insights from this study to build a classifier based purely on static features that identifies 92% of the remaining malicious singletons at a 1.4% false positive rate, despite heavy use of obfuscation and packing techniques by most malicious singleton files, which we make no attempt to de-obfuscate. Finally, we demonstrate the robustness of our classifier to important classes of automated evasion attacks.
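
The paper does not spell out the classifier construction here; as a rough illustration of the final step it describes (thresholding a static-feature classifier at a target false positive rate), a minimal sketch with invented features and synthetic data follows. The feature set, model choice, and numbers are assumptions, not the paper's.

```python
# Sketch: train a classifier on hypothetical static file features, then pick
# a score threshold that yields roughly a 1.4% false positive rate.
# Features and data are synthetic; on pure noise the detection rate printed
# is near chance, whereas real static features carry signal.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000
X = rng.random((n, 4))          # stand-ins: size, section count, entropy, imports
y = rng.random(n) < 1 / 81      # roughly the 80:1 benign-to-malicious ratio

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
threshold = np.quantile(scores[~y_te], 1 - 0.014)   # 1.4% FPR on benign files
print("detection rate at 1.4% FPR:", (scores[y_te] >= threshold).mean())
```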

2018-08-23
Laszka, Aron, Abbas, Waseem, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2017.  Synergic Security for Smart Water Networks: Redundancy, Diversity, and Hardening. Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks. :21–24.

Smart water networks can provide great benefits to our society in terms of efficiency and sustainability. However, smart capabilities and connectivity also expose these systems to a wide range of cyber attacks, which enable cyber-terrorists and hostile nation states to mount cyber-physical attacks. Cyber-physical attacks against critical infrastructure, such as water treatment and distribution systems, pose a serious threat to public safety and health. Consequently, it is imperative that we improve the resilience of smart water networks. We consider three approaches for improving resilience: redundancy, diversity, and hardening. Even though each one of these "canonical" approaches has been thoroughly studied in prior work, a unified theory on how to combine them in the most efficient way has not yet been established. In this paper, we address this problem by studying the synergy of these approaches in the context of protecting smart water networks from cyber-physical contamination attacks.

2018-07-06
Liu, Chang, Li, Bo, Vorobeychik, Yevgeniy, Oprea, Alina.  2017.  Robust Linear Regression Against Training Data Poisoning. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. :91–102.
The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to dealing with robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principal component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform the state of the art in both running time and prediction error.
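
For reference, here is a minimal sketch of plain (non-robust) principal component regression, the baseline that the paper's robust low-rank machinery hardens against poisoning. The robust approximation and trimming steps of the paper are not reproduced; the data, rank, and noise scale are illustrative.

```python
# Sketch: principal component regression (PCR) on an exactly low-rank
# feature matrix. The paper replaces both the SVD step and the regression
# step with robust counterparts; this shows only the shared skeleton.
import numpy as np

def pcr_fit(X, y, rank):
    """Project X onto its top-`rank` principal directions, then least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:rank].T                                   # principal directions
    w, *_ = np.linalg.lstsq((X - x_mean) @ V, y - y_mean, rcond=None)
    return lambda Xn: (Xn - x_mean) @ (V @ w) + y_mean

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))   # rank-5 features
y = X @ rng.normal(size=50) + rng.normal(scale=0.1, size=200)
predict = pcr_fit(X, y, rank=5)
print("train RMSE:", np.sqrt(np.mean((predict(X) - y) ** 2)))
```
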
2017-05-16
Laszka, Aron, Abbas, Waseem, Sastry, S. Shankar, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2016.  Optimal Thresholds for Intrusion Detection Systems. Proceedings of the Symposium and Bootcamp on the Science of Security. :72–81.

In recent years, we have seen a number of successful attacks against high-profile targets, some of which have even caused severe physical damage. These examples have shown us that resourceful and determined attackers can penetrate virtually any system, even those that are secured by the "air-gap." Consequently, in order to minimize the impact of stealthy attacks, defenders have to focus not only on strengthening the first lines of defense but also on deploying effective intrusion detection systems. Intrusion detection systems can play a key role in protecting sensitive computer systems since they give defenders a chance to detect and mitigate attacks before they can cause substantial losses. However, an over-sensitive intrusion detection system, which produces a large number of false alarms, imposes prohibitively high operational costs on a defender since alarms need to be manually investigated. Thus, defenders have to strike the right balance between maximizing security and minimizing costs. Optimizing the sensitivity of intrusion detection systems is especially challenging when multiple interdependent computer systems have to be defended against a strategic attacker, who can target computer systems in order to maximize losses and minimize the probability of detection. We model this scenario as an attacker-defender security game and study the problem of finding optimal intrusion detection thresholds.
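
To make the trade-off concrete, here is a minimal sketch of the Stackelberg structure described above: the defender commits to a detection threshold, then a strategic attacker hits the system with the highest expected undetected loss. The detection and false-alarm curves and all cost numbers are invented stand-ins, not the paper's model.

```python
# Sketch: grid search over a single shared IDS threshold against a
# best-responding attacker. Lower threshold = more sensitive = more alarms.
import numpy as np

def detection_prob(tau):                 # invented sensitivity curve
    return np.exp(-3.0 * tau)

def false_alarm_rate(tau):               # invented false-alarm curve
    return np.exp(-8.0 * tau)

losses = np.array([10.0, 40.0, 25.0])    # damage if each system is compromised
alarm_cost = 20.0                        # cost of investigating a false alarm

best = None
for tau in np.linspace(0.0, 1.0, 101):
    # Attacker best response: target the largest expected undetected loss.
    attack_loss = max(losses * (1.0 - detection_prob(tau)))
    total = attack_loss + alarm_cost * false_alarm_rate(tau) * len(losses)
    if best is None or total < best[0]:
        best = (total, tau)
print("optimal threshold: %.2f, defender cost: %.2f" % (best[1], best[0]))
```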

2017-04-24
Neema, Himanshu, Volgyesi, Peter, Potteiger, Bradley, Emfinger, William, Koutsoukos, Xenofon, Karsai, Gabor, Vorobeychik, Yevgeniy, Sztipanovits, Janos.  2016.  SURE: An Experimentation and Evaluation Testbed for CPS Security and Resilience: Demo Abstract. Proceedings of the 7th International Conference on Cyber-Physical Systems. :27:1–27:1.

In-depth consideration and evaluation of security and resilience are necessary for developing the scientific foundations and technology of Cyber-Physical Systems (CPS). In this demonstration, we present SURE [1], a CPS experimentation and evaluation testbed for security and resilience focusing on transportation networks. The testbed includes (1) a heterogeneous modeling and simulation integration platform, (2) a Web-based tool for modeling CPS in adversarial environments, and (3) a framework for evaluating resilience using attacker-defender games. Users such as CPS designers and operators can interact with the testbed to evaluate monitoring and control schemes that include sensor placement and traffic signal configuration.

2017-03-29
Lou, Jian, Vorobeychik, Yevgeniy.  2016.  Decentralization and Security in Dynamic Traffic Light Control. Proceedings of the Symposium and Bootcamp on the Science of Security. :90–92.

Complex traffic networks include a number of controlled intersections and, commonly, multiple districts or municipalities. The result is that the overall traffic control problem is extremely complex computationally. Moreover, given that different municipalities may have distinct, non-aligned interests, traffic light controller design is inherently decentralized, a consideration that is almost entirely absent from the related literature. Both complexity and decentralization have great bearing on the overall quality of the traffic network as well as on its security. We consider both of these issues in a dynamic traffic network. First, we propose an effective local search algorithm to efficiently design system-wide control logic for a collection of intersections. Second, we propose a game theoretic (Stackelberg game) model of traffic network security in which an attacker can deploy denial-of-service attacks on sensors, and develop a resilient control algorithm to mitigate such threats. Finally, we propose a game theoretic model of decentralization, and investigate this model both in the context of baseline traffic network design and in the context of resilient design accounting for attacks. Our methods are implemented and evaluated using a simple traffic network scenario in SUMO.
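
A minimal sketch of the local-search idea for system-wide signal control follows. The delay function below is a placeholder objective; in the paper, candidate configurations are evaluated in the SUMO simulator, and the parametrization of controllers is the authors' own.

```python
# Sketch: hill-climbing local search over per-intersection green-split
# parameters, accepting only moves that reduce a (stand-in) delay objective.
import random

def simulated_delay(splits):
    # Placeholder: penalize splits far from a hidden "ideal" configuration.
    ideal = [0.6, 0.4, 0.5, 0.7]
    return sum((s - i) ** 2 for s, i in zip(splits, ideal))

def local_search(n_intersections=4, steps=1000, eps=0.05):
    splits = [0.5] * n_intersections
    cost = simulated_delay(splits)
    for _ in range(steps):
        i = random.randrange(n_intersections)          # perturb one intersection
        candidate = splits.copy()
        candidate[i] = min(1.0, max(0.0, candidate[i] + random.choice([-eps, eps])))
        c = simulated_delay(candidate)
        if c < cost:                                   # accept improving moves only
            splits, cost = candidate, c
    return splits, cost

print(local_search())
```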

2016-04-08
Abbas, Waseem, Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Scheduling Intrusion Detection Systems in Resource-Bounded Cyber-Physical Systems. Proceedings of the First ACM Workshop on Cyber-Physical Systems-Security and/or PrivaCy. :55–66.

In order to be resilient to attacks, a cyber-physical system (CPS) must be able to detect attacks before they can cause significant damage. To achieve this, intrusion detection systems (IDS) may be deployed, which can detect attacks and alert human operators, who can then intervene. However, the resource-constrained nature of many CPS poses a challenge, since reliable IDS can be computationally expensive. Consequently, computational nodes may not be able to perform intrusion detection continuously, which means that we have to devise a schedule for performing intrusion detection. While a uniformly random schedule may be optimal in a purely cyber system, an optimal schedule for protecting CPS must also take into account the physical properties of the system, since the set of adversarial actions and their consequences depend on the physical system. Here, in the context of water distribution networks, we study IDS scheduling problems in two settings and under constraints on the available battery supplies. In the first problem, the objective is to design, for a given duration of time T, scheduling schemes for IDS so that the probability of detecting an attack is maximized within that duration. We propose efficient heuristic algorithms for this general problem and evaluate them on various networks. In the second problem, our objective is to design scheduling schemes for IDS so that the overall lifetime of the network is maximized while ensuring that an intruder attack is always detected. Several strategies to deal with this problem are presented and evaluated on various networks.
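
A minimal sketch of the first scheduling problem, under an assumed slotted-time model: a node's battery affords B active IDS slots out of a horizon of T, and an attack is detected if its window overlaps an active slot. The uniform random schedule below is the natural purely-cyber baseline the abstract mentions; the paper's heuristics and the network structure are not reproduced.

```python
# Sketch: estimate the detection probability of uniformly random IDS
# schedules for an attack of fixed duration with a uniform start time.
import random

def random_schedule(T, B):
    """Pick B active IDS slots out of T uniformly at random."""
    return set(random.sample(range(T), B))

def detection_prob(schedule, T, dur):
    windows = [set(range(s, s + dur)) for s in range(T - dur + 1)]
    return sum(1 for w in windows if w & schedule) / len(windows)

T, B, dur = 24, 6, 3     # assumed horizon, battery budget, attack duration
probs = [detection_prob(random_schedule(T, B), T, dur) for _ in range(200)]
print("mean detection probability:", sum(probs) / len(probs))
```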

2016-04-07
Ke, Liyiming, Li, Bo, Vorobeychik, Yevgeniy.  2016.  Behavioral Experiments in Email Filter Evasion.

Despite decades of effort to combat spam, unwanted and even malicious emails, such as phish, which aim to deceive recipients into disclosing sensitive information, still routinely find their way into one’s mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion, or manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact, but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction—a common practice in applied machine learning—can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.
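
A minimal sketch of the noise idea: perturbing the filter's score makes an evader's trial-and-error feedback unreliable, since the same message may be filtered on one attempt and pass on the next. The linear scorer, weights, and noise scale below are invented, not the experiment's filter.

```python
# Sketch: a randomized (noisy) linear filter. Repeated queries on the same
# message give inconsistent outcomes, degrading evasion-by-probing.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)                  # stand-in filter weights

def noisy_filter(x, sigma=0.3):
    score = w @ x + rng.normal(scale=sigma)   # add noise to the score
    return score > 0.0                        # True = filtered as spam

x = rng.normal(size=20)                  # one fixed message's features
print([noisy_filter(x) for _ in range(5)])   # outcomes vary across attempts
```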

Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Optimal Personalized Filtering Against Spear-phishing Attacks. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. :958–964.

To penetrate sensitive computer networks, attackers can use spear phishing to sidestep technical security mechanisms by exploiting the privileges of careless users. In order to maximize their success probability, attackers have to target the users that constitute the weakest links of the system. The optimal selection of these target users takes into account both the damage that can be caused by a user and the probability of a malicious e-mail being delivered to and opened by a user. Since attackers select their targets in a strategic way, the optimal mitigation of these attacks requires the defender to also personalize the e-mail filters by taking into account the users' properties.

In this paper, we assume that a learned classifier is given and propose strategic per-user filtering thresholds for mitigating spear-phishing attacks. We formulate the problem of filtering targeted and non-targeted malicious e-mails as a Stackelberg security game. We characterize the optimal filtering strategies and show how to compute them in practice. Finally, we evaluate our results using two real-world datasets and demonstrate that the proposed thresholds lead to lower losses than nonstrategic thresholds.
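
To illustrate the strategic logic, here is a minimal sketch of per-user thresholds: the defender filters more aggressively for users whose compromise is most damaging, because a strategic spear phisher targets exactly those users. The miss and over-filtering curves, user values, and the brute-force grid search are all invented stand-ins for the paper's characterization of optimal strategies.

```python
# Sketch: defender picks a threshold per user (lower = more aggressive
# filtering), anticipating that the attacker targets the user with the
# highest expected damage.
import itertools
import numpy as np

def miss_prob(tau):                    # P(a targeted e-mail slips past the filter)
    return tau

def benign_loss(tau):                  # cost of over-filtering legitimate e-mail
    return 2.0 * (1.0 - tau) ** 2

values = [1.0, 3.0, 10.0]              # hypothetical damage per compromised user
grid = np.linspace(0.0, 1.0, 21)

best = None
for taus in itertools.product(grid, repeat=len(values)):
    # Attacker best response: target the user with maximal expected damage.
    attack = max(v * miss_prob(t) for v, t in zip(values, taus))
    cost = attack + sum(benign_loss(t) for t in taus)
    if best is None or cost < best[0]:
        best = (cost, taus)
print("per-user thresholds:", [round(float(t), 2) for t in best[1]])
```

In this toy instance the highest-value user receives the lowest threshold, which is the personalization effect the paper formalizes.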

Gan, Jiarui, An, Bo, Vorobeychik, Yevgeniy.  2015.  Security Games with Protection Externalities. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. :914–920.

Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important real-world security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other "neighbouring" targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NP-hard, in contrast with traditional Stackelberg security games, which can be solved in polynomial time. On the positive side, we propose a novel column generation based approach—CLASPE—to solve SPEs. CLASPE features the following novelties: 1) a novel mixed-integer linear programming formulation for the slave problem; 2) an extended greedy approach with a constant-factor approximation ratio to speed up the slave problem; and 3) a linear-scale linear program that efficiently calculates upper bounds for target-defined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enables us to scale to realistic-sized SPE problem instances.
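
A minimal sketch of the externality effect itself, not of CLASPE: a resource placed on a target also covers its neighbours, so placements interact like a set cover. The line graph and the plain greedy cover below are illustrative; the column-generation machinery and the paper's extended greedy for the slave problem are not reproduced.

```python
# Sketch: with protection externalities, a resource on target t covers the
# whole neighbourhood of t, so few resources can cover many targets.
neighbours = {
    0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}, 4: {3, 4},
}

def greedy_cover(budget):
    covered, placed = set(), []
    for _ in range(budget):
        # Place where the most not-yet-covered targets are newly covered.
        best = max(neighbours, key=lambda t: len(neighbours[t] - covered))
        placed.append(best)
        covered |= neighbours[best]
    return placed, covered

print(greedy_cover(2))   # two resources suffice to cover all five targets
```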

2015-11-12
Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Integrity Assurance in Resource-bounded Systems Through Stochastic Message Authentication. Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. :1:1–1:12.

Assuring communication integrity is a central problem in security. However, overhead costs associated with cryptographic primitives used towards this end introduce significant practical implementation challenges for resource-bounded systems, such as cyber-physical systems. For example, many control systems are built on legacy components which are computationally limited but have strict timing constraints. If integrity protection is a binary decision, it may simply be infeasible to introduce into such systems; without it, however, an adversary can forge malicious messages, which can cause significant physical or financial harm. We propose a formal game-theoretic framework for optimal stochastic message authentication, providing provable integrity guarantees for resource-bounded systems based on an existing MAC scheme. We use our framework to investigate attacker deterrence, as well as optimal design of stochastic message authentication schemes when deterrence is impossible. Finally, we provide experimental results on the computational performance of our framework in practice.
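
A minimal sketch of stochastic message authentication built on a standard HMAC: the receiver verifies only a fraction p of messages, trading integrity risk against computation. Choosing p to deter a rational attacker is the paper's contribution and is not reproduced; the value of p below is arbitrary.

```python
# Sketch: verify the MAC on a received message only with probability p;
# genuine messages are always accepted, forgeries are caught w.p. p.
import hashlib
import hmac
import random

KEY = b"shared-secret"   # assumed pre-shared key

def tag(msg: bytes) -> bytes:
    return hmac.new(KEY, msg, hashlib.sha256).digest()

def receive(msg: bytes, mac: bytes, p: float = 0.3) -> bool:
    """Accept the message, verifying its MAC only with probability p."""
    if random.random() < p:
        return hmac.compare_digest(mac, tag(msg))
    return True          # verification skipped: accept without checking

m = b"open valve 7"
print(receive(m, tag(m)))            # genuine message: always accepted
print(receive(m, b"\x00" * 32))      # forgery: rejected with probability p
```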

Li, Bo, Vorobeychik, Yevgeniy, Li, Muqun, Malin, Bradley.  2015.  Iterative Classification for Sanitizing Large-Scale Datasets. SIAM International Conference on Data Mining.

Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose identities or other sensitive information. Much of the data now collected exhibits weak structure (e.g., natural language text), and machine learning approaches have been developed to identify and remove sensitive entities in such data. Learning-based approaches are never perfect, and relying upon them to sanitize data can leak sensitive information as a consequence. However, a small amount of risk is permissible in practice, and, thus, our goal is to balance the value of data published and the risk of an adversary discovering leaked sensitive information. We model data sanitization as a game between 1) a publisher who chooses a set of classifiers to apply to data and publishes only instances predicted to be non-sensitive and 2) an attacker who combines machine learning and manual inspection to uncover leaked sensitive entities (e.g., personal names). We introduce an iterative greedy algorithm for the publisher that provably executes no more than a linear number of iterations, and ensures a low utility for a resource-limited adversary. Moreover, using several real-world natural language corpora, we illustrate that our greedy algorithm leaves virtually no automatically identifiable sensitive instances for a state-of-the-art learning algorithm, while sharing over 93% of the original data, and completes after at most 5 iterations.
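
A minimal sketch of the publisher's iterative loop on toy data: suppress instances known to be sensitive, retrain a detector with them as positives, suppress anything it newly flags, and stop at a fixed point. The records, labels, and detector are stand-ins; the paper's guarantees concern the full game, which this does not model.

```python
# Sketch: iterative classification for sanitization. The suppressed set
# grows monotonically, so the loop is guaranteed to terminate.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

records = ["john smith visited", "the clinic opened", "mary jones called",
           "weather was mild", "dr smith prescribed"]
X = CountVectorizer().fit_transform(records)

known_sensitive = {0, 2}                       # initially labeled instances
while True:
    y = [int(i in known_sensitive) for i in range(len(records))]
    clf = LogisticRegression().fit(X, y)       # retrain on current labels
    newly = {i for i in range(len(records))
             if i not in known_sensitive and clf.predict(X[i])[0] == 1}
    if not newly:                              # fixed point reached
        break
    known_sensitive |= newly                   # suppress the new finds
print("published:", [r for i, r in enumerate(records)
                     if i not in known_sensitive])
```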

Xia, Weiyi, Kantarcioglu, Murat, Wan, Zhiyu, Heatherly, Raymond, Vorobeychik, Yevgeniy, Malin, Bradley.  2015.  Process-Driven Data Privacy. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. :1021–1030.

The quantity of personal data gathered by service providers via our daily activities continues to grow at a rapid pace. The sharing, and the subsequent analysis of, such data can support a wide range of activities, but concerns around privacy often prompt an organization to transform the data to meet certain protection models (e.g., k-anonymity or ε-differential privacy). These models, however, are based on simplistic adversarial frameworks, which can lead to both under- and over-protection. For instance, such models often assume that an adversary attacks a protected record exactly once. We introduce a principled approach to explicitly model the attack process as a series of steps. Specifically, we engineer a factored Markov decision process (FMDP) to optimally plan an attack from the adversary's perspective and assess the privacy risk accordingly. The FMDP captures the uncertainty in the adversary's belief (e.g., the number of identified individuals that match the de-identified data) and enables the analysis of various real-world deterrence mechanisms beyond a traditional protection model, such as a penalty for committing an attack. We present an algorithm to solve the FMDP and illustrate its efficiency by simulating an attack on publicly accessible U.S. census records against a real identified resource of over 500,000 individuals in a voter registry. Our results demonstrate that while traditional privacy models commonly expect an adversary to attack exactly once per record, an optimal attack in our model may involve exploiting none, one, or more individuals in the pool of candidates, depending on context.
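
As a deliberately tiny, non-factored stand-in for the planning core, the sketch below solves an adversary's MDP by value iteration: with a penalty for being caught in the model, the computed value tells the adversary whether attacking is worthwhile at all. States, actions, probabilities, and rewards are invented.

```python
# Sketch: value iteration on a 3-state attack MDP with a capture penalty.
# states: 0 = idle, 1 = attacked & caught, 2 = attacked & succeeded
# actions: 0 = stop, 1 = attempt re-identification
import numpy as np

P = np.zeros((3, 2, 3))              # P[s, a, s'] transition probabilities
P[0, 0, 0] = 1.0                     # stopping keeps the adversary idle
P[0, 1] = [0.0, 0.3, 0.7]            # attack: 30% caught, 70% success
P[1, :, 1] = 1.0                     # caught is absorbing
P[2, :, 2] = 1.0                     # success is absorbing

R = np.zeros((3, 2))
R[0, 1] = 0.7 * 10.0 + 0.3 * (-20.0) # expected gain minus capture penalty

gamma, V = 0.95, np.zeros(3)
for _ in range(100):                 # value iteration
    V = (R + gamma * (P @ V)).max(axis=1)
print("value of attacking from the initial state:", V[0])
```

With a large enough penalty, R[0, 1] turns negative and the optimal policy is to stop, which is the deterrence effect the abstract describes.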

Lou, Jian, Vorobeychik, Yevgeniy.  2015.  Equilibrium Analysis of Multi-Defender Security Games. Proceedings of the 24th International Conference on Artificial Intelligence. :596–602.

Stackelberg game models of security have received much attention, with a number of approaches for computing Stackelberg equilibria in games with a single defender protecting a collection of targets. In contrast, multi-defender security games have received significantly less attention, particularly when each defender protects more than a single target. We fill this gap by considering a multi-defender security game, with a focus on theoretical characterizations of equilibria and the price of anarchy. We present the analysis of three models of increasing generality, two in which each defender protects multiple targets. In all models, we find that the defenders often have an incentive to over-protect the targets, at times significantly. Additionally, in the simpler models, we find that the price of anarchy is unbounded, linearly increasing both in the number of defenders and the number of targets per defender. Surprisingly, when we consider a more general model, this result obtains only in a “corner” case in the space of parameters; in most cases, however, the price of anarchy converges to a constant as the number of defenders increases.

2015-03-03
Li, Bo, Vorobeychik, Yevgeniy.  2014.  Feature Cross-Substitution in Adversarial Classification. Advances in Neural Information Processing Systems 27. :2087–2095.

The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.
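
A minimal sketch of the cross-substitution attack the paper models: against a linear bag-of-words filter, the evader swaps every weighted token for an out-of-vocabulary synonym, zeroing the score. The vocabulary, weights, and synonym map below are invented; note how a smaller feature set makes complete substitution easier, which is the feature-reduction pitfall the abstract points out.

```python
# Sketch: synonym substitution against a linear bag-of-words spam filter.
weights = {"cash": 1.2, "winner": 1.5, "free": 0.9}      # filter features
synonyms = {"cash": "moolah", "winner": "champ", "free": "gratis"}

def score(tokens):
    return sum(weights.get(t, 0.0) for t in tokens)

msg = ["you", "are", "a", "winner", "claim", "free", "cash"]
# Replace only the tokens the filter scores; synonyms fall outside its features.
evaded = [synonyms.get(t, t) if score([t]) > 0 else t for t in msg]
print(score(msg), "->", score(evaded))                   # 3.6 -> 0.0
```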

Smith, Andrew, Vorobeychik, Yevgeniy, Letchford, Joshua.  2014.  Multi-Defender Security Games on Networks. SIGMETRICS Perform. Eval. Rev. 41:4–7.

Stackelberg security game models and associated computational tools have seen deployment in a number of high-consequence security settings, such as LAX canine patrols and the Federal Air Marshal Service. This deployment across essentially independent agencies raises a natural question: what global impact does the resulting strategic interaction among the defenders, each using a similar model, have? We address this question in two ways. First, we demonstrate that the most common solution concept of Strong Stackelberg equilibrium (SSE) can result in significant under-investment in security entirely because SSE presupposes a single defender. Second, we propose a framework based on a different solution concept which incorporates a model of interdependencies among targets, and show that in this framework defenders tend to over-defend, even under significant positive externalities of increased defense.

2014-10-01
Vorobeychik, Yevgeniy, Mayo, Jackson R., Armstrong, Robert C., Ruthruff, Joseph R..  2011.  Noncooperatively Optimized Tolerance: Decentralized Strategic Optimization in Complex Systems. Phys. Rev. Lett. 107:108702.

We introduce noncooperatively optimized tolerance (NOT), a game theoretic generalization of highly optimized tolerance (HOT), which we illustrate in the forest fire framework. As the number of players increases, NOT retains features of HOT, such as robustness and self-dissimilar landscapes, but also develops features of self-organized criticality. The system retains considerable robustness even as it becomes fractured, due in part to emergent cooperation between players, and at the same time exhibits increasing resilience against changes in the environment, giving rise to intermediate regimes where the system is robust to a particular distribution of adverse events, yet not very fragile to changes.
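
A minimal sketch of the forest-fire yield computation underlying the HOT/NOT framework, assuming a uniform spark distribution and a random (unoptimized) tree layout; in HOT/NOT the layout itself is what players optimize. The grid size and tree density are arbitrary.

```python
# Sketch: a spark lands on a random lattice cell; if it hits a tree, the
# entire connected cluster of trees containing it burns. Yield = expected
# number of trees left standing.
import numpy as np
from scipy.ndimage import label

rng = np.random.default_rng(2)
grid = (rng.random((20, 20)) < 0.6).astype(int)   # 1 = tree, 0 = firebreak

clusters, n_clusters = label(grid)                # 4-connected tree clusters
sizes = np.bincount(clusters.ravel())[1:]         # trees per cluster

# Uniform spark: P(hit cluster c) = size_c / grid.size, and the whole
# cluster burns, so the expected burn is sum(size_c^2) / grid.size.
expected_burn = sum(s * s for s in sizes) / grid.size
print("expected yield:", grid.sum() - expected_burn)
```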