Biblio
The risk posed by insider threats has usually been approached by analyzing the behavior of users solely in the cyber domain. In this paper, we show the viability of using physical movement logs, collected via a building access control system, together with an understanding of the layout of the building housing the system’s assets, to detect malicious insider behavior that manifests itself in the physical domain. In particular, we propose a systematic framework that uses contextual knowledge about the system and its users, learned from historical data gathered from a building access control system, to select suitable models for representing movement behavior. We then explore the online usage of the learned models, together with knowledge about the layout of the building being monitored, to detect malicious insider behavior. Finally, we show the effectiveness of the developed framework using real-life data traces of user movement in railway transit stations.
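A minimal sketch of the kind of movement model such a framework could learn from historical access-control logs, assuming a simple first-order Markov representation over building zones; the zone names, training data, and threshold below are illustrative, not the paper's actual models:

from collections import defaultdict

def learn_transition_model(histories):
    # Count zone-to-zone transitions in historical movement sequences.
    counts = defaultdict(lambda: defaultdict(int))
    for seq in histories:                      # each seq is a list of visited zones
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        model[a] = {b: c / total for b, c in nxt.items()}
    return model

def suspicious_transitions(model, seq, threshold=0.05):
    # Flag transitions whose learned probability falls below the threshold.
    flags = []
    for a, b in zip(seq, seq[1:]):
        p = model.get(a, {}).get(b, 0.0)       # unseen transitions get probability 0
        if p < threshold:
            flags.append((a, b, p))
    return flags

histories = [["lobby", "office", "lab"], ["lobby", "office", "office"], ["lobby", "lab"]]
model = learn_transition_model(histories)
print(suspicious_transitions(model, ["lobby", "server_room"]))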
In this talk, we investigate applications of Factor Graphs to automatically generate attack signatures from security logs and domain expert knowledge. We demonstrate the advantages of Factor Graphs over traditional probabilistic graphical models, such as Bayesian Networks and Markov Random Fields, in modeling security attacks. We illustrate Factor Graph models using case studies of real attacks observed in the wild and at the National Center for Supercomputing Applications. Finally, we investigate how factor functions, a core component of Factor Graphs, can be constructed automatically to potentially improve detection accuracy and allow trained Factor Graph models to generalize across a variety of systems.
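A toy sketch of how factor functions over a hidden per-event attack state and observed log events can be combined into an unnormalized joint score; the factors and events below are illustrative assumptions, not the NCSA models discussed in the talk:

import itertools

def f_event(state, event):                 # factor linking an observed event to the hidden state
    return 0.9 if (state == "attack") == ("alert" in event) else 0.1

def f_expert(prev_state, state):           # factor encoding expert knowledge: attack states tend to persist
    return 0.8 if prev_state == state else 0.2

def unnormalized_joint(states, events):
    score = 1.0
    for s, e in zip(states, events):
        score *= f_event(s, e)
    for s0, s1 in zip(states, states[1:]):
        score *= f_expert(s0, s1)
    return score

events = ["login", "alert:download_sensitive", "alert:exfiltration"]
best = max(itertools.product(["benign", "attack"], repeat=len(events)),
           key=lambda states: unnormalized_joint(states, events))
print(best)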
Presentation for Information Trust Institute Joint Trust and Security/Science of Security Seminar at the University of Illinois at Urbana-Champaign on November 1, 2016.
Presented at the NSA Science of Security Quarterly Meeting, July 2016.
HDFS has been widely used for storing massive-scale data, which is vulnerable to site disasters. File system backup is an important strategy for data retention. In this paper, we present an efficient, easy-to-use Backup and Disaster Recovery System for HDFS. The system includes a client based on HDFS with an additional remote backup feature, and a remote server with an HDFS cluster to keep the backup data. It supports full backups and regular incremental backups to the server with very low cost and high throughput. In our experiments, the average backup and recovery speed reaches 95 MB/s, approaching the theoretical maximum speed of gigabit Ethernet.
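As a quick sanity check on the reported figure (the 95 MB/s is from the paper; the rest is generic arithmetic), a gigabit Ethernet link carries at most about 125 MB/s before protocol overhead:

line_rate_mb_s = 1_000_000_000 / 8 / 1_000_000   # 1 Gb/s = 125 MB/s before protocol overhead
reported_mb_s = 95
print(f"link utilization ≈ {reported_mb_s / line_rate_mb_s:.0%}")   # ≈ 76% of the raw line rate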
Despite decades of effort to combat spam, unwanted and even malicious emails, such as phish which aim to deceive recipients into disclosing sensitive information, still routinely find their way into one’s mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion, or manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact, but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction—a common practice in applied machine learning—can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.
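A minimal sketch of a randomized (noisy) linear filter of the kind studied in the experiment, so that repeated probing by an evader receives inconsistent feedback; the feature weights, noise scale, and threshold are illustrative assumptions rather than the experiment's actual filter:

import random

WEIGHTS = {"free": 1.5, "winner": 2.0, "click": 1.0, "meeting": -1.0}
THRESHOLD = 1.0
NOISE_SCALE = 0.5

def noisy_filter(words):
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    score += random.gauss(0.0, NOISE_SCALE)        # the added noise that hinders evasion
    return "spam" if score > THRESHOLD else "ok"

print(noisy_filter("you are a winner click here for free".split()))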
Presented at the Science of Security Quarterly Meeting, July 2016.
Presented at the NSA Science of Security Quarterly Meeting, November 2016.
We present a technique for bounded invariant verification of nonlinear networked dynamical systems with delayed interconnections. The underlying problem in precise bounded-time verification lies in computing bounds on the sensitivity of trajectories (or solutions) to changes in the initial states and inputs of the system. For large networks, computing this sensitivity with precision guarantees is challenging. We introduce the notion of input-to-state (IS) discrepancy of each module or subsystem in a larger nonlinear networked dynamical system. The IS discrepancy bounds the distance between two solutions or trajectories of a module in terms of their initial states and their inputs. Given the IS discrepancy functions of the modules, we show that it is possible to effectively construct a reduced (low-dimensional) time-delayed dynamical system, such that the trajectory of this reduced model precisely bounds the distance between the trajectories of the complete network with changed initial states. Using the above results, we develop a sound and relatively complete algorithm for bounded invariant verification of networked dynamical systems consisting of nonlinear modules interacting through possibly delayed signals. Finally, we introduce a local version of IS discrepancy and show that it can be computed using only the Lipschitz constant and the Jacobian of each module's dynamic function.
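One common form of such a bound, sketched here with assumed notation that may differ from the paper's exact definitions: for a module with trajectory \xi(x_0, u, t) from initial state x_0 under input signal u, an IS discrepancy is a pair of class-\mathcal{K} functions (\beta, \gamma) satisfying

\[
  \|\xi(x_0, u, t) - \xi(x_0', u', t)\|
  \;\le\; \beta\bigl(\|x_0 - x_0'\|,\, t\bigr)
  \;+\; \int_0^{t} \gamma\bigl(\|u(s) - u'(s)\|\bigr)\, ds ,
\]

so that the deviation between two executions is bounded by a term depending only on the initial-state distance and a term accumulating the input distance over time.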
This paper describes a data-driven approach to studying the science of cyber security (SoS). It argues that science is driven by data. It then describes issues and approaches toward the following three aspects: (i) Data-Driven Science for Attack Detection and Mitigation, (ii) Foundations for Data Trustworthiness and Policy-based Sharing, and (iii) A Risk-based Approach to Security Metrics. We believe that the three aspects addressed in this paper will form the basis for studying the Science of Cyber Security.
Presented at the NSA Science of Security Quarterly Meeting, July 2016.
The concept of differential privacy stems from the study of private queries on datasets. In this work, we apply this concept to discrete-time, linear distributed control systems in which agents need to maintain the privacy of certain preferences while sharing information for better system-level performance. The system has N agents operating in a shared environment that couples their dynamics. We show that for stable systems the performance grows as O(T³/(Nε²)), where T is the time horizon and ε is the differential privacy parameter. Next, we study lower bounds in terms of the Shannon entropy of the minimal mean square estimate of the system’s private initial state obtained from noisy communications between an agent and the server. We show that for any noise-adding differentially private mechanism, the Shannon entropy is at least nN(1−ln(ε/2)), where n is the dimension of the system, and that this lower bound is achieved by a Laplace-noise-adding mechanism. Finally, we study the problem of keeping the objective functions of individual agents differentially private in the context of cloud-based distributed optimization. The result shows a trade-off between the privacy of the objective functions and the performance of the distributed optimization algorithm with noise.
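A minimal sketch of the Laplace noise-adding mechanism referenced by the lower bound, applied to a scalar value reported by an agent; the sensitivity and ε values below are illustrative assumptions:

import numpy as np

def laplace_mechanism(value, sensitivity, epsilon):
    # Add Laplace noise with scale sensitivity/epsilon to make the report
    # epsilon-differentially private with respect to the agent's private value.
    return value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

reported = [laplace_mechanism(x, sensitivity=1.0, epsilon=0.5) for x in [0.2, -1.3, 0.7]]
print(reported)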
Presented at the Joint Trust and Security/Science of Security Seminar, April 26, 2016.
As our ground transportation infrastructure modernizes, the large amount of data being measured, transmitted, and stored motivates an analysis of the privacy aspects of these emerging cyber-physical technologies. In this paper, we consider privacy in the routing game, where the origins and destinations of drivers are considered private. This is motivated by the fact that such spatiotemporal information can easily be used as the basis for inferences about a person's activities. More specifically, we consider the differential privacy of the mapping from the amount of flow for each origin-destination pair to the traffic flow measurements on each link of a traffic network. We use a stochastic online learning framework for the population dynamics, which is known to converge to the Nash equilibrium of the routing game. We analyze the sensitivity of this process and provide theoretical guarantees on the convergence rates as well as differential privacy values for these models. We confirm these results with simulations on a small example.
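An illustrative sketch of the modeled pipeline, not the paper's exact dynamics: route probabilities for one origin-destination pair evolve under an exponential-weights learning rule, and the published per-link flows are perturbed with Laplace noise; the two-route network, latency functions, and parameters are assumptions:

import numpy as np

routes = [("a",), ("b", "c")]                 # two routes, each a tuple of links
latency = {"a": lambda f: 1 + f, "b": lambda f: 0.5 + 2 * f, "c": lambda f: 0.2}
probs, demand, eta, eps = np.array([0.5, 0.5]), 1.0, 0.5, 0.5

for _ in range(50):
    link_flow = {l: 0.0 for l in latency}
    for p, r in zip(probs, routes):
        for l in r:
            link_flow[l] += p * demand         # expected flow induced on each link
    route_cost = np.array([sum(latency[l](link_flow[l]) for l in r) for r in routes])
    probs = probs * np.exp(-eta * route_cost)  # exponential-weights (online learning) update
    probs /= probs.sum()

# Publish differentially private link flows by adding Laplace noise.
noisy_flows = {l: f + np.random.laplace(scale=1.0 / eps) for l, f in link_flow.items()}
print(probs, noisy_flows)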
Best Poster Award, Illinois Institute of Technology Research Day, April 11, 2016.
To help establish a more scientific basis for security science, which will enable the development of fundamental theories and move the field from being primarily reactive to primarily proactive, it is important for research results to be reported in a scientifically rigorous manner. Such reporting will allow for the standard pillars of science, namely replication, meta-analysis, and theory building. In this paper we aim to establish a baseline of the state of scientific work in security through an analysis of indicators of scientific research as reported in the papers from the 2015 IEEE Symposium on Security and Privacy. To conduct this analysis, we developed a series of rubrics to determine the completeness of the papers relative to the type of evaluation used (e.g., case study, experiment, proof). Our findings showed that while papers are generally easy to read, they often do not explicitly document some key information, such as the research objectives, the process for choosing the cases to include in the studies, and the threats to validity. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
Presented at the NSA Science of Security Quarterly Meeting, November 2016.
Presented at the NSA SoS Quarterly Meeting, July 2016 and November 2016.
Presented at the Illinois Information Trust Institute Assured Cloud Computing Weekly Research Seminar, September 28, 2016.
This paper focuses on a practically important problem: matching a real-world product photo to exactly the same item(s) on online shopping sites. The task is extremely challenging because the user photos (i.e., the queries in this scenario) are often captured in uncontrolled environments, while the product images in online shops are mostly taken by professionals with clean backgrounds and perfect lighting conditions. To tackle the problem, we study deep network architectures and training schemes, with the goal of learning a robust deep feature representation that is able to bridge the domain gap between the user photos and the online product images. Our contributions are twofold. First, we propose an alternative to the popular contrastive loss used in Siamese deep networks, namely the robust contrastive loss, in which we "relax" the penalty on positive pairs to alleviate over-fitting. Second, a multi-task fine-tuning approach is introduced to learn a better feature representation, which not only incorporates knowledge from the provided training photo pairs, but also exploits additional information from the large ImageNet dataset to regularize the fine-tuning procedure. Experiments on two challenging real-world datasets demonstrate that both the robust contrastive loss and the multi-task fine-tuning approach are effective, leading to very promising results with a time cost suitable for real-time retrieval.
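A numpy sketch contrasting the standard contrastive loss with one plausible way to relax the positive-pair penalty; the exact relaxed form used in the paper is not given in the abstract, so the inner slack m_pos below is an assumption:

import numpy as np

def contrastive_loss(d, same, margin=1.0):
    # same=1: pull matching pairs together; same=0: push non-matching pairs beyond the margin.
    return same * d**2 + (1 - same) * np.maximum(0.0, margin - d)**2

def robust_contrastive_loss(d, same, margin=1.0, m_pos=0.3):
    # Positive pairs are only penalized beyond a small slack m_pos, reducing
    # over-fitting to hard matching pairs.
    return same * np.maximum(0.0, d - m_pos)**2 + (1 - same) * np.maximum(0.0, margin - d)**2

d = np.array([0.1, 0.6, 0.2, 1.4])        # embedding distances for four pairs
same = np.array([1, 1, 0, 0])             # 1 = same product, 0 = different product
print(contrastive_loss(d, same), robust_contrastive_loss(d, same))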
Presented at NSA Science of Security Quarterly Meeting, July 2016.
Presented at the NSA Science of Security Quarterly Meeting, November 2016.
Commercial networks today have diverse security policies, defined by factors such as the type of traffic they carry, the nature of the applications they support, access control objectives, and organizational principles. Ideally, the wide diversity in SDN controller frameworks should prove helpful in correctly and efficiently enforcing these policies. However, this has not been the case so far. By requiring administrators to implement both security and performance objectives in the SDN controller, these frameworks have made the task of security policy enforcement in SDNs a challenging one. We observe that by separating security policy enforcement from performance optimization, we can facilitate the use of SDN for flexible policy management. To this end, we propose Oreo, a transparent performance enhancement layer for SDNs. Oreo allows SDN controllers to focus entirely on correct security policy enforcement, and transparently optimizes the dataplane thus defined, reducing path stretch, switch memory consumption, and other overheads. Optimizations are performed while guaranteeing that end-to-end reachability characteristics are preserved, meaning that the security policies defined by the controller are not violated. Oreo performs these optimizations by first constructing a network-wide model describing the behavior of all traffic, and then optimizing the paths observed in the model by solving a multi-objective optimization problem. Initial experiments suggest that the techniques used by Oreo are effective, fast, and can scale to commercial-sized networks.
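A minimal sketch of the invariant Oreo must preserve, on illustrative data rather than the system's actual network model: whatever path optimization is applied, the set of endpoint pairs that can reach each other must stay exactly the same.

def reachable_pairs(paths):
    # paths maps (src, dst) -> list of switches on the path, or None if the policy drops the flow
    return {pair for pair, hops in paths.items() if hops is not None}

original  = {("h1", "h2"): ["s1", "s2", "s3"], ("h1", "h3"): None}
optimized = {("h1", "h2"): ["s1", "s3"],       ("h1", "h3"): None}   # shorter path, same policy
assert reachable_pairs(original) == reachable_pairs(optimized)
print("security policy preserved under optimization")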