Biblio
Code Hunt (https://www.codehunt.com/) from Microsoft Research is a web-based serious gaming platform that is widely used for programming contests. In this paper, we present a preliminary statistical analysis of a Code Hunt data set that contains the programs written by students (only) worldwide during a 48-hour contest. There are 259 users, 24 puzzles (organized into 4 sectors), and about 13,000 programs submitted by these users. Our analysis results can help improve the creation of puzzles in a future contest.
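To make the kind of analysis concrete, here is a minimal pandas sketch of a per-puzzle summary such a study might start from; the column names and the three example rows are hypothetical, not the actual Code Hunt data set schema.

```python
import pandas as pd

# Hypothetical submission records; the real data set's schema may differ.
submissions = pd.DataFrame({
    "user":   ["u1", "u1", "u2"],
    "sector": [1, 1, 2],
    "puzzle": ["1.01", "1.01", "2.03"],
    "won":    [False, True, True],
})

# Per-puzzle summary: total attempts, distinct users, and solve rate.
per_puzzle = submissions.groupby("puzzle").agg(
    attempts=("user", "size"),
    users=("user", "nunique"),
    solve_rate=("won", "mean"),
)
print(per_puzzle)
```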
Using stolen or weak credentials to bypass authentication is one of the top ten network threats, as shown in recent studies. Disguised as legitimate users, attackers use stealthy techniques such as rootkits and covert channels to gain persistent access to a target system. However, such attacks are often detected only after the system misuse stage, i.e., after the attackers have already executed attack payloads such as: i) stealing secrets, ii) tampering with system services, and iii) disrupting the availability of production services.
In this talk, we analyze a real-world credential stealing attack observed at the National Center for Supercomputing Applications. We show the disadvantages of traditional detection techniques, such as signature-based and anomaly-based detection, for such attacks. Our approach complements existing detection techniques. We investigate the use of probabilistic graphical models, specifically factor graphs, to integrate security logs from multiple sources for more accurate detection. Finally, we propose a security testbed architecture to: i) simulate variants of known attacks that may happen in the future, ii) replay such attack variants in an isolated environment, and iii) collect and share security logs of such replays with the security research community.
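The abstract does not spell out the factor-graph computation. The following is a minimal Python sketch of the underlying idea: unary factors score each logged event against a hidden user state (benign vs. malicious), a pairwise factor encodes state persistence, and marginalizing the unnormalized product of factors yields a compromise probability. All event names and factor values here are hypothetical, not taken from the talk.

```python
from itertools import product

# Hypothetical observed event sequence extracted from security logs.
events = ["login", "download_sensitive", "new_systemd_service"]

# Unary factors: compatibility of each event with a hidden user state.
event_factor = {
    ("login", "benign"): 0.9,                ("login", "malicious"): 0.5,
    ("download_sensitive", "benign"): 0.3,   ("download_sensitive", "malicious"): 0.8,
    ("new_systemd_service", "benign"): 0.1,  ("new_systemd_service", "malicious"): 0.9,
}

# Pairwise factor: hidden states tend to persist between consecutive events.
transition_factor = {("benign", "benign"): 0.8, ("benign", "malicious"): 0.2,
                     ("malicious", "benign"): 0.1, ("malicious", "malicious"): 0.9}

states = ("benign", "malicious")

def joint_score(assignment):
    """Unnormalized product of all factors for one hidden-state sequence."""
    score = 1.0
    for event, state in zip(events, assignment):
        score *= event_factor[(event, state)]
    for prev, curr in zip(assignment, assignment[1:]):
        score *= transition_factor[(prev, curr)]
    return score

# Brute-force marginal: P(last state is malicious | events), via normalization.
total = sum(joint_score(a) for a in product(states, repeat=len(events)))
malicious = sum(joint_score(a) for a in product(states, repeat=len(events))
                if a[-1] == "malicious")
print(f"P(compromised after last event) = {malicious / total:.3f}")
```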
Presented at the Illinois Information Trust Institute Joint Trust and Security and Science of Security Seminar, May 3, 2016.
This talk will explore a scalable data analytics pipeline for real-time attack detection through the use of customized honeypots at the National Center for Supercomputing Applications (NCSA). Attack detection tools are common and are constantly improving, but validating these tools is challenging. You must: (i) identify data (e.g., system-level events) that is essential for detecting attacks, (ii) extract this data from multiple data logs collected by runtime monitors, and (iii) present the data to the attack detection tools. On top of this, such an approach must scale with an ever-increasing amount of data, while allowing integration of new monitors and attack detection tools. All of these require an infrastructure to host and validate the developed tools before deployment into a production environment.
We will present a generalized architecture that aims for a real-time, scalable, and extensible pipeline that can be deployed in diverse infrastructures to validate arbitrary attack detection tools. To motivate our approach, we will show an example deployment of our pipeline based on open-source tools. The example deployment uses as its data sources: (i) a customized honeypot environment at NCSA and (ii) a container-based testbed infrastructure for interactive attack replay. Each of these data sources is equipped with network and host-based monitoring tools such as Bro (a network-based intrusion detection system) and OSSEC (a host-based intrusion detection system) to allow for the runtime collection of data on system/user behavior. Finally, we will present an attack detection tool that we developed and plan to validate through our pipeline. In conclusion, the talk will discuss the challenges of transitioning attack detection from theory to practice and how the proposed data analytics pipeline can help that transition.
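As a hedged sketch of the merge step such a pipeline needs, the following Python fragment interleaves events from two monitors into one time-ordered stream for downstream detection tools. The simplified JSON records are assumptions; real Bro and OSSEC logs are richer and would arrive from files or a message queue rather than inline literals.

```python
import heapq
import json

# Assumed simplified per-monitor logs (JSON lines), already time-ordered.
bro_log = [
    '{"ts": 100.1, "source": "bro", "event": "ssh_login", "src_ip": "203.0.113.7"}',
    '{"ts": 103.4, "source": "bro", "event": "http_request", "src_ip": "203.0.113.7"}',
]
ossec_log = [
    '{"ts": 101.9, "source": "ossec", "event": "new_user_added", "host": "node12"}',
]

def parse(lines):
    for line in lines:
        yield json.loads(line)

# heapq.merge yields one time-ordered stream from the sorted per-monitor
# streams without loading everything into memory.
for event in heapq.merge(parse(bro_log), parse(ossec_log), key=lambda e: e["ts"]):
    print(event["ts"], event["source"], event["event"])
```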
Presented at the Illinois Information Trust Institute Joint Trust and Security/Science of Security Seminar, October 6, 2016.
Presented at the NSA SoS Quarterly Lablet Meeting, October 2015.
Presented at the NSA SoS Quarterly Lablet Meeting, January 2015 by Ravi Iyer.
Presented at the Illinois SoS Bi-Weekly Meeting, February 2015 by Phuong Cao.
This paper presents a framework for (1) generating variants of known attacks, (2) replaying attack variants in an isolated environment, and (3) validating the detection capabilities of attack detection techniques against these variants. Our framework facilitates reproducible security experiments. We generated 648 variants of three real-world attacks (observed at the National Center for Supercomputing Applications at the University of Illinois). Our experiment showed the value of generating attack variants by quantifying the detection capabilities of three detection methods: a signature-based detection technique, an anomaly-based detection technique, and a probabilistic graphical model-based technique.
In this paper we develop a new framework to analyze stability and stabilizability of Linear Switched Systems (LSS), as well as to compute their gains. Our approach is based on a combination of state-space operator descriptions and the Youla parametrization, and provides a unified approach to analysis and synthesis of LSS, and in fact of Linear Time-Varying (LTV) systems, in any ℓp-induced norm sense. By specializing to the ℓ∞ case, we show how Linear Programming (LP) can be used to test stability and stabilizability, and to synthesize stabilizing controllers that guarantee a near-optimal closed-loop gain.
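The abstract does not reproduce the LP test itself. As an illustration of why linear programming arises in the ℓ∞ setting, the following LaTeX note states one classical LP-checkable sufficient condition for switched-system stability (a common polyhedral Lyapunov function). It is a standard result in the area, not necessarily the authors' exact formulation.

```latex
% Illustrative only: a classical sufficient condition for stability of a
% discrete-time switched system x_{k+1} = A_{\sigma(k)} x_k under arbitrary
% switching, checkable by linear programming; this need not match the
% authors' formulation.
\[
  \exists\, W \in \mathbb{R}^{m\times n},\ \operatorname{rank} W = n,\;
  \exists\, P_i \in \mathbb{R}^{m\times m}:\quad
  W A_i = P_i W,\quad \lVert P_i\rVert_\infty \le \gamma < 1 \;\;\forall i,
\]
% in which case V(x) = \|W x\|_\infty is a common polyhedral Lyapunov
% function certifying exponential stability. For a fixed candidate W, the
% equality constraints on the entries of each P_i are linear, and the
% row-sum bound \|P_i\|_\infty \le \gamma can be encoded with linear
% constraints on absolute values, so minimizing \gamma is an LP.
```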
Presented at the NSA Science of Security Quarterly Meeting, July 2016.
This article describes an emerging direction in the intersection between human-computer interaction and cognitive science: the use of cognitive models to give insight into the challenges of cybersecurity. The article gives a brief overview of work in different areas of cybersecurity where cognitive modeling research plays a role, with regard to direct interaction between end users and computer systems and with regard to the needs of security analysts working behind the scenes. The problem of distinguishing between human users and automated agents (bots) interacting with computer systems is introduced, as well as ongoing efforts toward building Human Subtlety Proofs (HSPs), persistent and unobtrusive windows into human cognition with direct application to cybersecurity. Two computer games are described, proxies to illustrate different ways in which cognitive modeling can potentially contribute to the development of HSPs and similar cybersecurity applications.
Humans can easily find themselves in high-cost situations where they must choose between suggestions made by an automated decision aid and a conflicting human decision aid. Previous research indicates that trust is an antecedent to reliance and often influences how individuals prioritize and integrate information presented by a human and/or automated information source. Expanding on previous work by Lyons and Stokes (2012), the current experiment measured how trust in automated and human decision aids varies with perceived risk and workload. The simulated task required 126 participants to choose the safest route for a military convoy; they were presented with conflicting information regarding which route was safest from an automated tool and a human. Results demonstrated that as workload increased, trust in automation decreased. As perceived risk increased, trust in the human decision aid increased. Individual differences in dispositional trust correlated with increased trust in both decision aids. These findings can be used to inform training programs and systems for operators who may receive information from human and automated sources. Examples of this context include air traffic control, aviation, and signals intelligence.
A thorough understanding of society’s privacy incidents is of paramount importance for technical solutions, training/education, social research, and legal scholarship in privacy. The goal of the PrIncipedia project is to provide this understanding by developing the first comprehensive database of privacy incidents, enabling the exploration of a variety of privacy-related research questions. We provide a working definition of “privacy incident” and evidence that it meets end-user perceptions of privacy. We also provide semi-automated support for building the database through a learned classifier that detects news articles about privacy incidents.
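As a rough sketch of the kind of learned classifier described, the following uses TF-IDF features with logistic regression in scikit-learn. The actual PrIncipedia features, model, and training data may differ, and the labeled examples below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training examples: 1 = privacy incident, 0 = not.
articles = [
    "Retailer discloses breach exposing customer email addresses",
    "Local team wins championship after dramatic overtime finish",
    "App found to share location data with advertisers without consent",
    "City council approves new budget for road maintenance",
]
labels = [1, 0, 1, 0]

# TF-IDF over unigrams and bigrams, followed by logistic regression.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(articles, labels)

print(model.predict(["Hospital records leaked after ransomware attack"]))
```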
This article describes an emerging direction in the intersection between human–computer interaction and cognitive science: the use of cognitive models to give insight into the challenges of cybersecurity (cyber-SA). The article gives a brief overview of work in different areas of cyber-SA where cognitive modeling research plays a role, with regard to direct interaction between end users and computer systems and with regard to the needs of security analysts working behind the scenes. The problem of distinguishing between human users and automated agents (bots) interacting with computer systems is introduced, as well as ongoing efforts toward building Human Subtlety Proofs (HSPs), persistent and unobtrusive windows into human cognition with direct application to cyber-SA. Two computer games are described, proxies to illustrate different ways in which cognitive modeling can potentially contribute to the development of HSPs and similar cyber-SA applications.
Security isolation is a foundation of computing systems that enables resilience to different forms of attacks. This article seeks to understand existing security isolation techniques by systematically classifying different approaches and analyzing their properties. We provide a hierarchical classification structure for grouping different security isolation techniques. At the top level, we consider two principal aspects: mechanism and policy. Each aspect is broken down into salient dimensions that describe key properties. We break the mechanism aspect into two dimensions, enforcement location and isolation granularity, and the policy aspect into three dimensions: policy generation, policy configurability, and policy lifetime. We apply our classification to a set of representative papers that cover a breadth of security isolation techniques and discuss trade-offs among different design choices and limitations of existing approaches.
Anonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e.g., a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have revealed that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al., 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and significantly outperforms standard diffusion. We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks.
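For intuition, here is a toy Python sketch of the virtual-source mechanism behind adaptive diffusion, reduced to a line graph (a 2-regular tree). The constant passing probability is a simplification of the protocol's depth-dependent probability α(t, h), and all parameters are illustrative rather than taken from the paper.

```python
import random

# Toy adaptive diffusion on an infinite line: nodes are integers, the true
# source is node 0, and the infected set is a symmetric interval around a
# "virtual source" v that drifts away from the true source.

def adaptive_diffusion_line(steps, pass_prob=0.5, seed=None):
    rng = random.Random(seed)
    v = 0                                # virtual source starts at the true source
    direction = rng.choice([-1, 1])      # side toward which v drifts
    radius = 0                           # infected set = [v - radius, v + radius]
    for _ in range(steps):
        radius += 1                      # infection front grows every step
        if rng.random() < pass_prob:
            v += direction               # pass the virtual source away from node 0
    return v, radius

v, radius = adaptive_diffusion_line(steps=20, seed=1)
infected = range(v - radius, v + radius + 1)
print(f"virtual source at {v}, {len(list(infected))} infected nodes")
print(f"true source (node 0) hides among the infected nodes: {0 in infected}")
```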
Anonymous messaging applications have recently gained popularity as a means for sharing opinions without fear of judgment or repercussion. These messages propagate anonymously over a network, typically defined by social connections or physical proximity. However, recent advances in rumor source detection show that the source of such an anonymous message can be inferred by certain statistical inference attacks. Adaptive diffusion was recently proposed as a solution that achieves optimal source obfuscation over regular trees. However, in real social networks, the degrees differ from node to node, and adaptive diffusion can be significantly sub-optimal. This gap increases as the degrees become more irregular.
In order to quantify this gap, we model the underlying network as coming from standard branching processes with i.i.d. degree distributions. Building upon the analysis techniques from branching processes, we give an analytical characterization of the dependence of the probability of detection achieved by adaptive diffusion on the degree distribution. Further, this analysis provides a key insight: passing a rumor to a friend who has many friends makes the source more ambiguous. This leads to a new family of protocols that we call Preferential Attachment Adaptive Diffusion (PAAD). When messages are propagated according to PAAD, we give both the MAP estimator for finding the source and an analysis of the probability of detection achieved by this adversary. The analytical results are not directly comparable, since the adversary's observed information has a different distribution under adaptive diffusion than under PAAD. Instead, we present results from numerical experiments that suggest that PAAD achieves a lower probability of detection, at the cost of increased communication for coordination.
Humans can easily find themselves in high-cost situations where they must choose between suggestions made by an automated decision aid and a conflicting human decision aid. Previous research indicates that humans often rely on automation or other humans, but not both simultaneously. Expanding on previous work by Lyons and Stokes (2012), the current experiment measured how trust in automated and human decision aids varies with perceived risk and workload. The simulated task required 126 participants to choose the safest route for a military convoy; they were presented with conflicting information from an automated tool and a human. Results demonstrated that as workload increased, trust in automation decreased. As perceived risk increased, trust in the human decision aid increased. Individual differences in dispositional trust correlated with increased trust in both decision aids. These findings can be used to inform training programs for operators who may receive information from human and automated sources. Examples of this context include air traffic control, aviation, and signals intelligence.
Hadoop has become increasingly popular because it processes data rapidly and in parallel. Cloud computing offers cloud users reliability, flexibility, scalability, elasticity, and cost savings, so deploying Hadoop in the cloud can benefit Hadoop users. Our evaluation shows that various internal cloud attacks can bypass current Hadoop security mechanisms and that compromised Hadoop components can be used to threaten the Hadoop system as a whole. It is therefore urgent to improve compromise resilience, so that Hadoop can maintain a relatively high security level even when parts of it are compromised. Hadoop has two vulnerabilities that dramatically impact its compromise resilience: an overloaded authentication key and a lack of fine-grained access control at the data access level. We developed a security enhancement for public cloud-based Hadoop, named SEHadoop, that improves compromise resilience by enhancing isolation among Hadoop components and enforcing least access privilege for Hadoop processes. We have implemented the SEHadoop model and demonstrated that SEHadoop fixes the above vulnerabilities with minimal or no run-time overhead and effectively resists related attacks.
Objective: The overarching goal is to convey the concept of science of security and the contributions that a scientifically based, human factors approach can make to this interdisciplinary field. Background: Rather than a piecemeal approach to solving cybersecurity problems as they arise, the U.S. government is mounting a systematic effort to develop an approach grounded in science. Because humans play a central role in security measures, research on security-related decisions and actions grounded in principles of human information-processing and decision-making is crucial to this interdisciplinary effort. Method: We describe the science of security and the role that human factors can play in it, and use two examples of research in cybersecurity (detection of phishing attacks and selection of mobile applications) to illustrate the contribution of a scientific, human factors approach. Results: In these research areas, we show that systematic information-processing analyses of the decisions that users make and the actions they take provide a basis for integrating the human component of security science. Conclusion: Human factors specialists should utilize their foundation in the science of applied information processing and decision making to contribute to the science of cybersecurity.
The objective of this work is to develop an efficient and practical sensor placement method for failure detection and localization in water networks. We formulate the problem as the minimum test cover problem (MTC), with the objective of selecting the minimum number of sensors required to uniquely identify and localize pipe failure events. First, we summarize a single-level sensing model and discuss an efficient fast greedy approach for solving the MTC problem. Simulation results on benchmark test networks demonstrate the efficacy of the fast greedy algorithm. Second, we develop a multi-level sensing model that captures additional physical features of the disturbance event, such as the time elapsed between the occurrence of a disturbance and its detection by a sensor. Our sensor placement approach using MTC extends to the multi-level sensing model, and improved identification performance is obtained with a reduced number of sensors (in comparison to the single-level sensing model). In particular, we investigate the bi-level sensing model to illustrate the efficacy of employing multi-level sensors for the identification of failure events. Finally, we suggest extensions of our approach for the deployment of heterogeneous sensors in water networks by exploring the trade-off between cost and performance (measured in terms of the identification score of pipe/link failures).
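As a hedged illustration of the fast greedy approach for the MTC formulation, the following Python sketch repeatedly places the sensor that distinguishes the most remaining event pairs until every pipe-failure event has a unique sensor signature. The influence matrix is invented, not taken from the benchmark networks.

```python
from itertools import combinations

# Invented single-level sensing model: influence[s][e] = 1 if candidate
# sensor s detects pipe-failure event e, else 0.
influence = {
    "s1": {"e1": 1, "e2": 1, "e3": 0, "e4": 0},
    "s2": {"e1": 1, "e2": 0, "e3": 1, "e4": 0},
    "s3": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
}
events = ["e1", "e2", "e3", "e4"]

def newly_separated(sensor, undistinguished):
    """Event pairs this sensor tells apart among the remaining pairs."""
    return {p for p in undistinguished
            if influence[sensor][p[0]] != influence[sensor][p[1]]}

undistinguished = set(combinations(events, 2))
placed = []
while undistinguished:
    best = max(influence, key=lambda s: len(newly_separated(s, undistinguished)))
    gained = newly_separated(best, undistinguished)
    if not gained:
        break  # remaining event pairs cannot be separated by any sensor
    placed.append(best)
    undistinguished -= gained

print("sensors placed:", placed)
print("event signatures:",
      {e: tuple(influence[s][e] for s in placed) for e in events})
```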
The diverse views of the science of security have opened up several avenues toward applying the methods of science to security. We pursue a different kind of connection between science and security. This paper explores the idea that security is not just a suitable subject for science, but that the process of security is also similar to the process of science. This similarity arises from the fact that both science and security depend on the methods of inductive inference. Because of this dependency, a scientific theory can never be definitively proved, but can only be disproved by new evidence and improved into a better theory. Because of the same dependency, every security claim and method has a lifetime and always eventually needs to be improved.
In this general framework of security-as-science, we explore the ways to apply the methods of scientific induction in the process of trust. The process of trust building and updating is viewed as hypothesis testing. We propose to formulate the trust hypotheses by the methods of algorithmic learning, and to build more robust trust testing and vetting methodologies on the solid foundations of statistical inference.
Intrusion detection systems (IDSs) have assumed increasing importance in recent decades as information systems have become ubiquitous. Despite the abundance of intrusion detection algorithms developed so far, there is still no single detection algorithm or procedure that can catch all possible intrusions; moreover, simultaneously running all these algorithms may not be feasible for practical IDSs due to resource limitations. For these reasons, effective IDS configuration becomes crucial for real-time intrusion detection. However, the uncertainty in the intruder's type and the (often unknown) dynamics of the target system pose challenges to IDS configuration. Considering these challenges, the IDS configuration problem is formulated in this work as a stochastic game with incomplete information, and a new algorithm, Bayesian Nash-Q learning, which combines conventional reinforcement learning with a Bayesian type-identification procedure, is proposed. Numerical results show that the proposed algorithm can identify the intruder's type with high fidelity and provide effective configurations.
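For intuition about how the two ingredients fit together, here is a toy Python sketch combining a Bayes update of the belief over the intruder's type with a bandit-style Q-value update (single state, no lookahead) over IDS configurations. The types, likelihoods, rewards, and single-state simplification are all assumptions, far simpler than the paper's stochastic-game formulation.

```python
import random

random.seed(0)
types = ["script_kiddie", "apt"]
belief = {"script_kiddie": 0.5, "apt": 0.5}   # prior over intruder type
# Assumed likelihood of a "stealthy" action under each intruder type.
p_stealthy = {"script_kiddie": 0.2, "apt": 0.8}

configs = ["light", "deep"]                    # IDS configurations (actions)
q = {c: 0.0 for c in configs}                  # Q-values for the single-state toy
alpha, epsilon = 0.1, 0.2
true_type = "apt"                              # ground truth used to simulate observations

for _ in range(200):
    # --- Bayesian type identification from the observed action ---
    stealthy = random.random() < p_stealthy[true_type]
    for t in types:
        belief[t] *= p_stealthy[t] if stealthy else 1 - p_stealthy[t]
    z = sum(belief.values())
    belief = {t: b / z for t, b in belief.items()}

    # --- epsilon-greedy selection of an IDS configuration ---
    if random.random() < epsilon:
        config = random.choice(configs)
    else:
        config = max(q, key=q.get)

    # Assumed reward: deep inspection catches stealthy actions but costs more.
    detected = stealthy and config == "deep"
    reward = (1.0 if detected else 0.0) - (0.3 if config == "deep" else 0.1)
    q[config] += alpha * (reward - q[config])

print("posterior belief:", {t: round(b, 3) for t, b in belief.items()})
print("learned Q-values:", {c: round(v, 3) for c, v in q.items()})
```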
Knowing inputs that cover a specific branch or statement in a program is useful for debugging and regression testing. Symbolic backward execution (SBE) is a natural approach to find such targeted inputs. However, SBE struggles with complicated arithmetic, external method calls, and data-dependent loops that occur in many real-world programs. We propose symcretic execution, a novel combination of SBE and concrete forward execution that can efficiently find targeted inputs despite these challenges. An evaluation of our approach on a range of test cases shows that symcretic execution finds inputs in more cases than concolic testing tools while exploring fewer path segments. Integration of our approach will allow test generation tools to fill coverage gaps and static bug detectors to verify candidate bugs with concrete test cases. This is the full version of an extended abstract that was presented at the 29th IEEE/ACM International Conference on Automated Software Engineering (ASE 2014), September 15–19, 2014, Västerås, Sweden.
Knowing inputs that cover a specific branch or statement in a program is useful for debugging and regression testing. Symbolic backward execution (SBE) is a natural approach to find such targeted inputs. However, SBE struggles with complicated arithmetic, external method calls, and data-dependent loops that occur in many real-world programs. We propose symcretic execution, a novel combination of SBE and concrete forward execution that can efficiently find targeted inputs despite these challenges. An evaluation of our approach on a range of test cases shows that symcretic execution finds inputs in more cases than concolic testing tools while exploring fewer path segments. Integration of our approach will allow test generation tools to fill coverage gaps and static bug detectors to verify candidate bugs with concrete test cases.