Biblio
We examine the problem of aggregating the results of multiple anti-virus (AV) vendors' detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTotal, each of which appeared for the rst time between January 2012 and June 2014.
Our evaluation shows that our statistical model is consistently more accurate at predicting the future-derived ground truth than all unweighted rules of the form \k out of n" AV detections. In addition, we evaluate the scenario where partial ground truth is available for model building. We train a logistic regression predictor on the partial label information. Our results show that as few as a 100 randomly selected training instances with ground truth are enough to achieve 80% true positive rate for 0.1% false positive rate. In comparison, the best unweighted threshold rule provides only 60% true positive rate at the same false positive rate.
The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.
Stackelberg security game models and associated computational tools have seen deployment in a number of high- consequence security settings, such as LAX canine patrols and Federal Air Marshal Service. This deployment across essentially independent agencies raises a natural question: what global impact does the resulting strategic interaction among the defenders, each using a similar model, have? We address this question in two ways. First, we demonstrate that the most common solution concept of Strong Stackelberg equilibrium (SSE) can result in significant under-investment in security entirely because SSE presupposes a single defender. Second, we propose a framework based on a different solution concept which incorporates a model of interdependencies among targets, and show that in this framework defenders tend to over-defend, even under significant positive externalities of increased defense.
We investigate the coverage efficiency of a sensor network consisting of sensors with circular sensing footprints of different radii. The objective is to completely cover a region in an efficient manner through a controlled (or deterministic) deployment of such sensors. In particular, it is shown that when sensing nodes of two different radii are used for complete coverage, the coverage density is increased, and the sensing cost is significantly reduced as compared to the homogeneous case, in which all nodes have the same sensing radius. Configurations of heterogeneous disks of multiple radii to achieve efficient circle coverings are presented and analyzed.
This paper examines security faults/vulnerabilities reported for Fedora. Results indicate that, at least in some situations, fault roughly constant may be used to guide estimation of residual vulnerabilities in an already released product, as well as possibly guide testing of the next version of the product.
Typing is a human activity that can be affected by a number of situational and task-specific factors. Changes in typing behavior resulting from the manipulation of such factors can be predictably observed through key-level input analytics. Here we present a study designed to explore these relationships. Participants play a typing game in which letter composition, word length and number of words appearing together are varied across levels. Inter-keystroke timings and other higher order statistics (such as bursts and pauses), as well as typing strategies, are analyzed from game logs to find the best set of metrics that quantify the effect that different experimental factors have on observable metrics. Beyond task-specific factors, we also study the effects of habituation by recording changes in performance with practice. Currently a work in progress, this research aims at developing a predictive model of human typing. We believe this insight can lead to the development of novel security proofs for interactive systems that can be deployed on existing infrastructure with minimal overhead. Possible applications of such predictive capabilities include anomalous behavior detection, authentication using typing signatures, bot detection using word challenges etc.
As mobile technology begins to dominate computing, understanding how their use impacts security becomes increasingly important. Fortunately, this challenge is also an opportunity: the rich set of sensors with which most mobile devices are equipped provide a rich contextual dataset, one that should enable mobile user behavior to be modeled well enough to predict when users are likely to act insecurely, and provide cognitively grounded explanations of those behaviors. We will evaluate this hypothesis with a series of experiments designed first to confirm that mobile sensor data can reliably predict user stress, and that users experiencing such stress are more likely to act insecurely.
To keep malware out of mobile application markets, existing techniques analyze the security aspects of application behaviors and summarize patterns of these security aspects to determine what applications do. However, user expectations (reflected via user perception in combination with user judgment) are often not incorporated into such analysis to determine whether application behaviors are within user expectations. This poster presents our recent work on bridging the semantic gap between user perceptions of the application behaviors and the actual application behaviors.
One of the biggest challenges in mobile security is human behavior. The most secure password may be useless if it is sent as a text or in an email. The most secure network is only as secure as its most careless user. Thus, in the current project we sought to discover the conditions under which users of mobile devices were most likely to make security errors. This scaffolds a larger project where we will develop automatic ways of detecting such environments and eventually supporting users during these times to encourage safe mobile behaviors.
We propose to build a benchmark with hand-selected test-cases from different equivalence classes, then to directly compare different approaches that make different tradeoffs to better understand which approaches find security vulnerabilities more effectively (better recall, better precision).
Detecting and preventing attacks before they compromise a system can be done using acceptance testing, redundancy based mechanisms, and using external consistency checking such external monitoring and watchdog processes. Diversity-based adjudication, is a step towards an oracle that uses knowable behavior of a healthy system. That approach, under best circumstances, is able to detect even zero-day attacks. In this approach we use functionally equivalent but in some way diverse components and we compare their output vectors and reactions for a given input vector. This paper discusses practical relevance of this approach in the context of recent web-service attacks.
Access Control Policies (ACPs) evolve. Understanding the trends and evolution patterns of ACPs could provide guidance about the reliability and maintenance of ACPs. Our research goal is to help policy authors improve the quality of ACP evolution based on the understanding of trends and evolution patterns in ACPs We performed an empirical study by analyzing the ACP changes over time for two systems: Security Enhanced Linux (SELinux), and an open-source virtual computing platform (VCL). We measured trends in terms of the number of policy lines and lines of code (LOC), respectively. We observed evolution patterns. For example, an evolution pattern st1 → st2 says that st1 (e.g., "read") evolves into st2 (e.g., "read" and "write"). This pattern indicates that policy authors add "write" permission in addition to existing "read" permission. We found that some of evolution patterns appear to occur more frequently.
Techniques commonly used for analyzing streaming video, audio, SIGINT, and network transmissions, at less-than-streaming rates, such as data decimation and ad-hoc sampling, can miss underlying structure, trends and specific events held in the data[3]. This work presents a secure-by-construction approach [7] for the upper-end data streams with rates from 10- to 100 Gigabits per second. The secure-by-construction approach strives to produce system security through the composition of individually secure hardware and software components. The proposed network processor can be used not only at data centers but also within networks and onboard embedded systems at the network periphery for a wide range of tasks, including preprocessing and data cleansing, signal encoding and compression, complex event processing, flow analysis, and other tasks related to collecting and analyzing streaming data. Our design employs a four-layer scalable hardware/software stack that can lead to inherently secure, easily constructed specialized high-speed stream processing. This work addresses the following contemporary problems: (1) There is a lack of hardware/software systems providing stream processing and data stream analysis operating at the target data rates; for high-rate streams the implementation options are limited: all-software solutions can't attain the target rates[1]. GPUs and GPGPUs are also infeasible: they were not designed for I/O at 10-100Gbps; they also have asymmetric resources for input and output and thus cannot be pipelined[4, 2], whereas custom chip-based solutions are costly and inflexible to changes, and FPGA-based solutions are historically hard to program[6]; (2) There is a distinct advantage to utilizing high-bandwidth or line-speed analytics to reduce time-to-discovery of information, particularly ones that can be pipelined together to conduct a series of processing tasks or data tests without impeding data rates; (3) There is potentially significant network infrastructure cost savings possible from compact and power-efficient analytic support deployed at the network periphery on the data source or one hop away; (4) There is a need for agile deployment in response to changing objectives; (5) There is an opportunity to constrain designs to use only secure components to achieve their specific objectives. We address these five problems in our stream processor design to provide secure, easily specified processing for low-latency, low-power 10-100Gbps in-line processing on top of a commodity high-end FPGA-based hardware accelerator network processor. With a standard interface a user can snap together various filter blocks, like Legos™, to form a custom processing chain. The overall design is a four-layer solution in which the structurally lowest layer provides the vast computational power to process line-speed streaming packets, and the uppermost layer provides the agility to easily shape the system to the properties of a given application. Current work has focused on design of the two lowest layers, highlighted in the design detail in Figure 1. The two layers shown in Figure 1 are the embeddable portion of the design; these layers, operating at up to 100Gbps, capture both the low- and high frequency components of a signal or stream, analyze them directly, and pass the lower frequency components, residues to the all-software upper layers, Layers 3 and 4; they also optionally supply the data-reduced output up to Layers 3 and 4 for additional processing. Layer 1 is analogous to a systolic array of processors on which simple low-level functions or actions are chained in series[5]. Examples of tasks accomplished at the lowest layer are: (a) check to see if Field 3 of the packet is greater than 5, or (b) count the number of X.75 packets, or (c) select individual fields from data packets. Layer 1 provides the lowest latency, highest throughput processing, analysis and data reduction, formulating raw facts from the stream; Layer 2, also accelerated in hardware and running at full network line rate, combines selected facts from Layer 1, forming a first level of information kernels. Layer 2 is comprised of a number of combiners intended to integrate facts extracted from Layer 1 for presentation to Layer 3. Still resident in FPGA hardware and hardware-accelerated, a Layer 2 combiner is comprised of state logic and soft-core microprocessors. Layer 3 runs in software on a host machine, and is essentially the bridge to the embeddable hardware; this layer exposes an API for the consumption of information kernels to create events and manage state. The generated events and state are also made available to an additional software Layer 4, supplying an interface to traditional software-based systems. As shown in the design detail, network data transitions systolically through Layer 1, through a series of light-weight processing filters that extract and/or modify packet contents. All filters have a similar interface: streams enter from the left, exit the right, and relevant facts are passed upward to Layer 2. The output of the end of the chain in Layer 1 shown in the Figure 1 can be (a) left unconnected (for purely monitoring activities), (b) redirected into the network (for bent pipe operations), or (c) passed to another identical processor, for extended processing on a given stream (scalability).
Hadoop is a map-reduce implementation that rapidly processes data in parallel. Cloud provides reliability, flexibility, scalability, elasticity and cost saving to customers. Moving Hadoop into Cloud can be beneficial to Hadoop users. However, Hadoop has two vulnerabilities that can dramatically impact its security in a Cloud. The vulnerabilities are its overloaded authentication key, and the lack of fine-grained access control at the data access level. We propose and develop a security enhancement for Cloud-based Hadoop.
It is widely accepted that wireless channels decorrelate fast over space, and half a wavelength is the key distance metric used in link signature (LS) for security assurance. However, we believe that this channel correlation model is questionable, and will lead to false sense of security. In this project, we focus on establishing correct modeling of channel correlation so as to facilitate proper guard zone designs for LS security in various wireless environments of interest.
We present an architecture for the Security Behavior Observatory (SBO), a client-server infrastructure designed to collect a wide array of data on user and computer behavior from hundreds of participants over several years. The SBO infrastructure had to be carefully designed to fulfill several requirements. First, the SBO must scale with the desired length, breadth, and depth of data collection. Second, we must take extraordinary care to ensure the security of the collected data, which will inevitably include intimate participant behavioral data. Third, the SBO must serve our research interests, which will inevitably change as collected data is analyzed and interpreted. This short paper summarizes some of our design and implementation benefits and discusses a few hurdles and trade-offs to consider when designing such a data collection system.
In highly configurable systems the configuration space is too big for (re-)certifying every configuration in isolation. In this project, we combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system as Linux and SELinux for every combination of configuration settings one by one (>102000 even considering compile-time configurations only), we analyze the effect of each configuration option once for the entire configuration space. The analysis will guide us to designs separating interacting configuration options in a core system and isolating orthogonal and less trusted configuration options from this core.
This paper presents a model for generating personalized passwords (i.e., passwords based on user and service profile). A user's password is generated from a list of personalized words, each word is drawn from a topic relating to a user and the service in use. The proposed model can be applied to: (i) assess the strength of a password (i.e., determine how many guesses are used to crack the password), and (ii) generate secure (i.e., contains digits, special characters, or capitalized characters) yet easy to memorize passwords.
This paper presents a system named SPOT to achieve high accuracy and preemptive detection of attacks. We use security logs of real-incidents that occurred over a six-year period at National Center for Supercomputing Applications (NCSA) to evaluate SPOT. Our data consists of attacks that led directly to the target system being compromised, i.e., not detected in advance, either by the security analysts or by intrusion detection systems. Our approach can detect 75 percent of attacks as early as minutes to tens of hours before attack payloads are executed.
With the wide popularity of Cloud Computing, Service-oriented Computing is becoming the de-facto approach for the development of distributed systems. This has introduced the issue of trustworthiness with respect to the services being provided. Service Requesters are provided with a wide range of services that they can select from. Usually the service requester compare between these services according to their cost and quality. One essential part of the quality of a service is the trustworthiness properties of such services. Traditional service models focuses on service functionalities and cost when defining services. This paper introduces a new service model that extends traditional service models to support trustworthiness properties.
In this study, we present a control theoretic technique to model routing in wireless multihop networks. We model ad hoc wireless networks as stochastic dynamical systems where, as a base case, a centralized controller pre-computes optimal paths to the destination. The usefulness of this approach lies in the fact that it can help obtain bounds on reliability of end-to-end packet transmissions. We compare this approach with the reliability achieved by some of the widely used routing techniques in multihop networks.
Injection vulnerabilities have topped rankings of the most critical web application vulnerabilities for several years [1, 2]. They can occur anywhere where user input may be erroneously executed as code. The injected input is typically aimed at gaining unauthorized access to the system or to private information within it, corrupting the system's data, or disturbing system availability. Injection vulnerabilities are tedious and difficult to prevent.
Trust is a necessary component in cybersecurity. It is a common task for a system to make a decision about whether or not to trust the credential of an entity from another domain, issued by a third party. Generally, in the cyberspace, connected and interacting systems largely rely on each other with respect to security, privacy, and performance. In their interactions, one entity or system needs to trust others, and this "trust" frequently becomes a vulnerability of that system. Aiming at mitigating the vulnerability, we are developing a computational theory of trust, as a part of our efforts towards Science of Security. Previously, we developed a formal-semantics-based calculus of trust [3, 2], in which trust can be calculated based on a trustor's direct observation on the performance of the trustee, or based on a trust network. In this paper, we construct a framework for making trust reasoning based on the observed evidence. We take privacy in cloud computing as a driving application case [5].
This paper is a proposal for a poster. In it we describe a medical device security approach that researchers at Fraunhofer used to analyze different kinds of medical devices for security vulnerabilities. These medical devices were provided to Fraunhofer by a medical device manufacturer whose name we cannot disclose due to non-disclosure agreements.
The InViz tool is a functional prototype that provides graphical visualizations of log file events to support real-time attack investigation. Through visualization, both experts and novices in cybersecurity can analyze patterns of application behavior and investigate potential cybersecurity attacks. The goal of this research is to identify and evaluate the cybersecurity information to visualize that reduces the amount of time required to perform cyber forensics.