Biblio
Sandboxes impose a security policy, isolating applications and their components from the rest of a system. While many sandboxing techniques exist, state of the art sandboxes generally perform their functions within the system that is being defended. As a result, when the sandbox fails or is bypassed, the security of the surrounding system can no longer be assured. We experiment with the idea of in-nimbo sandboxing, encapsulating untrusted computations away from the system we are trying to protect. The idea is to delegate computations that may be vulnerable or malicious to virtual machine instances in a cloud computing environment.
This may not reduce the possibility of an in-situ sandbox compromise, but it could significantly reduce the consequences should that possibility be realized. To achieve this advantage, there are additional requirements, including: (1) A regulated channel between the local and cloud environments that supports interaction with the encapsulated application, (2) Performance design that acceptably minimizes latencies in excess of the in-situ baseline.
To test the feasibility of the idea, we built an in-nimbo sandbox for Adobe Reader, an application that historically has been subject to significant attacks. We undertook a prototype deployment with PDF users in a large aerospace firm. In addition to thwarting several examples of existing PDF-based malware, we found that the added increment of latency, perhaps surprisingly, does not overly impair the user experience with respect to performance or usability.
Information system developers and administrators often overlook critical security requirements and best practices. This may be due to lack of tools and techniques that allow practitioners to tailor security knowledge to their particular context. In order to explore the impact of new security methods, we must improve our ability to study the impact of security tools and methods on software and system development. In this paper, we present early findings of an experiment to assess the extent to which the number and type of examples used in security training stimuli can impact security problem solving. To motivate this research, we formulate hypotheses from analogical transfer theory in psychology. The independent variables include number of problem surfaces and schemas, and the dependent variable is the answer accuracy. Our study results do not show a statistically significant difference in performance when the number and types of examples are varied. We discuss the limitations, threats to validity and opportunities for future studies in this area.
In highly configurable systems the configuration space is too big for (re-)certifying every configuration in isolation. In this project, we combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system as Linux and SELinux for every combination of configuration settings one by one (>102000 even considering compile-time configurations only), we analyze the effect of each configuration option once for the entire configuration space. The analysis will guide us to designs separating interacting configuration options in a core system and isolating orthogonal and less trusted configuration options from this core.
Security has become the Achilles’ heel of most modern software systems. Techniques ranging from the manual inspection to automated static and dynamic analyses are commonly employed to identify security vulnerabilities prior to the release of the software. However, these techniques are time consuming and cannot keep up with the complexity of ever-growing software repositories (e.g., Google Play and Apple App Store). In this paper, we aim to improve the status quo and increase the efficiency of static analysis by mining relevant information from vulnerabilities found in the categorized software repositories. The approach relies on the fact that many modern software systems are developed using rich application development frameworks (ADF), allowing us to raise the level of abstraction for detecting vulnerabilities and thereby making it possible to classify the types of vulnerabilities that are encountered in a given category of application. We used open-source software repositories comprising more than 7 million lines of code to demonstrate how our approach can improve the efficiency of static analysis, and in turn, vulnerability detection.
In many scientific fields, simulations and analyses require compositions of computational entities such as web-services, programs, and applications. In such fields, users may want various trade-offs between different qualities. Examples include: (i) performing a quick approximation vs. an accurate, but slower, experiment, (ii) using local slower execution environments vs. remote, but advanced, computing facilities, (iii) using quicker approximation algorithms vs. computationally expensive algorithms with smaller data. However, such trade-offs are difficult to make as many such decisions today are either (a) wired into a fixed configuration and cannot be changed, or (b) require detailed systems knowledge and experimentation to determine what configuration to use. In this paper we propose an approach that uses architectural models coupled with automated design space generation for making fidelity and timeliness trade-offs. We illustrate this approach through an example in the intelligence analysis domain.
Project documentation often contains security-relevant statements that are indicative of the security requirements of a system. However these statements may not be explicitly specified or straightforward to locate. At best, requirements analysts manually extract applicable security requirements from project documents. However, security requirements that are not explicitly stated may not be considered during implementation. The goal of this research is to aid requirements analysts in generating security requirements through identifying securityrelevant statements in project documentation and providing context-specific templates to generate security requirements. First, we identify the most prevalent security objectives from software security literature. To identify security-relevant statements in project documentation, we propose a tool-based process to classify statements as related to zero or more security objectives. We then develop a set of context-specific templates to help translate the security objectives of each statement into explicit sets of security functional requirements. We evaluate our process on six documents from the electronic healthcare software industry, identifying 46% of statements as implicitly or explicitly related to security. Our classification approach identified security objectives with a precision of .82 and recall of .79. From our total set of classified statements, we extracted 16 context-specific templates that identify 41 reusable security requirements.
As new market opportunities, technologies, platforms, and frameworks become available, systems require large-scale and systematic architectural restructuring to accommodate them. Today's architects have few techniques to help them plan this architecture evolution. In particular, they have little assistance in planning alternative evolution paths, trading off various aspects of the different paths, or knowing best practices for particular domains. In this paper, we describe an approach for planning and reasoning about architecture evolution. Our approach focuses on providing architects with the means to model prospective evolution paths and supporting analysis to select among these candidate paths. To demonstrate the usefulness of our approach, we show how it can be applied to an actual architecture evolution. In addition, we present some theoretical results about our evolution path constraint specification language.
A self-adaptive software system should be able to monitor and analyze its runtime behavior and make adaptation decisions accordingly to meet certain desirable objectives. Traditional software adaptation techniques and recent “models@runtime” approaches usually require an a priori model for a system’s dynamic behavior. Oftentimes the model is difficult to define and labor-intensive to maintain, and tends to get out of date due to adaptation and architecture decay. We propose an alternative approach that does not require defining the system’s behavior model beforehand, but instead involves mining software component interactions from system execution traces to build a probabilistic usage model, which is in turn used to analyze, plan, and execute adaptations. Our preliminary evaluation of the approach against an Emergency Deployment System shows that the associations mining model can be used to effectively address a variety of adaptation needs, including (1) safely applying dynamic changes to a running software system without creating inconsistencies, (2) identifying potentially malicious (abnormal) behavior for self-protection, and (3) our ongoing research on improving deployment of software components in a distributed setting for performance self-optimization.
Availability is an increasingly important quality for today's software-based systems and it has been successfully addressed by the use of closed-loop control systems in self-adaptive systems. Probes are inserted into a running system to obtain information and the information is fed to a controller that, through provided interfaces, acts on the system to alter its behavior. When a failure is detected, pinpointing the source of the failure is a critical step for a repair action. However, information obtained from a running system is commonly incomplete due to probing costs or unavailability of probes. In this paper we address the problem of fault localization in the presence of incomplete system monitoring. We may not be able to directly observe a component but we may be able to infer its health state. We provide formal criteria to determine when health states of unobservable components can be inferred and establish formal theoretical bounds for accuracy when using any spectrum-based fault localization algorithm.
Concurrent programs are prone to various classes of difficult-to-detect faults, of which data races are particularly prevalent. Prior work has attempted to increase the cost-effectiveness of approaches for testing for data races by employing race detection techniques, but to date, no work has considered cost-effective approaches for re-testing for races as programs evolve. In this paper we present SimRT, an automated regression testing framework for use in detecting races introduced by code modifications. SimRT employs a regression test selection technique, focused on sets of program elements related to race detection, to reduce the number of test cases that must be run on a changed program to detect races that occur due to code modifications, and it employs a test case prioritization technique to improve the rate at which such races are detected. Our empirical study of SimRT reveals that it is more efficient and effective for revealing races than other approaches, and that its constituent test selection and prioritization components each contribute to its performance.
Software architects design systems to achieve quality attributes like security, reliability, and performance. Key to achieving these quality attributes are design constraints governing how components of the system are configured, communicate and access resources. Unfortunately, identifying, specifying, communicating and enforcing important design constraints – achieving architectural control – can be difficult, particularly in large software systems. We argue for the development of architectural frameworks, built to leverage language mechanisms that provide for domain-specific syntax, editor services and explicit control over capabilities, that help increase architectural control. In particular, we argue for concise, centralized architectural descriptions which are responsible for specifying constraints and passing a minimal set of capabilities to downstream system components, or explicitly entrusting them to individuals playing defined roles within a team. By integrating these architectural descriptions directly into the language, the type system can help enforce technical constraints and editor services can help enforce social constraints. We sketch our approach in the context of distributed systems.
Research shows that commonly accepted security requirements are not generally applied in practice. Instead of relying on requirements checklists, security experts rely on their expertise and background knowledge to identify security vulnerabilities. To understand the gap between available checklists and practice, we conducted a series of interviews to encode the decision-making process of security experts and novices during security requirements analysis. Participants were asked to analyze two types of artifacts: source code, and network diagrams for vulnerabilities and to apply a requirements checklist to mitigate some of those vulnerabilities. We framed our study using Situation Awareness—a cognitive theory from psychology—to elicit responses that we later analyzed using coding theory and grounded analysis. We report our preliminary results of analyzing two interviews that reveal possible decision- making patterns that could characterize how analysts perceive, comprehend and project future threats which leads them to decide upon requirements and their specifications, in addition, to how experts use assumptions to overcome ambiguity in specifications. Our goal is to build a model that researchers can use to evaluate their security requirements methods against how experts transition through different situation awareness levels in their decision-making process.
Security requirements patterns represent reusable security practices that software engineers can apply to improve security in their system. Reusing best practices that others have employed could have a number of benefits, such as decreasing the time spent in the requirements elicitation process or improving the quality of the product by reducing product failure risk. Pattern selection can be difficult due to the diversity of applicable patterns from which an analyst has to choose. The challenge is that identifying the most appropriate pattern for a situation can be cumbersome and time-consuming. We propose a new method that combines an inquiry-cycle based approach with the feature diagram notation to review only relevant patterns and quickly select the most appropriate patterns for the situation. Similar to patterns themselves, our approach captures expert knowledge to relate patterns based on decisions made by the pattern user. The resulting pattern hierarchies allow users to be guided through these decisions by questions, which introduce related patterns in order to help the pattern user select the most appropriate patterns for their situation, thus resulting in better requirement generation. We evaluate our approach using access control patterns in a pattern user study.
The use of shared mutable state, commonly seen in object-oriented systems, is often problematic due to the potential conflicting interactions between aliases to the same state. We present a substructural type system outfitted with a novel lightweight interference control mechanism, rely-guarantee protocols, that enables controlled aliasing of shared resources. By assigning each alias separate roles, encoded in a novel protocol abstraction in the spirit of rely-guarantee reasoning, our type system ensures that challenging uses of shared state will never interfere in an unsafe fashion. In particular, rely-guarantee protocols ensure that each alias will never observe an unexpected value, or type, when inspecting shared memory regardless of how the changes to that shared state (originating from potentially unknown program contexts) are interleaved at run-time.
Programming languages often include specialized syntax for common datatypes e.g. lists and some also build in support for specific specialized datatypes e.g. regular expressions, but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages TSLs: logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.
To evolve object-oriented code, one must understand both the code structure in terms of classes, and the runtime structure in terms of abstractions of objects that are being created and relations between those objects. To help with this understanding, static program analysis can extract heap abstractions such as object graphs. But the extracted graphs can become too large if they do not sufficiently abstract objects, or too imprecise if they abstract objects excessively to the point of being similar to a class diagram, where one box for a class represents all the instances of that class. One previously proposed solution uses both annotations and abstract interpretation to extract a global, hierarchical, abstract object graph that conveys both abstraction and design intent, but can still be related to the code structure. In this paper, we define metrics that relate nodes and edges in the object graph to elements in the code structure, to measure how they differ, and if the differences are indicative of language or design features such as encapsulation, polymorphism and inheritance. We compute the metrics across eight systems totaling over 100 KLOC, and show a statistically significant difference between the code and the object graph. In several cases, the magnitude of this difference is large.
To evolve object-oriented code, one must understand both the code structure in terms of classes, and the runtime structure in terms of abstractions of objects that are being created and relations between those objects. To help with this understanding, static program analysis can extract heap abstractions such as object graphs. But the extracted graphs can become too large if they do not sufficiently abstract objects, or too imprecise if they abstract objects excessively to the point of being similar to a class diagram that shows one box for a class to represent all the instances of that class. One previously proposed solution uses both annotations and abstract interpretation to extract a global, hierarchical, abstract object graph that conveys both abstraction and design intent, but can still be related to the code structure. In this paper, we define metrics that relate nodes and edges in the object graph to elements in the code structure to measure how they differ, and if the differences are indicative of language or design features such as encapsulation, polymorphism and inheritance. We compute the metrics across eight systems totaling over 100 KLOC, and show a statistically significant difference between the code and the object graph. In several cases, the magnitude of this difference is large.
Static types may be used both by the language implementation and directly by the user as documentation. Though much existing work focuses primarily on the implications of static types on the semantics of programs, relatively little work considers the impact on usability that static types provide. Though the omission of static type information may decrease program length and thereby improve readability, it may also decrease readability because users must then frequently derive type information manually while reading programs. As type inference becomes more popular in languages that are in widespread use, it is important to consider whether the adoption of type inference may impact productivity of developers.
Web applications must ultimately command systems like web browsers and database engines using strings. Strings derived from improperly sanitized user input can as a result be a vector for command injection attacks. In this paper, we introduce regular string types, which classify strings constrained statically to be in a regular language specified by a regular expression. Regular strings support standard string operations like concatenation and substitution, as well as safe coercions, so they can be used to implement, in an essentially conventional manner, the pieces of a web application or framework that handle strings arising from user input. Simple type annotations at function interfaces can be used to statically verify that sanitization has been performed correctly without introducing redundant run-time checks. We specify this type system first as a minimal typed lambda calculus, lambdaRS. To be practical, adopting a specialized type system like this should not require the adoption of a new programming language. Instead, we advocate for extensible type systems: new type system fragments like this should be implemented as libraries atop a mechanism that guarantees that they can be safely composed. We support this with two contributions. First, we specify a translation from lambdaRS to a calculus with only standard strings and regular expressions. Then, taking Python as a language with these constructs, we implement the type system together with the translation as a library using typy, an extensible static type system for Python.
Breaches of software security affect millions of people, and therefore it is crucial to strive for more secure software systems. However, the effect of programming language design on software security is not easily measured or studied. In the absence of scientific insight, opinions range from those that claim that programming language design has no effect on security of the system, to those that believe that programming language design is the only way to provide “high-assurance software.” In this paper, we discuss how programming language design can impact software security by looking at a specific example: the Wyvern programming language. We report on how the design of the Wyvern programming language leverages security principles, together with hypotheses about how usability impacts security, in order to prevent command injection attacks. Furthermore, we discuss what security principles we considered in Wyvern’s design.
We investigate strong normalization, confluence, and behavioral equality in the realm of session-based concurrency. These interrelated issues underpin advanced correctness analysis in models of structured communications. The starting point for our study is an interpretation of linear logic propositions as session types for communicating processes, proposed in prior work. Strong normalization and confluence are established by developing a theory of logical relations. Defined upon a linear type structure, our logical relations remain remarkably similar to those for functional languages. We also introduce a natural notion of observational equivalence for session-typed processes. Strong normalization and confluence come in handy in the associated coinductive reasoning: as applications, we prove that all proof conversions induced by the logic interpretation actually express observational equivalences, and explain how type isomorphismsresulting from linear logic equivalences are realized by coercions between interface types of session-based concurrent systems.
Identifying factors behind countries’ weakness to cyber-attacks is an important step towards addressing these weaknesses at the root level. For example, identifying factors why some countries become cyber- crime safe heavens can inform policy actions about how to reduce the attractiveness of these countries to cyber-criminals. Currently, however, identifying these factors is mostly based on expert opinions and speculations.
In this work, we perform an empirical study to statistically test the validity of these opinions and specu- lations. In our analysis, we use Symantec’s World Intelligence Network Environment (WINE) Intrusion Prevention System (IPS) telemetry data which contain attack reports from more than 10 million customer computers worldwide. We use regression analysis to test for the relevance of multiple factors including monetary and computing resources, cyber-security research and institutions, and corruption.
Our analysis confirms some hypotheses and disproves others. We find that many countries in Eastern Europe extensively host attacking computers because of a combination of good computing infrastructure and high corruption rate. We also find that web attacks and fake applications are most prevalent in rich countries because attacks on these countries are more lucrative. Finally, we find that computers in Africa launch the lowest rates of cyber-attacks. This is surprising given the bad cyber reputation of some African countries such as Nigeria. Our research has many policy implications.
Both SAT and #SAT can represent difficult problems in seemingly dissimilar areas such as planning, verification, and probabilistic inference. Here, we examine an expressive new language, #∃SAT, that generalizes both of these languages. #∃SAT problems require counting the number of satisfiable formulas in a concisely-describable set of existentially quantified, propositional formulas. We characterize the expressiveness and worst-case difficulty of #∃SAT by proving it is complete for the complexity class #P NP [1], and re- lating this class to more familiar complexity classes. We also experiment with three new
general-purpose #∃SAT solvers on a battery of problem distributions including a simple logistics domain. Our experiments show that, despite the formidable worst-case complex-
ity of #P NP [1], many of the instances can be solved efficiently by noticing and exploiting a particular type of frequent structure.