Biblio
The use of shared mutable state, commonly seen in object-oriented systems, is often problematic due to potentially conflicting interactions between aliases to the same state. We present a substructural type system outfitted with a novel lightweight interference-control mechanism, rely-guarantee protocols, that enables controlled aliasing of shared resources. By assigning separate roles to each alias, encoded in a novel protocol abstraction in the spirit of rely-guarantee reasoning, our type system ensures that challenging uses of shared state never interfere in an unsafe fashion. In particular, rely-guarantee protocols ensure that an alias never observes an unexpected value, or type, when inspecting shared memory, regardless of how the changes to that shared state (originating from potentially unknown program contexts) are interleaved at run-time.
Almost every sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. In particular, the C preprocessor (CPP) is very popular in practice, and it is regaining interest in academia. Although there have been several attempts to understand and improve CPP, there is a lack of understanding of how it is used in open-source and industrial systems and whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim to narrow this gap by comparing the use of CPP in open-source projects and industry (especially from the embedded-systems domain) based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that the analyzed open-source systems and the considered industrial embedded systems are similar with respect to almost all metrics we studied, including systems that were developed in industry and made open source at some point. So, our study indicates that, regarding CPP as a variability-implementation mechanism, insights, methods, and tools developed based on studies of open-source systems are transferable to industrial systems, at least with respect to the metrics we considered.
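As an illustration of the mechanism the metrics target, a minimal CPP sketch follows; the feature macros (HAS_NETWORK, MINIMAL_BUILD) and the function are hypothetical, not drawn from any subject system of the study. Scattering measures how widely one feature macro recurs across a code base; tangling measures how many feature macros meet in a single conditional expression.

    #include <stdio.h>

    /* Hypothetical feature flag; in a real build it would come from the
       compiler command line, e.g. -DHAS_NETWORK=1. */
    #ifndef HAS_NETWORK
    #define HAS_NETWORK 0
    #endif

    void log_event(const char *msg) {
    /* Two features tangled in one conditional expression: */
    #if HAS_NETWORK && !defined(MINIMAL_BUILD)
        printf("net-log: %s\n", msg);
    #else
        printf("local-log: %s\n", msg);
    #endif
    }

    int main(void) {
        /* HAS_NETWORK may also guard code in many other functions,
           which is what the scattering metrics capture. */
        log_event("boot");
        return 0;
    }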
Security has become the Achilles’ heel of most modern software systems. Techniques ranging from manual inspection to automated static and dynamic analyses are commonly employed to identify security vulnerabilities prior to the release of the software. However, these techniques are time consuming and cannot keep up with the complexity of ever-growing software repositories (e.g., Google Play and the Apple App Store). In this paper, we aim to improve the status quo and increase the efficiency of static analysis by mining relevant information from vulnerabilities found in categorized software repositories. The approach relies on the fact that many modern software systems are developed using rich application development frameworks (ADFs), allowing us to raise the level of abstraction for detecting vulnerabilities and thereby making it possible to classify the types of vulnerabilities that are encountered in a given category of application. We used open-source software repositories comprising more than 7 million lines of code to demonstrate how our approach can improve the efficiency of static analysis, and in turn, vulnerability detection.
In this report we show how to adapt the notion of “attack surface” to formally evaluate security properties at the architectural level of design and to identify vulnerabilities in architectural designs. Further, we explore the application of this metric in the context of architecture-based transformations to improve security by reducing the attack surface. These transformations are described in detail and validated with a simple experiment.
Software services and governing communication protocols are increasingly domain-aware. Domains can have multiple interpretations, such as the principals on whose behalf processes act or the location at which parties reside. Domains impact protocol compliance and access control, two issues central to overall functionality and correctness in distributed systems. This paper proposes a session-typed process framework for domain-aware communication-centric systems based on a Curry-Howard interpretation of linear logic, here augmented with nominals from hybrid logic indicating domains. These nominals are explicit in the process expressions and govern domain migration, subject to a parametric accessibility relation familiar from the Kripke semantics for modal logic. Flexible access relationships among domains can be elegantly defined and statically enforced. The framework can also account for scenarios in which domain information is discovered only at runtime. Due to the logical origins of our systems, well-typed processes enjoy session fidelity, global progress, and termination. Moreover, well-typed processes always respect the accessibility relation and satisfy a form of domain parametricity, two properties crucial to showing that domain-related properties of concrete programs are satisfied.
The Pi Calculus is a popular formalism for modeling distributed computation. Session Types extend the Pi Calculus with a static, inferable type system. Dependent Types allow for a more precise characterization of the behavior of programs, but in their full generality are not inferable. In this paper, we present LiquidPi, an approach that combines the dependent type inference of Liquid Types with Honda’s Session Types to give a more precise, automatically derived description of the behavior of distributed programs. These types can be used to describe and enforce safety properties of distributed systems. We present a type system parametric over an underlying functional language with Pi Calculus connectives, and give an inference algorithm for it by means of efficient external solvers and a set of dependent qualifier templates.
We investigate strong normalization, confluence, and behavioral equality in the realm of session-based concurrency. These interrelated issues underpin advanced correctness analysis in models of structured communications. The starting point for our study is an interpretation of linear logic propositions as session types for communicating processes, proposed in prior work. Strong normalization and confluence are established by developing a theory of logical relations. Defined upon a linear type structure, our logical relations remain remarkably similar to those for functional languages. We also introduce a natural notion of observational equivalence for session-typed processes. Strong normalization and confluence prove useful in the associated coinductive reasoning: as applications, we prove that all proof conversions induced by the logic interpretation actually express observational equivalences, and explain how type isomorphisms resulting from linear logic equivalences are realized by coercions between interface types of session-based concurrent systems.
Over the years, several typing disciplines for the π-calculus have been proposed. Arguably, the most widespread of these disciplines is that of session types. Session types describe the input/output behavior of processes and traditionally provide strong guarantees about this behavior (i.e., deadlock freedom and fidelity). While these systems exploit a fundamental notion of linearity, the precise connection between linear logic and session types has not been well understood. This paper proposes a type system for the π-calculus that corresponds to a standard sequent calculus presentation of intuitionistic linear logic, interpreting linear propositions as session types and thus providing a purely logical account of all key features and properties of session types. We show the deep correspondence between linear logic and session types by exhibiting a tight operational correspondence between cut-elimination steps and process reductions. We also discuss an alternative presentation of linear session types based on classical linear logic, and compare our development with other, more traditional session type systems.
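For orientation, the propositions-as-sessions reading underlying this line of work can be summarized as follows; this is our paraphrase of the standard interpretation, not the paper's exact statement:

    A ⊗ B : output a fresh channel of session type A, then continue as B
    A ⊸ B : input a channel of session type A, then continue as B
    1     : terminated session (no further communication)
    !A    : replicable offer of a session of type A
    A ⊕ B : internal choice (select and then behave as A or as B)
    A & B : external choice (offer both A and B; the partner selects)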
Software architects design systems to achieve quality attributes like security, reliability, and performance. Key to achieving these quality attributes are design constraints governing how components of the system are configured, communicate, and access resources. Unfortunately, identifying, specifying, communicating, and enforcing important design constraints (achieving architectural control) can be difficult, particularly in large software systems. We argue for the development of architectural frameworks, built to leverage language mechanisms that provide domain-specific syntax, editor services, and explicit control over capabilities, that help increase architectural control. In particular, we argue for concise, centralized architectural descriptions which are responsible for specifying constraints and passing a minimal set of capabilities to downstream system components, or explicitly entrusting them to individuals playing defined roles within a team. By integrating these architectural descriptions directly into the language, the type system can help enforce technical constraints and editor services can help enforce social constraints. We sketch our approach in the context of distributed systems.
Systematic exploration is an approach to finding race conditions by deterministically executing every possible interleaving of thread transitions and identifying which ones expose bugs. Current systematic exploration techniques are suitable for testing user-space programs, but are inadequate for testing kernels, where the testing framework’s control over concurrency is more complicated. We present Landslide, a systematic exploration tool for finding races in kernels. Landslide targets Pebbles, the kernel specification that students implement in the undergraduate Operating Systems course at Carnegie Mellon University (15-410). We discuss the techniques Landslide uses to address the general challenges of kernel-level concurrency, and we evaluate its effectiveness and usability as a debugging aid. We show that our techniques make systematic testing in kernel-space feasible and that Landslide is a useful tool for doing so in the context of 15-410.
Sandboxes impose a security policy, isolating applications and their components from the rest of a system. While many sandboxing techniques exist, state-of-the-art sandboxes generally perform their functions within the system that is being defended. As a result, when the sandbox fails or is bypassed, the security of the surrounding system can no longer be assured. We experiment with the idea of in-nimbo sandboxing, encapsulating untrusted computations away from the system we are trying to protect. The idea is to delegate computations that may be vulnerable or malicious to virtual machine instances in a cloud computing environment.
This may not reduce the possibility of an in-situ sandbox compromise, but it could significantly reduce the consequences should that possibility be realized. To achieve this advantage, there are additional requirements, including: (1) a regulated channel between the local and cloud environments that supports interaction with the encapsulated application, and (2) performance design that acceptably minimizes latencies in excess of the in-situ baseline.
To test the feasibility of the idea, we built an in-nimbo sandbox for Adobe Reader, an application that historically has been subject to significant attacks. We undertook a prototype deployment with PDF users in a large aerospace firm. In addition to thwarting several examples of existing PDF-based malware, we found that the added increment of latency, perhaps surprisingly, does not overly impair the user experience with respect to performance or usability.
Self-adaptive systems have the ability to adapt their behavior to dynamic operating conditions. In reaction to changes in the environment, these systems determine the appropriate corrective actions based in part on information about which action will have the best impact on the system. Existing models used to describe the impact of adaptations are either unable to capture the underlying uncertainty and variability of such dynamic environments, or are not compositional and are described at a level of abstraction too low to scale in terms of the specification effort required for non-trivial systems. In this paper, we address these shortcomings by describing an approach to the specification of impact models based on architectural system descriptions, which at the same time allows us to represent both variability and uncertainty in the outcome of adaptations, hence improving the selection of the best corrective action. The core of our approach is an impact model language equipped with a formal semantics defined in terms of Discrete Time Markov Chains. To validate our approach, we show how employing our language can improve the accuracy of predictions used for decision-making in the Rainbow framework for architecture-based self-adaptation.
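For reference, the semantic object mentioned above is the standard one. In the usual notation (not necessarily the paper's), a Discrete Time Markov Chain is a tuple

    $(S, s_0, \mathbf{P})$, where $S$ is a finite set of states, $s_0 \in S$ is the initial state, and $\mathbf{P} : S \times S \to [0,1]$ is a transition probability matrix with $\sum_{s' \in S} \mathbf{P}(s, s') = 1$ for every $s \in S$.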
In prior research we have developed a Curry-Howard interpretation of linear sequent calculus as session-typed processes. In this paper we uniformly integrate this computational interpretation in a functional language via a linear contextual monad that isolates session-based concurrency. Monadic values are open process expressions and are first-class objects in the language, thus providing a logical foundation for higher-order session-typed processes. We illustrate how the combined use of the monad and recursive types allows us to cleanly write a rich variety of concurrent programs, including higher-order programs that communicate processes. We show the standard metatheoretic result of type preservation, as well as a global progress theorem, which, to the best of our knowledge, is new in the higher-order session-typed setting.
Identifying the factors behind countries’ vulnerability to cyber-attacks is an important step towards addressing these weaknesses at the root level. For example, identifying the factors that lead some countries to become cybercrime safe havens can inform policy actions about how to reduce the attractiveness of these countries to cyber-criminals. Currently, however, identifying these factors is mostly based on expert opinions and speculations.
In this work, we perform an empirical study to statistically test the validity of these opinions and speculations. In our analysis, we use Symantec’s World Intelligence Network Environment (WINE) Intrusion Prevention System (IPS) telemetry data, which contain attack reports from more than 10 million customer computers worldwide. We use regression analysis to test for the relevance of multiple factors including monetary and computing resources, cyber-security research and institutions, and corruption.
Our analysis confirms some hypotheses and disproves others. We find that many countries in Eastern Europe extensively host attacking computers because of a combination of good computing infrastructure and high corruption rates. We also find that web attacks and fake applications are most prevalent in rich countries because attacks on these countries are more lucrative. Finally, we find that computers in Africa launch cyber-attacks at the lowest rates. This is surprising given the bad cyber reputation of some African countries, such as Nigeria. Our research has many policy implications.
Finding architectural flaws in object-oriented code requires a runtime architecture that shows multiple components of the same type that are used in different contexts. Previous work showed that a runtime architecture can be approximated by an abstract object graph that a static analysis extracts from code with Ownership Domain annotations. To find architectural flaws, it is not enough to reason about the presence or absence of communication. Additional work is needed to reason about the content of the communication. The contribution of this paper is a static analysis that extracts a hierarchical object graph with dataflow edges that refer to objects. The extraction analysis combines the aliasing precision provided by Ownership Domains with a domain-sensitive value flow analysis. We evaluate the extraction analysis on an open-source Android application and discuss examples of dataflow edges that refer to objects that are in actual domains or to flow objects that are in domains corresponding to unique annotations.
In our experience, exploratory testing has reached a level of maturity that makes it a practical, and often the most cost-effective, approach to testing. Notably, previous work has demonstrated that exploratory testing is capable of finding bugs even in well-tested systems [4, 17, 24, 23]. However, the number of bugs found gives little indication of the efficiency of a testing approach. To drive testing efficiency, this paper focuses on techniques for measuring and maximizing the coverage achieved by exploratory testing. In particular, this paper describes the design, implementation, and evaluation of Eta, a framework for exploratory testing of multithreaded components of a large-scale cluster management system at Google. For simple tests (with millions to billions of possible executions), Eta achieves complete coverage one to two orders of magnitude faster than random testing. For complex tests, Eta adopts a state-space reduction technique that avoids the need to explore over 85% of executions, and harnesses parallel processing to explore multiple test executions concurrently, achieving a throughput increase of up to 17.5×.
We investigate a notion of behavioral genericity in the context of session type disciplines. To this end, we develop a logically motivated theory of parametric polymorphism, reminiscent of the Girard-Reynolds polymorphic λ-calculus, but cast in the setting of concurrent processes. In our theory, polymorphism accounts for the exchange of abstract communication protocols and the dynamic instantiation of heterogeneous interfaces, as opposed to the exchange of data types and dynamic instantiation of individual message types. Our polymorphic session-typed process language satisfies strong forms of type preservation and global progress, is strongly normalizing, and enjoys a relational parametricity principle. Combined, our results confer strong correctness guarantees for communicating systems. In particular, parametricity is key to deriving non-trivial results about internal protocol independence, a concurrent analogue of representation independence, and non-interference properties of modular, distributed systems.
To evolve object-oriented code, one must understand both the code structure, in terms of classes, and the runtime structure, in terms of abstractions of the objects being created and the relations between those objects. To help with this understanding, static program analysis can extract heap abstractions such as object graphs. But the extracted graphs can become too large if they do not sufficiently abstract objects, or too imprecise if they abstract objects excessively, to the point of resembling a class diagram that shows one box for a class to represent all of its instances. One previously proposed solution uses both annotations and abstract interpretation to extract a global, hierarchical, abstract object graph that conveys both abstraction and design intent, but can still be related to the code structure. In this paper, we define metrics that relate nodes and edges in the object graph to elements in the code structure, to measure how they differ, and whether the differences are indicative of language or design features such as encapsulation, polymorphism, and inheritance. We compute the metrics across eight systems totaling over 100 KLOC, and show a statistically significant difference between the code and the object graph. In several cases, the magnitude of this difference is large.
Most accounts of information flow security in programming languages emphasize non-interference to characterize security: in a secure program, changes to high-security inputs do not alter the values of low-security outputs. The definition of non-interference is incompatible with declassification, which allows some low-security outputs to be influenced by high-security inputs. We propose an alternative account of information flow based on an epistemic logic of computational effects. Rather than view a program as a function from inputs to outputs, we instead embrace the principle that information flow security is concerned with the effects a program has on its execution environment. These effects are modeled using a substructural epistemic logic that tracks the flow of knowledge gained by principals and communication channels during execution. Confidentiality is expressed by proving necessary conditions for a principal to know a sensitive fact at the end of an execution. In the simplest case the necessary condition is falsehood, which means that a principal cannot know a secret as a result of a well-typed execution of a program. In the presence of declassification, a necessary condition for disclosure is the existence of a proof of authorization in a formal authorization logic, expressing that sensitive data is disclosed only when explicitly authorized. Rather than being taken as the primary result, the classical non-interference property arises in the proof of adequacy of the epistemic theory of disclosure, ensuring that it accurately models program behavior. It is suggested that an epistemic account of information flow security is both more natural and more expressive than classical accounts based only on non-interference.
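For comparison, the classical non-interference property referred to above is commonly stated along the following lines (generic notation, ours rather than the paper's):

    $\forall \ell, h_1, h_2.\ \ \llbracket P \rrbracket(\ell, h_1) \approx_L \llbracket P \rrbracket(\ell, h_2)$

that is, running the program $P$ on the same low-security input $\ell$ with any two high-security inputs $h_1$ and $h_2$ produces outcomes indistinguishable to a low-security observer ($\approx_L$).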
Complementarity problems play a central role in equilibrium finding, physical simulation, and optimization. As a consequence, we are interested in understanding how to solve these problems quickly, and this often involves approximation. In this paper we present a method for approximately solving strictly monotone linear complementarity problems (LCPs) with a Galerkin approximation. We also give bounds for the approximation error, and prove novel bounds on the perturbation error. These perturbation bounds suggest that a Galerkin approximation may be much less sensitive to noise than the original LCP.
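For context, the underlying problem and approximation can be sketched as follows; these are standard textbook definitions, paraphrased rather than quoted from the paper. An LCP with matrix $M$ and vector $q$ asks for

    $z \in \mathbb{R}^n$ such that $z \ge 0$, $Mz + q \ge 0$, and $z^\top (Mz + q) = 0$.

A Galerkin approximation restricts the solution to a low-dimensional subspace, $z = \Phi w$ for a basis matrix $\Phi \in \mathbb{R}^{n \times k}$ with $k \ll n$, and solves the smaller projected problem in $w$.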
Both SAT and #SAT can represent difficult problems in seemingly dissimilar areas such as planning, verification, and probabilistic inference. Here, we examine an expressive new language, #∃SAT, that generalizes both of these languages. #∃SAT problems require counting the number of satisfiable formulas in a concisely-describable set of existentially quantified, propositional formulas. We characterize the expressiveness and worst-case difficulty of #∃SAT by proving it is complete for the complexity class #P^NP[1], and relating this class to more familiar complexity classes. We also experiment with three new general-purpose #∃SAT solvers on a battery of problem distributions including a simple logistics domain. Our experiments show that, despite the formidable worst-case complexity of #P^NP[1], many of the instances can be solved efficiently by noticing and exploiting a particular type of frequent structure.
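In symbols, our reading of the counting problem (notation ours) is:

    $\#\exists\mathrm{SAT}(\varphi) \;=\; \bigl|\{\, y \in \{0,1\}^m : \exists x \in \{0,1\}^n.\ \varphi(x, y) = 1 \,\}\bigr|$

that is, each assignment to the counting variables $y$ picks one existentially quantified formula from the set, and the task is to count how many of them are satisfiable.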
In an organization, the interactions users have with software leave patterns or traces of the parts of the systems accessed. These interactions can be associated with the underlying software architecture. The first step in detecting problems like insider threat is to detect those traces that are anomalous. Here, we propose a method to find anomalous users by leveraging these interaction traces, categorized by user roles. We propose a model-based approach to cluster user sequences and find outliers. We show that the approach works on a simulation of a large-scale system based on an Amazon Web application style.
We propose an interactive approach where analysts reason about the security of a system using an abstraction of its runtime structure, as opposed to looking at the code. They interactively refine a hierarchical object graph, set security properties on abstract objects or edges, query the graph, and investigate the results by studying highlighted objects or edges or tracing to the code. Behind the scenes, an inference analysis and an extraction analysis maintain the soundness of the graph with respect to the code.
Conventional security mechanisms at network, host, and source code levels are no longer sufficient in detecting and responding to increasingly dynamic and sophisticated cyber threats today. Detecting anomalous behavior at the architectural level can help better explain the intent of the threat and strengthen overall system security posture. To that end, we present a framework that mines software component interactions from system execution history and applies a detection algorithm to identify anomalous behavior. The framework uses unsupervised learning at runtime, can perform fast anomaly detection “on the fly”, and can quickly adapt to system load fluctuations and user behavior shifts. Our evaluation of the approach against a real Emergency Deployment System has demonstrated very promising results, showing that the framework can effectively detect covert attacks, including insider threats, that may be easily missed by traditional intrusion detection methods.
Modern frameworks are required to be extensible as well as secure. However, these two qualities are often at odds. In this poster we describe an approach that uses a combination of static analysis and run-time management, based on software architecture models, to improve security while maintaining framework extensibility. We implement a prototype of the approach for the Android platform. Static analysis identifies the architecture and communication patterns among the collection of apps on an Android device and determines which communications might be vulnerable to attack. Run-time mechanisms monitor these potentially vulnerable communication patterns, and adapt the system to either deny them, request explicit approval from the user, or allow them.