Biblio
Images perturbed subtly to be misclassified by neural networks, called adversarial examples, have emerged as a technically deep challenge and an important concern for several application domains. Most research on adversarial examples takes as its only constraint that the perturbed images are similar to the originals. However, real-world application of these ideas often requires the examples to satisfy additional objectives, which are typically enforced through custom modifications of the perturbation process. In this article, we propose adversarial generative nets (AGNs), a general methodology to train a generator neural network to emit adversarial examples satisfying desired objectives. We demonstrate the ability of AGNs to accommodate a wide range of objectives, including imprecise ones difficult to model, in two application domains. In particular, we demonstrate physical adversarial examples—eyeglass frames designed to fool face recognition—with better robustness, inconspicuousness, and scalability than previous approaches, as well as a new attack to fool a handwritten-digit classifier.
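To make the AGN training idea concrete, below is a minimal Python (PyTorch) sketch, assuming a hypothetical generator architecture, perturbation bound, and loss weights; none of these choices are taken from the paper, and the smoothness term merely stands in for a secondary objective such as inconspicuousness.

```python
# Minimal sketch of the AGN idea, assuming PyTorch; the architecture,
# bound, and loss weights are illustrative, not the paper's actual setup.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, out_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh())

    def forward(self, z):
        return self.net(z)

def train_step(gen, target_net, image, target_class, opt, z_dim=100):
    z = torch.randn(image.size(0), z_dim)
    perturbation = gen(z).view_as(image) * 0.1       # bounded magnitude
    adv = (image + perturbation).clamp(0, 1)
    # Objective 1: targeted misclassification by the network under attack.
    fool_loss = nn.functional.cross_entropy(target_net(adv), target_class)
    # Objective 2 (stand-in for, e.g., inconspicuousness): smoothness.
    smooth_loss = (perturbation[..., 1:] - perturbation[..., :-1]).abs().mean()
    loss = fool_loss + 0.5 * smooth_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Illustrative usage against a dummy classifier:
target_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))
for p in target_net.parameters():
    p.requires_grad_(False)               # attack the model, don't train it
gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
images = torch.rand(4, 3, 64, 64)
wanted = torch.full((4,), 7, dtype=torch.long)
print(train_step(gen, target_net, images, wanted, opt))
```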
Security experts often recommend using password-management tools that both store passwords and generate random passwords. However, research indicates that only a small fraction of users use password managers with password generators. Past studies have explored factors in the adoption of password managers using surveys and online store reviews. Here we describe a semi-structured interview study with 30 participants that allows us to provide a more comprehensive picture of the mindsets underlying adoption and effective use of password managers and password-generation features. Our participants include users who use no password-specific tools at all, those who use password managers built into browsers or operating systems, and those who use separately installed password managers. Furthermore, past field data has indicated that users of built-in, browser-based password managers more often use weak and reused passwords than users of separate password managers that have password generation available by default. Our interviews suggest that users of built-in password managers may be driven more by convenience, while users of separately installed tools appear more driven by security. We advocate tailored designs for these two mentalities and provide actionable suggestions to induce effective password manager usage.
Advanced persistent threats (APTs) are a particularly troubling challenge for software systems. The adversarial nature of the security domain, and APTs in particular, poses unresolved challenges to the design of self-* systems, such as how to defend against multiple types of attackers with different goals and capabilities. In this interaction, the observability of each side is an important and under-investigated issue in the self-* domain. We propose a model of APT defense that elevates observability as a first-class concern. We evaluate this model by showing how an informed approach that uses observability improves the defender's utility compared to a uniform random strategy, can enable robust planning through sensitivity analysis, and can inform observability-related architectural design decisions.
System administrators are slowly coming to accept that nearly all systems are vulnerable and many should be assumed to be compromised. Rather than trying to prevent every vulnerability in complex systems, the emphasis is shifting toward protecting systems under the assumption that they are already under attack.
Administrators do not know all the latent vulnerabilities in the systems they are charged with protecting. This work builds on prior approaches that assume more a priori knowledge [5]. Additionally, prior research does not necessarily guide administrators to gracefully degrade systems in response to threats [4]. Sophisticated attackers with high levels of resources, such as advanced persistent threats (APTs), might use zero-day exploits against novel vulnerabilities or move slowly and stealthily to evade initial lines of detection.
However, defenders often have some knowledge of where attackers are. Additionally, it is possible to reasonably bound attacker resources: exploits have a cost to create [1], and even the most sophisticated attacks use a limited number of zero-day exploits [3].
Defenders therefore need a way to reason about, and react to, the impact of an attacker with an existing presence in a system. It may not be possible to maintain one hundred percent of the system's original utility; instead, the defender might need to gracefully degrade the system, trading off some functional utility to keep the attacker away from the most critical functionality.
We propose a method to "think like an attacker" to evaluate architectures and alternatives in response to knowledge of attacker presence. For each candidate architecture, our approach determines the types of exploits an attacker would need to achieve particular attacks, using the Datalog declarative logic programming language in a fashion that adapts others' prior work [2][4]. With knowledge of how difficult particular exploits are to create, we can approximate the cost to an attacker of a particular attack trace. A bounded search of traces within a limited cost yields a set of hypothetical attacks for a given architecture. These attacks have varying impacts on the system's ability to achieve its functions. Using this knowledge, our approach outputs an architectural alternative that optimally balances keeping an attacker away from critical functionality while preserving that functionality. In the process, it produces evidence, in the form of hypothetical attack traces, that can be used to explain its reasoning.
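As a rough illustration of the bounded-cost search over attack traces, the Python sketch below enumerates traces under an exploit-cost budget; the components, attack steps, and costs are invented stand-ins for facts that a Datalog model in the style of [2][4] would derive.

```python
# Sketch of a bounded-cost search for hypothetical attack traces.
# Exploit costs and reachability rules are illustrative stand-ins
# for facts that would be derived from a Datalog model.
import heapq

EXPLOIT_COST = {"web_rce": 3, "db_sqli": 2, "priv_esc": 4}

# (from_component, to_component, required_exploit)
ATTACK_STEPS = [
    ("internet", "web_server", "web_rce"),
    ("web_server", "database", "db_sqli"),
    ("web_server", "app_host", "priv_esc"),
]

def attack_traces_within_budget(start, budget):
    """Enumerate attack traces whose total exploit cost is <= budget."""
    traces = []
    frontier = [(0, start, [start])]
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        for src, dst, exploit in ATTACK_STEPS:
            if src == node and dst not in path:
                new_cost = cost + EXPLOIT_COST[exploit]
                if new_cost <= budget:
                    trace = path + [dst]
                    traces.append((new_cost, trace))
                    heapq.heappush(frontier, (new_cost, dst, trace))
    return traces

print(attack_traces_within_budget("internet", budget=5))
# [(3, ['internet', 'web_server']), (5, ['internet', 'web_server', 'database'])]
```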
This style of thinking enables a defender to reason about how potential defensive tactics could close off avenues of attack, or perhaps inadvertently enable an ongoing one. By reasoning at the level of architecture, we avoid assuming knowledge of specific vulnerabilities, which makes reasoning possible in a highly uncertain domain.
We applied this approach to several small systems at varying levels of abstraction. These systems were chosen as exemplars of various "best practices" to see whether the approach could quantitatively validate the underpinnings of general rules of thumb, such as using perimeter security or trading off resilience for security. Ultimately, our approach places architectural components in locations that correspond with current best practices and that would be reasonable to system architects. In the process of applying the approach at different levels of abstraction, we fine-tuned our understanding of attacker movement through systems in a way that yields security-appropriate architectures despite poor knowledge of latent vulnerabilities; the result of this fine-tuning is a more granular way to understand and evaluate attacker movement in systems.
Future work will explore ways to improve the performance of this approach so it can provide real-time planning to gracefully degrade systems as knowledge of an attacker is discovered. Additionally, we plan to explore ways to increase the expressiveness of the approach to address additional security-related concerns, such as timing and further levels of uncertainty.
Use of multi-objective probabilistic planning to synthesize the behavior of cyber-physical systems (CPSs) can play an important role in engineering systems that must self-optimize for multiple quality objectives and operate under uncertainty. However, the reasoning behind automated planning is opaque to end-users: they may not understand why a particular behavior is generated, and therefore may not be able to calibrate their confidence in the system working properly. To address this problem, we propose a method to automatically generate a verbal explanation of multi-objective probabilistic planning that describes why a particular behavior is generated on the basis of the optimization objectives. Our explanation method involves describing the objective values of a generated behavior and explaining any tradeoff made to reconcile competing objectives. We contribute: (i) an explainable planning representation that facilitates explanation generation, and (ii) an algorithm for generating contrastive justifications as explanations for why a generated behavior is best with respect to the planning objectives. We demonstrate our approach on a mobile robot case study.
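A minimal sketch of the contrastive-justification idea, with hypothetical objective names, units, and plan values: the generated text states where the chosen behavior is better and what it concedes as a tradeoff.

```python
# Sketch of contrastive justification for a chosen plan: explain why the
# selected behavior is best by comparing its objective values against an
# alternative's. Objective names and values are illustrative.
def justify(chosen, alternative, objectives):
    """objectives maps name -> (direction, unit); plans are value dicts."""
    lines = [f"I chose plan '{chosen['name']}' over '{alternative['name']}' because:"]
    for obj, (direction, unit) in objectives.items():
        c, a = chosen[obj], alternative[obj]
        better = c < a if direction == "min" else c > a
        if better:
            lines.append(f"  - it improves {obj}: {c}{unit} vs. {a}{unit}")
        elif c != a:
            lines.append(f"  - it concedes {obj} ({c}{unit} vs. {a}{unit}) as a tradeoff")
    return "\n".join(lines)

objectives = {"travel_time": ("min", " s"), "collision_risk": ("min", "")}
plan_a = {"name": "A", "travel_time": 120, "collision_risk": 0.02}
plan_b = {"name": "B", "travel_time": 90, "collision_risk": 0.10}
print(justify(plan_a, plan_b, objectives))
```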
Much research has been devoted to better understanding adversarial examples, which are specially crafted inputs to machine-learning models that are perceptually similar to benign inputs, but are classified differently (i.e., misclassified). Both algorithms that create adversarial examples and strategies for defending against adversarial examples typically use Lp-norms to measure the perceptual similarity between an adversarial input and its benign original. Prior work has already shown, however, that two images need not be close to each other as measured by an Lp-norm to be perceptually similar. In this work, we show that nearness according to an Lp-norm is not just unnecessary for perceptual similarity, but is also insufficient. Specifically, focusing on datasets (CIFAR10 and MNIST), Lp-norms, and thresholds used in prior work, we show through online user studies that “adversarial examples” that are closer to their benign counterparts than required by commonly used Lp-norm thresholds can nevertheless be perceptually distinct to humans from the corresponding benign examples. Namely, the perceptual distance between two images that are “near” each other according to an Lp-norm can be high enough that participants frequently classify the two images as representing different objects or digits. Combined with prior work, we thus demonstrate that nearness of inputs as measured by Lp-norms is neither necessary nor sufficient for perceptual similarity, which has implications for both creating and defending against adversarial examples. We propose and discuss alternative similarity metrics to stimulate future research in the area.
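For concreteness, the short Python sketch below computes Lp distances between a benign image and a locally perturbed variant; the data and perturbation are synthetic, illustrating how a perturbation can be modest under one norm yet visually conspicuous.

```python
# Sketch of the Lp-norm similarity measure discussed above: two images can
# be "near" under an Lp-norm yet look different to humans. All values here
# are synthetic, not drawn from any specific prior work.
import numpy as np

def lp_distance(x, y, p=2):
    """Lp distance between two images given as arrays in [0, 1]."""
    d = (x - y).ravel()
    if np.isinf(p):
        return np.abs(d).max()
    return (np.abs(d) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
benign = rng.random((32, 32, 3))
# A perturbation concentrated in one region: small L2 norm overall, but
# potentially quite visible, illustrating the insufficiency argument.
perturbed = benign.copy()
perturbed[:8, :8, :] = np.clip(perturbed[:8, :8, :] + 0.3, 0, 1)

print(lp_distance(benign, perturbed, p=2))       # modest global L2 distance
print(lp_distance(benign, perturbed, p=np.inf))  # L-infinity sees the local change
```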
Blockchains are one solution for secure distributed interaction, but security vulnerabilities have already been exposed in existing programs. Obsidian, a new blockchain programming language, seeks to prevent some of these vulnerabilities using typestate and linearity. We evaluate the current design of Obsidian by implementing a blockchain application for parametric insurance as a case study. We compare this implementation to one written in Solidity, and find that Obsidian can provide stronger safety guarantees.
Approaches to programming language design commonly used in the research community today center on theoretical and performance-oriented evaluation. Recently, researchers have been considering a broader range of approaches to language design, including quantitative and qualitative user studies that examine how different designs might affect programmers. In this paper, we argue for an interdisciplinary approach that incorporates many different methods in the creation and evaluation of programming languages. We argue that the addition of user-oriented design techniques can be helpful at many different stages in the programming language design process.
Adaptive systems are expected to adapt to unanticipated run-time events using imperfect information about themselves, their environment, and goals. This entails handling the effects of uncertainties in decision-making, which are not always considered as a first-class concern. This paper contributes a formal analysis technique that explicitly considers uncertainty in sensing when reasoning about the best way to adapt, together with uncertainty reduction mechanisms to improve system utility. We illustrate our approach on a Denial of Service (DoS) attack scenario and present results that demonstrate the benefits of uncertainty-aware decision-making in comparison to using an uncertainty-ignorant approach, both in the presence and absence of uncertainty reduction mechanisms.
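A toy sketch of the uncertainty-aware decision step, with invented beliefs and utilities: the tactic is chosen by expected utility over a belief about the true state rather than by trusting a point estimate.

```python
# Sketch of uncertainty-aware decision-making: choose an adaptation tactic
# by expected utility over a belief about the sensed state (e.g., whether
# traffic is a DoS attack), rather than trusting the point estimate.
# Probabilities and utilities are illustrative.
BELIEF = {"dos_attack": 0.6, "flash_crowd": 0.4}   # after noisy sensing

UTILITY = {                    # tactic -> utility under each true state
    "blackhole_traffic": {"dos_attack": 0.9, "flash_crowd": 0.1},
    "add_capacity":      {"dos_attack": 0.3, "flash_crowd": 0.9},
    "throttle":          {"dos_attack": 0.6, "flash_crowd": 0.6},
}

def best_tactic(belief, utility):
    def expected(tactic):
        return sum(p * utility[tactic][state] for state, p in belief.items())
    return max(utility, key=expected)

print(best_tactic(BELIEF, UTILITY))
# 'throttle': it hedges against sensing uncertainty, whereas an
# uncertainty-ignorant strategy would blackhole traffic based on the
# 'dos_attack' point estimate alone.
```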
Structure editors allow programmers to edit the tree structure of a program directly. This can have cognitive benefits, particularly for novice and end-user programmers. It also simplifies matters for tool designers, because they do not need to contend with malformed program text. This paper introduces Hazelnut, a structure editor based on a small bidirectionally typed lambda calculus extended with holes and a cursor. Hazelnut goes one step beyond syntactic well-formedness: its edit actions operate over statically meaningful incomplete terms. Naïvely, this would force the programmer to construct terms in a rigid “outside-in” manner. To avoid this problem, the action semantics automatically places terms assigned a type that is inconsistent with the expected type inside a hole. This meaningfully defers the type consistency check until the term inside the hole is finished. Hazelnut is not intended as an end-user tool itself. Instead, it serves as a foundational account of typed structure editing. To that end, we describe how Hazelnut’s rich metatheory, which we have mechanized using the Agda proof assistant, serves as a guide when we extend the calculus to include binary sum types. We also discuss various interpretations of holes, and in so doing reveal connections with gradual typing and contextual modal type theory, the Curry-Howard interpretation of contextual modal logic. Finally, we discuss how Hazelnut’s semantics lends itself to implementation as an event-based functional reactive program. Our simple reference implementation is written using js_of_ocaml.
We typically only consider running programs that are completely written. Programmers end up inserting ad hoc dummy values into their incomplete programs to receive feedback about dynamic behavior. In this work we suggest an evaluation mechanism for incomplete programs, represented as terms with holes. Rather than immediately failing when a hole is encountered, evaluation propagates holes as far as possible. The result is a substantially tighter development loop.
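A minimal sketch of hole propagation during evaluation, using a hypothetical expression language with numbers, addition, and holes:

```python
# Sketch of evaluation that propagates holes instead of failing, assuming a
# tiny expression language. Indeterminate results keep the hole and record
# the work already done around it.
from dataclasses import dataclass

@dataclass
class Hole:
    name: str

@dataclass
class Add:
    left: object
    right: object

def evaluate(expr):
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, Hole):
        return expr                       # holes evaluate to themselves
    if isinstance(expr, Add):
        l, r = evaluate(expr.left), evaluate(expr.right)
        if isinstance(l, (int, float)) and isinstance(r, (int, float)):
            return l + r                  # fully determined: reduce
        return Add(l, r)                  # otherwise propagate the hole
    raise TypeError(expr)

# (1 + 2) + ?h evaluates to Add(3, Hole('h')): progress is made around the hole.
print(evaluate(Add(Add(1, 2), Hole("h"))))
```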
Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make the variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little general understanding of how to design them or what tradeoffs exist among them. In this paper, we strive for a more systematic exploration and analysis of a variational data structure. We want to know how different design decisions affect the performance and scalability of a variational data structure, and what properties of the underlying data and operation sequences need to be considered. Specifically, we study several alternative designs of a variational stack, a data structure that supports efficiently representing and computing with multiple variants of a plain stack, and that is a common building block in many algorithms. The different variational stacks are presented as a small product line organized by three design decisions. We analyze how these design decisions affect the performance of a variational stack with different usage profiles. Finally, we evaluate how these design decisions affect the performance of the variational stack in a real-world scenario: in the interpreter VarexJ when executing real software containing variability.
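One possible variational-stack design can be sketched as follows; representing presence conditions as feature sets is a simplification of the propositional formulas a real implementation would likely use.

```python
# Sketch of one variational-stack design: each element is guarded by a
# presence condition (here, a set of required features), so one structure
# represents many plain-stack variants at once.
class VStack:
    def __init__(self):
        self._items = []                  # list of (presence_condition, value)

    def push(self, value, condition=frozenset()):
        self._items.append((frozenset(condition), value))

    def pop(self, config):
        """Pop the top element visible under the given feature configuration."""
        for i in range(len(self._items) - 1, -1, -1):
            cond, value = self._items[i]
            if cond <= config:            # condition satisfied by config
                del self._items[i]
                return value
        raise IndexError("pop from empty stack variant")

vs = VStack()
vs.push("base")                           # present in every variant
vs.push("debug-frame", condition={"DEBUG"})
print(vs.pop(frozenset()))                # 'base' (the DEBUG variant is hidden)
```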
The C preprocessor is used in many C projects to support variability and portability. However, researchers and practitioners criticize the C preprocessor because of its negative effect on code understanding and maintainability and its error proneness. More importantly, the use of the preprocessor hinders the development of tool support that is standard in other languages, such as automated refactoring. Developers aggravate these problems when using the preprocessor in undisciplined ways (e.g., conditional blocks that do not align with the syntactic structure of the code). In this article, we proposed a catalogue of refactorings and we evaluated the number of application possibilities of the refactorings in practice, the opinion of developers about the usefulness of the refactorings, and whether the refactorings preserve behavior. Overall, we found 5670 application possibilities for the refactorings in 63 real-world C projects. In addition, we performed an online survey among 246 developers, and we submitted 28 patches to convert undisciplined directives into disciplined ones. According to our results, 63% of developers prefer to use the refactored (i.e., disciplined) version of the code instead of the original code with undisciplined preprocessor usage. To verify that the refactorings are indeed behavior preserving, we applied them to more than 36 thousand programs generated automatically using a model of a subset of the C language, running the same test cases in the original and refactored programs. Furthermore, we applied the refactorings to three real-world projects: BusyBox, OpenSSL, and SQLite. This way, we detected and fixed a few behavioral changes, 62% caused by unspecified behavior in the C language.
Large software systems have to contend with a significant number of users who interact with different components of the system in various ways. The sequences of components that are used as part of an interaction define sets of behaviors that users have with the system, and these can be large in number. Among these users, it is possible that there are some who exhibit anomalous behaviors -- for example, they may have found back doors into the system and are doing something malicious. These anomalous behaviors can be hard to distinguish from normal behavior because of the number of interactions a system may have, or because traces may deviate only slightly from normal behavior. In this paper we describe a model-based approach to cluster sequences of user behaviors within a system and to find suspicious, or anomalous, sequences. We exploit the underlying software architecture of a system to define these sequences. We further show that our approach is better at detecting suspicious activities than other approaches, specifically those that use unigrams and bigrams for anomaly detection. We show this on a simulation of a large-scale system based on an Amazon Web application-style architecture.
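For contrast, the bigram baseline that such a model-based approach is compared against can be sketched in a few lines of Python; the training sequences and scoring function are illustrative.

```python
# Sketch of a bigram baseline for anomaly detection over component-
# interaction sequences: score a sequence by how rare its bigrams are
# relative to observed normal behavior. All data is illustrative.
from collections import Counter

def train_bigrams(sequences):
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def anomaly_score(seq, model, floor=1e-6):
    """Mean 'surprise' of the sequence's bigrams; higher = more anomalous."""
    bigrams = list(zip(seq, seq[1:]))
    return sum(1.0 - model.get(bg, floor) for bg in bigrams) / len(bigrams)

normal = [["login", "browse", "cart", "checkout"],
          ["login", "browse", "logout"]]
model = train_bigrams(normal)
print(anomaly_score(["login", "admin_panel", "db_dump"], model))  # near 1.0
print(anomaly_score(["login", "browse", "cart", "checkout"], model))  # lower
```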
Modern mobile platforms rely on a permission model to guard the system's resources and apps. In Android, since permissions are granted at the granularity of apps, and all components belonging to an app inherit those permissions, an app's components are typically over-privileged, i.e., components are granted more privileges than they need to complete their tasks. Systematic violation of the least-privilege principle in Android has been shown to be the root cause of many security vulnerabilities. To mitigate this issue, we have developed DELDROID, an automated system for determination of least-privilege architecture in Android and its enforcement at runtime. A key contribution of our approach is the ability to limit the privileges granted to apps without the need to modify them. DELDROID utilizes static program analysis techniques to extract the exact privileges each component needs for providing its functionality. A Multiple-Domain Matrix representation of the system's architecture is then used to automatically analyze the security posture of the system and derive its least-privilege architecture. Our experiments on hundreds of real-world apps corroborate DELDROID's ability to effectively establish the least-privilege architecture and its benefits in alleviating security threats.
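The core of a least-privilege determination can be illustrated as a set comparison between granted permissions and the permissions components actually need; the components and permissions below are invented.

```python
# Sketch of the over-privilege computation at the heart of a least-privilege
# analysis: compare the permissions granted to an app against the permissions
# its components actually need (all values are illustrative).
GRANTED = {"INTERNET", "READ_CONTACTS", "SEND_SMS", "CAMERA"}

# Per-component needs, e.g., as extracted by static analysis.
COMPONENT_NEEDS = {
    "MainActivity": {"INTERNET"},
    "UploadService": {"INTERNET", "CAMERA"},
}

def least_privilege(granted, component_needs):
    needed = set().union(*component_needs.values())
    over_privilege = granted - needed
    return needed, over_privilege

needed, extra = least_privilege(GRANTED, COMPONENT_NEEDS)
print("needed:", sorted(needed))     # ['CAMERA', 'INTERNET']
print("revocable:", sorted(extra))   # ['READ_CONTACTS', 'SEND_SMS']
```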
Information security can benefit from real-time cyber threat indicator sharing, in which companies and government agencies share their knowledge of emerging cyberattacks to benefit their sector and society at large. As attacks become increasingly sophisticated by exploiting behavioral dimensions of human computer operators, there is an increased risk to systems that store personal information. In addition, risk increases as individuals blur the boundaries between workplace and home computing (e.g., using workplace computers for personal reasons). This paper describes an architecture to leverage individual perceptions of privacy risk to compute privacy risk scores over cyber threat indicator data. Unlike security risk, which is a risk to a particular system, privacy risk concerns an individual's personal information being accessed and exploited. The architecture integrates tools to extract information entities from textual threat reports expressed in the STIX format and privacy risk estimates computed using factorial vignettes to survey individual risk perceptions. The architecture aims to optimize for scalability and adaptability to achieve real-time risk scoring.
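One illustrative way such an architecture might aggregate per-entity risk perceptions into a report-level score (the entity types, weights, and max aggregation are assumptions, not the system's actual scoring rule):

```python
# Sketch of scoring a threat indicator for privacy risk: weight each
# extracted information entity by a perceived-risk estimate, as might be
# elicited from factorial-vignette surveys. Weights are illustrative.
PERCEIVED_RISK = {          # entity type -> mean perceived risk (0..1)
    "ip_address": 0.35,
    "email_address": 0.70,
    "hostname": 0.25,
    "username": 0.60,
}

def privacy_risk_score(entities):
    """Aggregate risk over entities extracted from a STIX threat report."""
    if not entities:
        return 0.0
    return max(PERCEIVED_RISK.get(kind, 0.5) for kind, _ in entities)

report_entities = [("ip_address", "203.0.113.9"),
                   ("email_address", "victim@example.com")]
print(privacy_risk_score(report_entities))  # 0.7: driven by the email address
```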
Race conditions are difficult to detect because they usually occur only under specific execution interleavings. Numerous program analysis and testing techniques have been proposed to detect race conditions between threads on single applications. However, most of these techniques neglect races that occur at the process level due to complex system event interactions. This article presents a framework, SIMEXPLORER, that allows engineers to effectively test for process-level race conditions. SIMEXPLORER first uses dynamic analysis techniques to observe system execution, identify program locations of interest, and report faults related to oracles. Next, it uses virtualization to achieve the fine-grained controllability needed to exercise event interleavings that are likely to expose races. We evaluated the effectiveness of SIMEXPLORER on 24 real-world applications containing both known and unknown process-level race conditions. Our results show that SIMEXPLORER is effective at detecting these race conditions, while incurring an overhead that is acceptable given its effectiveness improvements.
Many authentication schemes ask users to manually compare compact representations of cryptographic keys, known as fingerprints. If the fingerprints do not match, that may signal a man-in-the-middle attack. An adversary performing an attack may use a fingerprint that is similar to the target fingerprint, but not an exact match, to try to fool inattentive users. Fingerprint representations should thus be both usable and secure. We tested the usability and security of eight fingerprint representations under different configurations. In a 661-participant between-subjects experiment, participants compared fingerprints under realistic conditions and were subjected to a simulated attack. The best configuration allowed attacks to succeed 6% of the time; the worst 72%. We find the seemingly effective compare-and-select approach performs poorly for key fingerprints and that graphical fingerprint representations, while intuitive and fast, vary in performance. We identify some fingerprint representations as particularly promising.
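For concreteness, a sketch of a chunked hexadecimal fingerprint representation of the general kind participants compared; the digest algorithm and grouping parameters are illustrative choices.

```python
# Sketch of a hexadecimal key-fingerprint representation: derive a digest
# of the public key and chunk it for manual comparison. The chunking
# scheme and key material here are illustrative.
import hashlib

def hex_fingerprint(key_bytes, groups=8, group_len=4):
    digest = hashlib.sha256(key_bytes).hexdigest()
    chunks = [digest[i:i + group_len]
              for i in range(0, groups * group_len, group_len)]
    return " ".join(chunks)

alice_key = b"-----BEGIN PUBLIC KEY----- ...alice..."
mitm_key = b"-----BEGIN PUBLIC KEY----- ...mallory..."
print(hex_fingerprint(alice_key))
print(hex_fingerprint(mitm_key))
# An attacker searches for a key whose fingerprint differs from the target
# in only a few characters, hoping inattentive users overlook the mismatch.
```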
When multiple apps on an Android platform interact, faults and security vulnerabilities can occur. Software engineers need to be able to analyze interacting apps to detect such problems. Current approaches for performing such analyses, however, do not scale to the numbers of apps that may need to be considered, and thus, are impractical for application to realworld scenarios. In this paper, we introduce JITANA, a program analysis framework designed to analyze multiple Android apps simultaneously. By using a classloader-based approach instead of a compiler-based approach such as SOOT, JITANA is able to simultaneously analyze large numbers of interacting apps, perform on-demand analysis of large libraries, and effectively analyze dynamically generated code. Empirical studies of JITANA show that it is substantially more efficient than a state-of-theart approach, and that it can effectively and efficiently analyze complex apps including Facebook, Pokemon Go, and Pandora ´ that the state-of-the-art approach cannot handle.
Non-volatile memory express (NVMe) based SSDs and the NUMA platform are widely adopted in servers to achieve faster storage speed and more powerful processing capability. As of now, very little research has been conducted to investigate the performance and energy efficiency of the state-of-the-art NUMA architecture integrated with NVMe SSDs, an emerging technology used to host parallel I/O threads. As this technology continues to be widely developed and adopted, we need to understand the runtime behaviors of such systems in order to design software runtime systems that deliver optimal performance while consuming only the necessary amount of energy. This paper characterizes the runtime behaviors of a Linux-based NUMA system employing multiple NVMe SSDs. Our comprehensive performance and energy-efficiency study using massive numbers of parallel I/O threads shows that the penalty due to CPU contention is much smaller than that due to remote access of NVMe SSDs. Based on this insight, we develop a dynamic “lesser evil” algorithm called ESN, to minimize the impact of these two types of penalties. ESN is an energy-efficient profiling-based I/O thread scheduler for managing I/O threads accessing NVMe SSDs on NUMA systems. Our empirical evaluation shows that ESN can achieve optimal I/O throughput and latency while consuming up to 50% less energy and using fewer CPUs.
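The "lesser evil" comparison at the heart of such a scheduler can be sketched with a simple penalty model; the penalty constants below are invented, not measurements from the study.

```python
# Sketch of the "lesser evil" tradeoff described above: when placing an I/O
# thread, compare the (smaller) penalty of CPU contention on the SSD-local
# node against the (larger) penalty of remote NVMe access. Penalty numbers
# are illustrative.
CONTENTION_PENALTY = 0.10     # relative slowdown per extra thread on a node
REMOTE_ACCESS_PENALTY = 0.35  # relative slowdown for cross-node NVMe access

def place_thread(ssd_node, load_per_node):
    """Pick the NUMA node minimizing estimated penalty for one I/O thread."""
    def penalty(node):
        p = load_per_node[node] * CONTENTION_PENALTY
        if node != ssd_node:
            p += REMOTE_ACCESS_PENALTY
        return p
    return min(load_per_node, key=penalty)

# SSD attached to node 0; node 0 already runs 2 threads, node 1 runs 0.
print(place_thread(ssd_node=0, load_per_node={0: 2, 1: 0}))
# -> 0: tolerating contention beats paying the remote-access penalty.
```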
Though immutability has long been proposed as a way to prevent bugs in software, little is known about how to make immutability support in programming languages effective for software engineers. We designed a new formalism that extends Java to support transitive class immutability, the form of immutability for which there is the strongest empirical support, and implemented that formalism in a tool called Glacier. We applied Glacier successfully to two real-world systems. We also compared Glacier to Java’s final in a user study of twenty participants. We found that even after being given instructions on how to express immutability with final, participants who used final were unable to express immutability correctly, whereas almost all participants who used Glacier succeeded. We also asked participants to make specific changes to immutable classes and found that participants who used final all incorrectly mutated immutable state, whereas almost all of the participants who used Glacier succeeded. Glacier represents a promising approach to enforcing immutability in Java and provides a model for enforcement in other languages.
Certification schemes exist to regulate software systems and prevent them from being deployed before they are judged fit to use. However, practitioners are often unsatisfied with the efficiency of certification standards and processes. In this study, we analyzed two certification standards, Common Criteria and DO-178C, and collected insights from literature and from interviews with subject-matter experts to identify concepts affecting the efficiency of certification processes. Our results show that evaluation time, reusability of evaluation artifacts, and composition of systems and certified artifacts are barriers to achieving efficient certification.
Programming language definitions assign formal meaning to complete programs. Programmers, however, spend a substantial amount of time interacting with incomplete programs -- programs with holes, type inconsistencies and binding inconsistencies -- using tools like program editors and live programming environments (which interleave editing and evaluation). Semanticists have done comparatively little to formally characterize (1) the static and dynamic semantics of incomplete programs; (2) the actions available to programmers as they edit and inspect incomplete programs; and (3) the behavior of editor services that suggest likely edit actions to the programmer based on semantic information extracted from the incomplete program being edited, and from programs that the system has encountered in the past. As such, each tool designer has largely been left to develop their own ad hoc heuristics.
This paper serves as a vision statement for a research program that seeks to develop these "missing" semantic foundations. Our hope is that these contributions, which will take the form of a series of simple formal calculi equipped with a tractable metatheory, will guide the design of a variety of current and future interactive programming tools, much as various lambda calculi have guided modern language designs. Our own research will apply these principles in the design of Hazel, an experimental live lab notebook programming environment designed for data science tasks. We plan to co-design the Hazel language with the editor so that we can explore concepts such as edit-time semantic conflict resolution mechanisms and mechanisms that allow library providers to install library-specific editor services.
The principle of least authority states that each component of the system should be given authority to access only the information and resources that it needs for its operation. This principle is fundamental to the secure design of software systems, as it helps to limit an application’s attack surface and to isolate vulnerabilities and faults. Unfortunately, current programming languages do not provide adequate help in controlling the authority of application modules, an issue that is particularly acute in the case of untrusted third-party extensions. In this paper, we present a language design that facilitates controlling the authority granted to each application module. The key technical novelty of our approach is that modules are first-class, statically typed capabilities. First-class modules are essentially objects, and so we formalize our module system by translation into an object calculus and prove that the core calculus is type-safe and authority-safe. Unlike prior formalizations, our work defines authority non-transitively, allowing engineers to reason about software designs that use wrappers to provide an attenuated version of a more powerful capability. Our approach allows developers to determine a module’s authority by examining the capabilities passed as module arguments when the module is created, or delegated to the module later during execution. The type system facilitates this by identifying which objects provide capabilities to sensitive resources, and by enabling security architects to examine the capabilities passed into and out of a module based only on the module’s interface, without needing to examine the module’s implementation code. An implementation of the module system and illustrative examples in the Wyvern programming language suggest that our approach can be a practical way to control module authority.
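The non-transitive-authority idea translates naturally into a small sketch (here in Python rather than Wyvern, so dynamically rather than statically typed): a wrapper attenuates a powerful capability, and a module's authority is exactly the capabilities handed to it.

```python
# Sketch of capability attenuation as described above: a module receives
# only the capabilities passed to it, and a wrapper exposes a weakened
# version of a more powerful capability. Names are illustrative.
class FileSystem:
    """A powerful capability: full read/write access."""
    def read(self, path):
        return f"<contents of {path}>"
    def write(self, path, data):
        print(f"wrote {len(data)} bytes to {path}")

class ReadOnlyFS:
    """An attenuated capability wrapping FileSystem: reads only."""
    def __init__(self, fs):
        self._fs = fs
    def read(self, path):
        return self._fs.read(path)
    # No write method: authority is attenuated, not merely hidden.

def make_logger(fs_capability):
    """A module's authority is exactly what it is handed at creation."""
    def log_reader(path):
        return fs_capability.read(path)
    return log_reader

reader = make_logger(ReadOnlyFS(FileSystem()))
print(reader("/var/log/app.log"))   # allowed
# reader has no way to reach FileSystem.write: non-transitive authority.
```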
Dynamic adaptation should not leave a software system in an inconsistent state, as it could lead to failure. Prior research has used inter-component dependency models of a system to determine a safe interval for the adaptation of its components, where the most important tradeoff is between disruption in the operations of the system and reachability of safe intervals. This article presents Savasana, which automatically analyzes a software system’s code to extract both inter- and intra-component dependencies. In this way, Savasana is able to obtain more fine-grained models compared to previous approaches. Savasana then uses the detailed models to find safe adaptation intervals that cannot be determined using techniques from prior research. This allows Savasana to achieve a better tradeoff between disruption and reachability. The article demonstrates how Savasana infers safe adaptation intervals for components of a software system under various use cases and conditions.
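A minimal sketch of a safe-interval check over a dependency model; the dependency relation and active-component sets are illustrative, and a real analysis would operate on the richer inter- and intra-component models described above.

```python
# Sketch of checking a safe adaptation interval from a dependency model:
# a component may be swapped only when no active transaction depends on it.
# The dependency data and activity states are illustrative.
DEPENDS_ON = {                 # component -> components it calls into
    "checkout": {"payment", "inventory"},
    "payment": {"ledger"},
}

def dependents(component):
    return {c for c, deps in DEPENDS_ON.items() if component in deps}

def safe_to_adapt(component, active_components):
    """Safe if neither the component nor anything depending on it is active."""
    unsafe = {component} | dependents(component)
    return unsafe.isdisjoint(active_components)

print(safe_to_adapt("ledger", active_components={"inventory"}))  # True
print(safe_to_adapt("ledger", active_components={"payment"}))    # False
```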