Bibliography
Decreasing hardware reliability makes robust firmware imperative for safety-critical applications. Hence, ensuring correct handling of peripheral errors is a key objective during firmware design. To adequately support the robustness considerations of firmware designers during implementation, an efficient qualitative fault injection method is required. This paper presents a high-speed fault injection technique based on host-compiled firmware simulation that is suitable for analyzing the impact of transient faults on firmware behavior. Additionally, fault set reduction by static code analysis avoids unnecessary injection of masked and equivalent faults. Applying the proposed fault injection technique to an industrial safety-relevant automotive system-on-chip (SoC) firmware demonstrates a speedup of at least three orders of magnitude compared to instruction set level simulation. In addition, a fault set reduction of 78% is achieved. While significantly reducing the required fault injection time, the presented techniques provide feedback to the designer that is as accurate as existing state-of-the-art approaches.
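To make the host-compiled injection mechanism concrete, here is a minimal Python sketch of a transient single-bit flip injected into a simulated peripheral register. The names (StatusRegister, firmware_poll) and the retry-based firmware are invented for illustration; this is not the paper's implementation.

```python
# Minimal sketch of transient fault injection in a host-compiled
# peripheral model (hypothetical names, not the paper's simulator).
import random

class StatusRegister:
    """Host-compiled model of a peripheral status register."""
    def __init__(self, value=0b0001):
        self.value = value
        self.inject_at = None   # read index at which to flip a bit
        self.reads = 0

    def read(self):
        self.reads += 1
        if self.reads == self.inject_at:
            bit = random.randrange(8)
            return self.value ^ (1 << bit)   # transient single-bit flip
        return self.value

def firmware_poll(reg, retries=3):
    """Firmware-under-test: treat bit 0 as 'peripheral ready'."""
    for _ in range(retries):
        if reg.read() & 0b1:
            return "ok"
    return "error-handler"

# Run one fault-free ('golden') and one faulty simulation and compare.
golden = firmware_poll(StatusRegister())
faulty_reg = StatusRegister()
faulty_reg.inject_at = 1
print(golden, firmware_poll(faulty_reg))
```

Comparing the faulty run against the golden run is the basic classification step; a fault that cannot change the outcome (as here, thanks to the retry loop) is exactly the kind of masked fault the paper's static analysis prunes before injection.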
Go is a programming language developed at Google, with channel-based concurrency features based on CSP. Go can detect global communication deadlocks at runtime when all threads of execution are blocked, but deadlocks in other paths of execution may go undetected. We present a new static analyser for concurrent Go code that finds potential communication errors, such as communication mismatches and deadlocks, at compile time. Our tool extracts the communication operations as session types, which are then converted into Communicating Finite State Machines (CFSMs). Finally, we apply a recent theoretical result on choreography synthesis to generate a global graph representing the overall communication pattern of a concurrent program. If the synthesis is successful, the program is free from communication errors. We have implemented the technique in a tool and applied it to analyse common Go concurrency patterns and an open source application with over 700 lines of code.
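As a rough illustration of the CFSM stage, the sketch below composes two toy machines over synchronous channels and reports reachable states where neither can move. The state encoding is hypothetical; the paper's tool additionally infers these machines from Go source via session types and synthesises a global choreography, which this sketch does not do.

```python
def product_deadlocks(m1, s1, m2, s2, finals):
    """m: {state: [((dir, chan), next_state), ...]}; channels synchronise
    pairwise, like unbuffered Go channels."""
    seen, stack, deadlocks = set(), [(s1, s2)], []
    while stack:
        st = stack.pop()
        if st in seen:
            continue
        seen.add(st)
        a, b = st
        moves = [(n1, n2)
                 for (act1, n1) in m1.get(a, [])
                 for (act2, n2) in m2.get(b, [])
                 if act1[1] == act2[1] and {act1[0], act2[0]} == {"send", "recv"}]
        if not moves and st not in finals:
            deadlocks.append(st)   # blocked and not final: potential deadlock
        stack.extend(moves)
    return deadlocks

# Mismatched protocol: P sends on 'a' then 'b'; Q receives on 'b' then 'a'.
P = {"p0": [(("send", "a"), "p1")], "p1": [(("send", "b"), "p2")]}
Q = {"q0": [(("recv", "b"), "q1")], "q1": [(("recv", "a"), "q2")]}
print(product_deadlocks(P, "p0", Q, "q0", finals={("p2", "q2")}))
# [('p0', 'q0')] -- both machines block immediately, as unbuffered channels would
```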
There are many techniques to improve software quality. One is using automatic static analysis tools. We have observed, however, that despite the low-cost help they offer, these tools are underused and often discourage beginners. There is evidence that personality traits influence the perceived usability of software. Thus, to support beginners better, we need to understand how the workflow of people with different prevalent personality traits varies when using these tools. For this purpose, we observed users' solution strategies and correlated them with their prevalent personality traits in an exploratory study with student participants within a controlled experiment. We gathered data through screen capture and chat protocols, as well as a Big Five personality traits test. We found strong correlations between particular personality traits and different strategies for removing the findings of static code analysis, as well as between personality and tool utilization. Based on these findings, we offer take-away improvement suggestions. Our results imply that developers should be aware of these solution strategies and use this information to build tools that are more appealing to people with different prevalent personality traits.
Program defects tend to surface late in the development of programs, and they are hard to detect. Security vulnerabilities are particularly important defects to detect. They may cause sensitive information to be leaked or the system on which the program is executed to be compromised. Existing approaches that use static analysis to detect security vulnerabilities in source code are often limited to a predetermined set of encoded security vulnerabilities. Although these approaches support a decent number of vulnerabilities by default, they cannot be configured for detecting vulnerabilities that are specific to the application domain of the analyzed program. In this paper we present JS-QL, a framework for statically detecting user-specified security vulnerabilities in JavaScript applications. The framework makes use of an internal domain-specific query language hosted by JavaScript. JS-QL queries are based on regular path expressions, enabling users to express queries over a flow graph in a declarative way. The flow graph represents the run-time behavior of a program and is computed by a static analysis. We evaluate JS-QL by expressing 9 security vulnerabilities supported by existing work and comparing the resulting specifications. We conclude that the combination of static analysis and regular path expressions lends itself well to the detection of user-specified security vulnerabilities.
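A flavor of the query style, reduced to pure Python: the sketch evaluates one regular-path-style query, informally source . (not sanitizer)* . sink, over a hand-built flow graph. The node names and kinds are invented; JS-QL itself hosts such queries in JavaScript over a statically computed flow graph.

```python
# Illustrative evaluation of one regular-path-style query over a flow
# graph: find a source-to-sink path that never passes a sanitizer.
from collections import deque

def unsanitized_flows(edges, kind):
    sources = [n for n, k in kind.items() if k == "source"]
    hits = []
    for s in sources:
        seen, q = {s}, deque([s])
        while q:
            n = q.popleft()
            for m in edges.get(n, []):
                if kind.get(m) == "sanitizer" or m in seen:
                    continue                    # the (not sanitizer)* step
                if kind.get(m) == "sink":
                    hits.append((s, m))         # query matched
                seen.add(m)
                q.append(m)
    return hits

kind = {"req.query.q": "source", "escapeHtml": "sanitizer",
        "res.send": "sink", "tmp": "plain"}
edges = {"req.query.q": ["tmp"], "tmp": ["res.send"]}
print(unsanitized_flows(edges, kind))   # [('req.query.q', 'res.send')]
```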
Detecting “similar code” is useful for many software engineering tasks. Current tools can help detect code with statically similar syntactic and/or semantic features (code clones) and code with dynamically similar functional input/output (simions). Unfortunately, some code fragments that behave similarly at the finer granularity of their execution traces may be ignored. In this paper, we propose the term “code relatives” to refer to code with similar execution behavior. We define code relatives and then present DyCLINK, our approach to detecting code relatives within and across codebases. DyCLINK records instruction-level traces from sample executions, organizes the traces into instruction-level dynamic dependence graphs, and employs our specialized subgraph matching algorithm to efficiently compare the executions of candidate code relatives. In our experiments, DyCLINK analyzed 422+ million prospective subgraph matches in only 43 minutes. We compared DyCLINK to one static code clone detector from the community and to our implementation of a dynamic simion detector. The results show that DyCLINK effectively detects code relatives with a reasonable analysis time.
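For intuition about matching executions as graphs, the sketch below embeds one small instruction-level dependence graph in another using networkx's off-the-shelf VF2 matcher. The graphs are invented, and DyCLINK's specialized link-analysis-based subgraph matcher is quite different; this only illustrates the kind of question being asked.

```python
# Stand-in for dependence-graph matching: does g1 embed in g2 with
# matching instruction opcodes?
import networkx as nx
from networkx.algorithms import isomorphism

def dep_graph(deps):
    """deps: list of (producer_op, consumer_op, producer_id, consumer_id)."""
    g = nx.DiGraph()
    for op1, op2, u, v in deps:
        g.add_node(u, op=op1)
        g.add_node(v, op=op2)
        g.add_edge(u, v)
    return g

# Two fragments with the same dependence shape but different code.
g1 = dep_graph([("load", "add", 0, 1), ("add", "store", 1, 2)])
g2 = dep_graph([("load", "add", 10, 11), ("add", "store", 11, 12),
                ("const", "store", 13, 12)])

gm = isomorphism.DiGraphMatcher(
    g2, g1, node_match=isomorphism.categorical_node_match("op", None))
print(gm.subgraph_is_isomorphic())   # True: g1 is a 'code relative' of part of g2
```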
Although static analysis tools detect potential code defects early in the development process, they do not fully support developers in resolving those defects. To accurately and efficiently resolve defects, developers must orchestrate several complex tasks, such as determining whether the defect is a false positive and updating the source code without introducing new defects. Without good defect resolution strategies, developers may resolve defects erroneously or inefficiently. In this work, I perform a preliminary analysis of the successful and unsuccessful strategies developers use to resolve defects. Based on the successful strategies identified, I then outline a tool to support developers throughout the defect resolution process.
Regression test selection (RTS) aims to reduce regression testing time by only re-running the tests affected by code changes. Prior research on RTS can be broadly split into dynamic and static techniques. A recently developed dynamic RTS technique called Ekstazi is gaining some adoption in practice, and its evaluation shows that selecting tests at a coarser, class-level granularity provides better results than selecting tests at a finer, method-level granularity. As dynamic RTS is gaining adoption, it is timely to also evaluate static RTS techniques, some of which were proposed over three decades ago but not extensively evaluated on modern software projects. This paper presents the first extensive study that evaluates the performance benefits of static RTS techniques and their safety; a technique is safe if it selects to run all tests that may be affected by code changes. We implemented two static RTS techniques, one class-level and one method-level, and compare several variants of these techniques. We also compare these static RTS techniques against Ekstazi, a state-of-the-art, class-level, dynamic RTS technique. The experimental results on 985 revisions of 22 open-source projects show that the class-level static RTS technique is comparable to Ekstazi, with similar performance benefits, but at the risk of being unsafe sometimes. In contrast, the method-level static RTS technique performs rather poorly.
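A minimal sketch of the class-level idea, assuming a hypothetical class-dependency map: build reverse dependencies, take the transitive closure from the changed classes, and keep the tests that fall inside it. The study's real implementations extract these dependencies from compiled code.

```python
# Class-level static RTS on an invented dependency map.
from collections import deque

def affected_tests(deps, tests, changed):
    """deps: class -> classes it uses; select tests that transitively
    depend on any changed class."""
    rdeps = {}
    for c, uses in deps.items():
        for u in uses:
            rdeps.setdefault(u, set()).add(c)
    affected, q = set(changed), deque(changed)
    while q:
        c = q.popleft()
        for dependent in rdeps.get(c, ()):
            if dependent not in affected:
                affected.add(dependent)
                q.append(dependent)
    return sorted(t for t in tests if t in affected)

deps = {"OrderTest": {"Order"}, "Order": {"Money"},
        "UserTest": {"User"}}
print(affected_tests(deps, {"OrderTest", "UserTest"}, {"Money"}))
# ['OrderTest'] -- UserTest is skipped for this change
```

The safety risk the study measures corresponds to dependencies this closure misses, for example those created via reflection.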
A program subject to a Return-Oriented Programming (ROP) attack usually presents an execution trace with a high frequency of indirect branches. From this observation, several researchers have proposed to monitor the density of these instructions to detect ROP attacks. These techniques use universal thresholds: the density of indirect branches that characterizes an attack is the same for every application. This paper shows that universal thresholds are easy to circumvent. As an alternative, we introduce an inter-procedural, semi-context-sensitive static code analysis that estimates the maximum density of indirect branches possible for a program. This analysis determines a detection threshold for each application, thus making it more difficult for attackers to compromise programs via ROP. We have used an implementation of our technique in LLVM to find specific thresholds for the programs in SPEC CPU2006. By comparing these thresholds against actual execution traces of the corresponding programs, we demonstrate the accuracy of our approach. Furthermore, our algorithm is practical: it finds an approximate solution to a theoretically undecidable problem, and handles programs with up to 700 thousand assembly instructions in 25 minutes.
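The runtime side of the idea fits in a few lines: slide a window over the instruction trace and flag any window whose indirect-branch density exceeds the program's threshold. The trace encoding and numbers below are invented; the paper's contribution is computing the per-program threshold statically in LLVM.

```python
# Sliding-window indirect-branch density check (illustrative only).
def flags_rop(trace, window, threshold):
    """trace: 1 = indirect branch, 0 = any other instruction."""
    count = sum(trace[:window])
    if count / window > threshold:
        return True
    for i in range(window, len(trace)):
        count += trace[i] - trace[i - window]   # slide the window by one
        if count / window > threshold:
            return True
    return False

normal = [0]*18 + [1] + [0]*18 + [1]       # sparse indirect branches
gadgets = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # ROP-like gadget chain
per_program_threshold = 0.3                # would come from the static analysis
print(flags_rop(normal, 10, per_program_threshold),   # False
      flags_rop(gadgets, 10, per_program_threshold))  # True
```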
Modern static bug finding tools are complex. They typically consist of hundreds of thousands of lines of code, and most of them are wedded to one language (or even one compiler). This complexity makes the systems hard to understand, hard to debug, and hard to retarget to new languages, thereby dramatically limiting their scope. This paper reduces checking system complexity by addressing a fundamental assumption: the assumption that checkers must depend on a full-blown language specification and compiler front end. Instead, our program checkers are based on drastically incomplete language grammars ("micro-grammars") that describe only the portions of a language relevant to a checker. As a result, our implementation is tiny: roughly 2,500 lines of code, about two orders of magnitude smaller than a typical system. We hope that this dramatic increase in simplicity will allow people to use more checkers on more systems in more languages. We implement our approach in μchex, a language-agnostic framework for writing static bug checkers. We use it to build micro-grammar based checkers for six languages (C, the C preprocessor, C++, Java, JavaScript, and Dart) and find over 700 errors in real-world projects.
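A toy checker in the micro-grammar spirit (not μchex's actual rule set): it parses only an 'if' keyword plus a balanced condition, treats the rest of the input as opaque, and flags assignments used as conditions.

```python
# Micro-grammar-style checker: knows 'if ( <balanced> )' and nothing else.
import re

def check(src):
    warnings = []
    for m in re.finditer(r"\bif\s*\(", src):
        depth, i = 1, m.end()
        while i < len(src) and depth:
            depth += {"(": 1, ")": -1}.get(src[i], 0)
            i += 1
        cond = src[m.end():i - 1]
        # a single '=' that is not part of ==, !=, <=, or >=
        if re.search(r"(?<![=!<>])=(?!=)", cond):
            warnings.append(cond.strip())
    return warnings

code = """
if (x = next_token()) { use(x); }
if (y == 0) { return; }
while (p = p->next) { }     /* not matched: this checker only knows 'if' */
"""
print(check(code))   # ['x = next_token()']
```

Because the checker never needs a full grammar, the same few lines work unchanged on any language with C-like conditionals, which is the point of the approach.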
A multitude of execution models are available today for a developer to target. The choices vary from general purpose processors to fixed-function hardware accelerators, with a large number of variations in between. There is a growing demand to assess the potential benefits of porting or rewriting an application for a target architecture in order to fully exploit the benefits of performance and/or energy efficiency offered by such targets. However, as a first step of this process, it is necessary to determine whether the application has characteristics suitable for acceleration. In this paper, we present Peruse, a tool to characterize the features of loops in an application and to help the programmer understand the amenability of loops for acceleration. We consider a diverse set of features ranging from loop characteristics (e.g., loop exit points) and operation mixes (e.g., control vs. data operations) to wider code region characteristics (e.g., idempotency, vectorizability). Peruse is language, architecture, and input independent and uses the intermediate representation of compilers to do the characterization. Using static analyses makes Peruse scalable and enables analysis of large applications to identify and extract interesting loops suitable for acceleration. We show analysis results for unmodified applications from the SPEC CPU benchmark suite, Polybench, and HPC workloads. For an end user, it is more desirable to get an estimate of the potential speedup due to acceleration. We use the workload characterization results of Peruse as features and develop a machine-learning-based model to predict the potential speedup of a loop when offloaded to a fixed-function hardware accelerator. We use the model to predict the speedup of loops selected by Peruse and achieve an accuracy of 79%.
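To illustrate the prediction step, here is a hypothetical end-to-end sketch with invented feature names and training rows; Peruse derives real features from compiler IR, and the paper's model is trained on measured accelerator runs.

```python
# Hypothetical loop-feature table in, predicted speedup out.
from sklearn.tree import DecisionTreeRegressor

FEATURES = ["trip_count", "exit_points", "control_op_ratio", "vectorizable"]

train_x = [[1000, 1, 0.05, 1],   # hot, regular, vectorizable loop
           [1000, 4, 0.40, 0],   # control-heavy, multi-exit loop
           [10,   1, 0.10, 1],   # too few iterations to amortise offload
           [5000, 2, 0.15, 1]]
train_y = [12.0, 0.8, 1.1, 9.5]  # invented speedups on the accelerator

model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(train_x, train_y)

candidate = [[2000, 1, 0.08, 1]]
print(dict(zip(FEATURES, candidate[0])),
      "-> predicted speedup %.1fx" % model.predict(candidate)[0])
```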
Recommender systems have become quite popular recently. However, such systems are vulnerable to several types of attacks that target user ratings. One such attack is the Sybil attack, where an entity masquerades as several identities with the intention of diverting user ratings. In this work, we propose evolutionary game theory as a possible solution to the Sybil attack in recommender systems. After modeling the attack, we use replicator dynamics to solve for evolutionarily stable strategies. Our results show that under certain conditions, easily achievable by a system administrator, the probability of an attack strategy drops to zero, implying degraded fitness for Sybil nodes, which eventually die out.
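The replicator-dynamics computation itself is compact. The sketch below Euler-integrates the standard equation dx/dt = x(f_attack - f_avg) for a two-strategy population with invented payoffs, showing the attack share decaying to zero when the detection penalty is high enough; the conditions in the paper are derived analytically, not numerically.

```python
# Euler-integrated replicator dynamics, attack vs. honest strategies.
# Payoff numbers are invented for illustration.
def replicator(x, gain, penalty, detect, honest_payoff, dt=0.01, steps=2000):
    for _ in range(steps):
        f_attack = gain - penalty * detect     # expected payoff of a Sybil
        f_honest = honest_payoff
        avg = x * f_attack + (1 - x) * f_honest
        x += dt * x * (f_attack - avg)         # dx/dt = x (f_attack - f_avg)
        x = min(max(x, 0.0), 1.0)
    return x

# With a sufficiently high detection rate, the attack strategy dies out:
print(replicator(x=0.5, gain=4.0, penalty=10.0, detect=0.6, honest_payoff=1.0))
# ~0.0
```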
Sybil attacks, in which an adversary creates a large number of identities, present a formidable problem for the robustness of recommendation systems. One promising method of sybil detection is to use data from social network ties to implicitly infer trust. Previous work along this dimension typically a) assumes that it is difficult/costly for an adversary to create edges to honest nodes in the network; and b) limits the amount of damage done per such edge, using conductance-based methods. However, these methods fail to detect a simple class of sybil attacks which have been identified in online systems. Indeed, conductance-based methods seem inherently unable to do so, as they are based on the assumption that creating many edges to honest nodes is difficult, which seems to fail in real-world settings. We create a sybil defense system that accounts for the adversary's ability to launch such attacks yet provably withstands them by: (i) not assuming any restriction on the number of edges an adversary can form, but instead making a much weaker assumption that creating edges from sybils to most honest nodes is difficult, while allowing the remaining nodes to be freely connected to; (ii) relaxing the goal from classifying all nodes as honest or sybil to classifying the "core" nodes of the network as honest, while classifying no sybil nodes as honest; and (iii) exploiting a social network property that is new to sybil detection, namely, that nodes can be embedded in low-dimensional spaces.
Crowdsourcing is a unique and practical approach to obtaining personalized data and content. Its impact is especially significant in providing commentary, reviews, and metadata on a variety of location-based services. In this study, we examine the reliability of the Waze mapping service and its vulnerability to a variety of location-based attacks. Our goals are to understand the severity of the problem, shed light on the general problem of location and device authentication, and explore the efficacy of potential defenses. Our preliminary results already show that a single attacker with limited resources can cause havoc on Waze, producing "virtual" congestion and accidents, automatically re-routing user traffic, and compromising user privacy by tracking users' precise movements via software while staying undetected. To defend against these attacks, we propose a proximity-based Sybil detection method to filter out malicious devices.
The production and sale of counterfeit and substandard pharmaceutical products, such as essential medicines, is an important global public health problem. We describe a chemometric passport-based approach to improve the security of the pharmaceutical supply chain. Our method is based on applying nuclear quadrupole resonance (NQR) spectroscopy to authenticate the contents of medicine packets. NQR is a non-invasive, non-destructive, and quantitative radio frequency (RF) spectroscopic technique. It is sensitive to subtle features of the solid-state chemical environment and thus generates unique chemical fingerprints that are intrinsically difficult to replicate. We describe several advanced NQR techniques, including two-dimensional measurements, polarization enhancement, and spin density imaging, that further improve the security of our authentication approach. We also present experimental results that confirm the specificity and sensitivity of NQR and its ability to detect counterfeit medicines.
This year, software development teams around the world are consuming billions of open source and third-party components. The good news: they are accelerating time to market. The bad news: 1 in 17 components they are using contains known security vulnerabilities. In this talk, I will describe what Sonatype, the company behind The Central Repository that supports Apache Maven, has learned from analyzing how thousands of applications use open source components. I will also discuss how organizations like Mayo Clinic, Exxon, Capital One, the U.S. FDA and Intuit are utilizing the principles of software supply chain automation to improve application security, and how organizations can balance the need for speed with quality and security early in the development cycle.
We present a process for detecting IP theft in VLSI devices that exploits the internal test scan chains. The IP owner learns implementation details of the suspect device to find evidence of the theft, while only the top-level function is public. The scan chains supply direct access to the internal registers in the device, thus making it possible to learn the logic functions of the internal combinational logic chunks. Our work introduces an innovative way of applying Boolean function analysis techniques to learning digital circuits with the goal of IP theft detection. By using Boolean function learning methods, the learner creates a partial dependency graph of the internal flip-flops. The graph is further partitioned using the SNN graph clustering method, and individual blocks of combinational logic are isolated. These blocks can be matched with known building blocks that compose the original function. This enables reconstruction of the function implementation to the level of pipeline structure. The IP owner can compare the resulting structure with his own implementation to confirm or refute that an IP violation has occurred. We demonstrate the power of the presented approach with a test case of an open source Bitcoin SHA-256 accelerator containing more than 80,000 registers. With the presented method we discover the microarchitecture of the module, locate all the main components of the SHA-256 algorithm, and learn the module's flow control.
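The first step, learning which flip-flops feed which, can be illustrated as black-box probing: toggle one scanned-in bit at a time under random vectors and record which captured outputs change. The 3-bit "device" below is invented; real designs require the paper's Boolean-learning and SNN-clustering machinery.

```python
# Sketch of the dependency-learning step: the combinational logic between
# flip-flops is a black box queried through the scan chain.
import random

def next_state(s):                 # stands in for scan-in / capture / scan-out
    a, b, c = s
    return (b, a ^ b, c)           # flip-flop c belongs to an isolated block

def learn_dependencies(n_ffs, probe, trials=200):
    deps = {out: set() for out in range(n_ffs)}
    for _ in range(trials):
        base = tuple(random.randint(0, 1) for _ in range(n_ffs))
        y0 = probe(base)
        for i in range(n_ffs):
            flipped = tuple(v ^ (j == i) for j, v in enumerate(base))
            y1 = probe(flipped)
            for out in range(n_ffs):
                if y0[out] != y1[out]:
                    deps[out].add(i)    # output FF 'out' depends on input FF i
    return deps

print(learn_dependencies(3, next_state))
# {0: {1}, 1: {0, 1}, 2: {2}} -- the graph splits into two logic blocks
```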
Given the increasing complexity of modern electronics and the cost of fabrication, entities from around the globe have become more heavily involved in all phases of the electronics supply chain. In this environment, hardware Trojans (i.e., malicious modifications or inclusions made by untrusted third parties) pose major security concerns, especially for those integrated circuits (ICs) and systems used in critical applications and cyber infrastructure. While hardware Trojans have been explored significantly in academia over the last decade, there remains room for improvement. In this article, we examine the research on hardware Trojans from the last decade and attempt to capture the lessons learned. A comprehensive adversarial model taxonomy is introduced and used to examine the current state of the art. Then the past countermeasures and publication trends are categorized based on the adversarial model and topic. Through this analysis, we identify what has been covered and the important problems that are underinvestigated. We also identify the most critical lessons for those new to the field and suggest a roadmap for future hardware Trojan research.
With the advent of globalization in the semiconductor industry, it is necessary to prevent unauthorized usage of third-party IPs (3PIPs), cloning and unwanted modification of 3PIPs, and unauthorized production of ICs. Due to the increasing complexity of ICs, system-on-chip (SoC) designers use various 3PIPs in their design to reduce time-to-market and development costs, which creates a trust issue between the SoC designer and the IP owners. In addition, as ICs are fabricated around the globe, SoC designers give fabrication contracts to offshore foundries to manufacture ICs and have little control over the fabrication process, including the total number of chips fabricated. Similarly, the 3PIP owners lack control over the number of fabricated chips and/or the usage of their IPs in an SoC. Existing research only partially addresses the problems of IP piracy and IC overproduction, and to the best of our knowledge, there is no work that considers IP overuse. In this article, we present a comprehensive solution for preventing IP piracy and IC overproduction by assuring forward trust between all entities involved in the SoC design and fabrication process. We propose a novel design flow to prevent IC overproduction and IP overuse. We use an existing logic encryption technique to obfuscate the netlist of an SoC or a 3PIP and propose a modification to enable manufacturing tests before the activation of chips, which is absolutely necessary to prevent overproduction. We have used asymmetric and symmetric key encryption, in a fashion similar to Pretty Good Privacy (PGP), to transfer keys from the SoC designer or 3PIP owners to the chips. In addition, we propose to attach an IP digest (a cryptographic hash of the entire IP) to the header of an IP to prevent modification of the IP by the SoC designers. We have shown that our approach is resistant to various attacks, at the cost of minimal area overhead.
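The IP-digest component reduces to a familiar pattern, sketched below with an invented packaging format. Note that in the actual flow the digest must itself be protected from replacement, which the paper's PGP-style key infrastructure provides; a bare hash alone would not.

```python
# Minimal sketch of the IP-digest idea: a hash of the netlist travels in
# the IP header, so later modification is detectable. Toy format only;
# the paper's flow also encrypts and signs (omitted here).
import hashlib

def package_ip(netlist: bytes) -> bytes:
    digest = hashlib.sha256(netlist).hexdigest().encode()
    return b"IP-DIGEST:" + digest + b"\n" + netlist

def verify_ip(packaged: bytes) -> bool:
    header, _, netlist = packaged.partition(b"\n")
    return header == b"IP-DIGEST:" + hashlib.sha256(netlist).hexdigest().encode()

ip = package_ip(b"module adder(input a, b, output s); assign s = a ^ b; endmodule")
print(verify_ip(ip))                              # True
print(verify_ip(ip.replace(b"a ^ b", b"a | b")))  # False: tampering detected
```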
There is growing interest in modeling and predicting the behavior of financial systems and supply chains. In this paper, we focus on the analysis of the resMBS supply chain; it is associated with the US residential mortgage backed securities and subprime mortgages that were critical in the 2008 US financial crisis. We develop models based on financial institutions (FIs) and their participation, described by their roles (Role), in financial contracts (FCs). Our models are based on the intuitive assumption that FIs form communities within an FC, and FIs within a community are more likely to collaborate with other FIs in that community, and play the same role, in another FC. Inspired by Latent Dirichlet Allocation (LDA) and topic models, we develop two probabilistic financial community models. In FI-Comm, each FC (document) is a mix of topics, where a topic is a distribution over FIs (words). In Role-FI-Comm, each topic is a distribution over Role-FI pairs (words). Experimental results over 5000+ financial prospectuses demonstrate the effectiveness of our models.
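As a rough stand-in for the FI-Comm model, the sketch below runs scikit-learn's off-the-shelf LDA over invented "documents" whose words are FI names, one document per financial contract; the paper's models are custom variants of this idea, not plain LDA.

```python
# Each financial contract (FC) is a 'document'; its 'words' are the
# financial institutions (FIs) named in its prospectus. Data invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

contracts = [
    "BankA TrusteeX ServicerM BankA",
    "BankA TrusteeX ServicerM",
    "BankB TrusteeY ServicerN BankB",
    "BankB TrusteeY ServicerN TrusteeY",
]
counts = CountVectorizer(lowercase=False).fit(contracts)
X = counts.transform(contracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
fis = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [fis[i] for i in topic.argsort()[::-1][:3]]
    print(f"community {k}: {top}")     # two FI communities emerge
```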
Understanding the behavior of complex financial supply chains is usually difficult due to a lack of data capturing the interactions between financial institutions (FIs) and the roles that they play in financial contracts (FCs). resMBS is an example supply chain corresponding to the US residential mortgage backed securities that were critical in the 2008 US financial crisis. In this paper, we describe the process of creating the resMBS graph dataset from financial prospectuses. We use the SystemT rule-based text extraction platform to develop two tools, ORG NER and Dict NER, for named entity recognition of financial institution (FI) names. The resMBS graph comprises a set of FC nodes (one per prospectus) and the corresponding FI nodes that are extracted from the prospectus. A Role-FI extractor matches a role keyword, such as originator, sponsor, or servicer, with FI names. We study the performance of the Role-FI extractor, ORG NER, and Dict NER in constructing the resMBS dataset. We also present preliminary results of a clustering-based analysis to identify financial communities and their evolution in the resMBS financial supply chain.
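The Role-FI matching step can be approximated with a single regular expression over invented prospectus text, as below; the actual extractors are built on SystemT rules and the ORG NER / Dict NER tools, so this only illustrates the pattern being matched.

```python
# Toy Role-FI extractor: pair an organization name with the role keyword
# that follows it. Text and pattern are invented for illustration.
import re

ROLES = "originator|sponsor|servicer|trustee"
PATTERN = re.compile(
    r"(?P<fi>[A-Z][A-Za-z&' ]*\.?)\s*,?\s+as\s+(?P<role>%s)" % ROLES)

text = ("ABC Mortgage Corp., as originator, will transfer the loans to "
        "XYZ Trust. DEF Bank, as servicer, will collect payments.")
for m in PATTERN.finditer(text):
    print(m.group("role"), "->", m.group("fi"))
# originator -> ABC Mortgage Corp.
# servicer -> DEF Bank
```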
In this presentation, I describe how the SEI's Security Engineering Risk Analysis (SERA) method provides a structure that connects desired system functionality with the underlying software to evaluate the sufficiency of requirements for software security and the potential operational security risks based on mission impact.
A distributed detection method is proposed to detect single stage multi-point (SSMP) attacks on a Cyber Physical System (CPS). Such attacks aim at compromising two or more sensors or actuators at any one stage of a CPS and could totally compromise a controller, preventing it from detecting the attack. However, as demonstrated in this work, using the flow properties of water from one stage to the next, a neighboring controller can effectively detect such attacks. The method is based on physical invariants derived for each stage of the CPS from its design. The attack detection effectiveness of the method was evaluated experimentally against an operational water treatment testbed containing 42 sensors and actuators. Results from the experiments point to the high effectiveness of the method in detecting a variety of SSMP attacks, but also point to its limitations. Distributing the attack detection code among various controllers adds to the scalability of the proposed method.
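A minimal sketch of one such invariant, with invented tank geometry and tolerances: the level change of a tank must be consistent with the neighboring stage's inflow and the stage's own outflow, so an attacker who spoofs all the sensors of one stage can still trip the check in the neighbor.

```python
# Illustrative stage-to-stage physical invariant for a water tank.
# Numbers and tolerances are invented; the paper derives invariants
# from the plant design.
def invariant_violated(level_t0, level_t1, inflow, outflow,
                       area=1.5, dt=1.0, tol=0.02):
    expected = level_t0 + (inflow - outflow) * dt / area
    return abs(level_t1 - expected) > tol

# Honest readings satisfy the invariant:
print(invariant_violated(0.80, 0.84, 0.10, 0.04))   # False
# An SSMP attack freezes the level reading of one stage, but the
# neighboring controller's inflow measurement exposes the mismatch:
print(invariant_violated(0.80, 0.80, 0.10, 0.04))   # True
```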
Many modern cyber-physical systems (CPS), especially industrial automation systems, require the actions of multiple computational systems to be performed at much higher rates and more tightly synchronized than is possible with ad hoc designs. Time is the common entity that computing and physical systems in CPS share, and correctly interfacing with it is essential to the flawless functioning of a CPS. Fundamental research is needed on ways to synchronize clocks of computing systems to a high degree, and on design methods that enable building blocks of CPS to perform actions at specified times. To realize the potential of CPS in the coming decades, suitable ways to specify distributed CPS applications are needed, including their timing requirements; ways to specify the timing of CPS components (e.g., sensors, actuators, computing platforms); timing analysis to determine whether the application design is feasible using the components; confident top-down design methodologies that can ensure that the system meets its timing requirements; and ways and methodologies to test and verify that the system meets the timing requirements. Furthermore, strategies for securing timing need to be carefully considered at every CPS design stage and not simply added on. This paper exposes these challenges of CPS development, points out limitations of previous approaches, and provides some research directions towards solving these challenges.
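As a small concrete anchor for the clock-synchronization discussion, the sketch below computes the classic two-way time-transfer offset and delay estimates, the basis of NTP/PTP-style synchronization, from invented timestamps; high-precision CPS synchronization refines this basic exchange.

```python
# Classic two-way time transfer: estimate clock offset and round-trip
# delay from four timestamps. Timestamps are invented.
def offset_and_delay(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send,
    t4: client receive; assumes symmetric network paths."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Server clock 5 ms ahead; one-way latency 2 ms in each direction:
print(offset_and_delay(t1=100.0, t2=107.0, t3=107.5, t4=104.5))
# (5.0, 4.0) -> estimated offset 5 ms, round-trip delay 4 ms
```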
Embedded systems must address a multitude of potentially conflicting design constraints such as resiliency, energy, heat, cost, performance, security, etc., all in the face of highly dynamic operational behaviors and environmental conditions. By incorporating elements of intelligence, the hope is that the resulting “smart” embedded systems will function correctly and within desired constraints in spite of highly dynamic changes in the applications and the environment, as well as in the underlying software/hardware platforms. Since terms related to “smartness” (e.g., self-awareness, self-adaptivity, and autonomy) have been used loosely in many software and hardware computing contexts, we first present a taxonomy of “self-x” terms and use this taxonomy to relate major “smart” software and hardware computing efforts. A major attribute for smart embedded systems is the notion of self-awareness that enables an embedded system to monitor its own state and behavior, as well as the external environment, so as to adapt intelligently. Toward this end, we use a System-on-Chip perspective to show how the CyberPhysical System-on-Chip (CPSoC) exemplar platform achieves self-awareness through a combination of cross-layer sensing, actuation, self-aware adaptations, and online learning. We conclude with some thoughts on open challenges and research directions.