Visible to the public Biblio

Filters: Keyword is Apr'15  [Clear All Filters]
2016-12-06
Limin Jia, Shayak Sen, Deepak Garg, Anupam Datta.  2015.  System M: A Program Logic for Code Sandboxing and Identification.

Security-sensitive applications that execute untrusted code often check the code’s integrity by comparing its syntax to a known good value or sandbox the code to contain its effects. System M is a new program logic for reasoning about such security-sensitive applications. System M extends Hoare Type Theory (HTT) to trace safety properties and, additionally, contains two new reasoning principles. First, its type system internalizes logical equality, facilitating reasoning about applications that check code integrity. Second, a con- finement rule assigns an effect type to a computation based solely on knowledge of the computation’s sandbox. We prove the soundness of System M relative to a step-indexed trace-based semantic model. We illustrate both new reasoning principles of System M by verifying the main integrity property of the design of Memoir, a previously proposed trusted computing system for ensuring state continuity of isolated security-sensitive applications. 

Ju-Sung Lee, Jurgen Pfeffer.  2015.  Robustness of Network Metrics in the Context of Digital Communication Data. HICSS '15 Proceedings of the 2015 48th Hawaii International Conference on System Sciences.

Social media data and other web-based network data are large and dynamic rendering the identification of structural changes in such systems a hard problem. Typically, online data is constantly streaming and results in data that is incomplete thus necessitating the need to understand the robustness of network metrics on partial or sampled network data. In this paper, we examine the effects of sampling on key network centrality metrics using two empirical communication datasets. Correlations between network metrics of original and sampled nodes offer a measure of sampling accuracy. The relationship between sampling and accuracy is convergent and amenable to nonlinear analysis. Naturally, larger edge samples induce sampled graphs that are more representative of the original graph. However, this effect is attenuated when larger sets of nodes are recovered in the samples. Also, we find that the graph structure plays a prominent role in sampling accuracy. Centralized graphs, in which fewer nodes enjoy higher centrality scores, offer more representative samples.

Ju-Sung Lee, Jurgen Pfeffer.  2015.  Estimating Centrality Statistics for Large Scale and Sampled Networks: Some Approaches and Complications. 2015 48th Hawaii International Conference on System Sciences.

The study of large, “big data” networks is becoming increasingly common and relevant to our understanding of human systems. Many of the studied networks are drawn from social media and other web-based sources. As such, in-depth analysis of these dynamic structures e.g. in the context of cybersecurity, remains especially challenging. Due to the time and resources incurred in computing network measures for large networks, it is practical to approximate these whenever possible. We present some approximation techniques exploiting any tractable relationship between the measures and network characteristics such as size and density. We find there exist distinct functional relationships between network statistics of complex “slow” measures and “fast” measures, such as the linkage between betweenness centrality and network density. We also track how these relationships scale with network size. Specifically, we explore the effi- cacy of both linear modeling (i.e., correlations and least squares regression) and non-linear modeling in estimating the network measures of interest. We find that sparse, but not severely sparse, networks which admit sufficient entropy incur the most variance in the network statistics and, hence, more error in the estimation. We review our approaches with three prominent network topologies: random (aka Erdos-R ˝ enyi), Watts- ´ Strogatz small-world, and scale-free networks. Finally, we assess how well the estimation approaches perform for sub-sampled networks.

2016-12-05
Marwan Abi-Antoun, Yibin Wang, Ebrahim Khalaj, Andrew Giang, Vaclav Rajlich.  2015.  Impact Analysis based on a Global Hierarchical Object Graph. 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

During impact analysis on object-oriented code, statically extracting dependencies is often complicated by subclassing, programming to interfaces, aliasing, and collections, among others. When a tool recommends a large number of types or does not rank its recommendations, it may lead developers to explore more irrelevant code. We propose to mine and rank dependencies based on a global, hierarchical points-to graph that is extracted using abstract interpretation. A previous whole-program static analysis interprets a program enriched with annotations that express hierarchy, and over-approximates all the objects that may be created at runtime and how they may communicate. In this paper, an analysis mines the hierarchy and the edges in the graph to extract and rank dependencies such as the most important classes related to a class, or the most important classes behind an interface. An evaluation using two case studies on two systems totaling 10,000 lines of code and five completed code modification tasks shows that following dependencies based on abstract interpretation achieves higher effectiveness compared to following dependencies extracted from the abstract syntax tree. As a result, developers explore less irrelevant code.

Hanan Hibshi, Travis Breaux, Maria Riaz, Laurie Williams.  2015.  Discovering Decision-Making Patterns for Security Novices and Experts.

Security analysis requires some degree of knowledge to align threats to vulnerabilities in information technology. Despite the abundance of security requirements, the evidence suggests that security experts are not applying these checklists. Instead, they default to their background knowledge to identify security vulnerabilities. To better understand the different effects of security checklists, analysis and expertise, we conducted a series of interviews to capture and encode the decisionmaking process of security experts and novices during three security requirements analysis exercises. Participants were asked to analyze three kinds of artifacts: source code, data flow diagrams, and network diagrams, for vulnerabilities, and then to apply a requirements checklist to demonstrate their ability to mitigate vulnerabilities. We framed our study using Situation Awareness theory to elicit responses that were analyzed using coding theory and grounded analysis. Our results include decision-making patterns that characterize how analysts perceive, comprehend and project future threats, and how these patterns relate to selecting security mitigations. Based on this analysis, we discovered new theory to measure how security experts and novices apply attack models and how structured and unstructured analysis enables increasing security requirements coverage. We discuss suggestions of how our method could be adapted and applied to improve training and education instruments of security analysts.

Claus Hunsen, Bo Zhang, Janet Siegmund, Christian Kästner, Olaf Lebenich, Martin Becker, Sven Apel.  2015.  Preprocessor-based variability in open-source and industrial software systems: An empirical study. Empirical Software Engineering. 20:1-34.

Almost every sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. Especially, the C preprocessor (CPP) is very popular in practice, but it is also gaining (again) interest in academia. Although there have been several attempts to understand and improve CPP, there is a lack of understanding of how it is used in open-source and industrial systems and whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim at lowering this gap by comparing the use of CPP in open-source projects and industry—especially from the embedded-systems domain—based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that, regarding almost all aspects we studied, the analyzed open-source systems and the considered embedded systems from industry are similar regarding most metrics, including systems that have been developed in industry and made open source at some point. So, our study indicates that, regarding CPP as variability-implementation mechanism, insights, methods, and tools developed based on studies of open-source systems are transferable to industrial systems—at least, with respect to the metrics we considered.

2016-02-15
Sarah Nadi, Thorsten Berger, Christian Kästner, Krzysztof Czarnecki.  2015.  Where Do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study IEEE Transactions on Software Engineering. 41(8)

Highly configurable systems allow users to tailor software to specific needs. Valid combinations of configuration options are often restricted by intricate constraints. Describing options and constraints in a variability model allows reasoning about the supported configurations. To automate creating and verifying such models, we need to identify the origin of such constraints. We propose a static analysis approach, based on two rules, to extract configuration constraints from code. We apply it on four highly configurable systems to evaluate the accuracy of our approach and to determine which constraints are recoverable from the code. We find that our approach is highly accurate (93% and 77% respectively) and that we can recover 28% of existing constraints. We complement our approach with a qualitative study to identify constraint sources, triangulating results from our automatic extraction, manual inspections, and interviews with 27 developers. We find that, apart from low-level implementation dependencies, configuration constraints enforce correct runtime behavior, improve users' configuration experience, and prevent corner cases. While the majority of constraints is extractable from code, our results indicate that creating a complete model requires further substantial domain knowledge and testing. Our results aim at supporting researchers and practitioners working on variability model engineering, evolution, and verification techniques.

Ghita Mezzour, Kathleen Carley, L. Richard Carley.  2015.  An empirical study of global malware encounters. HotSoS '15 Proceedings of the 2015 Symposium and Bootcamp on the Science of Security.

The number of trojans, worms, and viruses that computers encounter varies greatly across countries. Empirically identifying factors behind such variation can provide a scientific empirical basis to policy actions to reduce malware encounters in the most affected countries. However, our understanding of these factors is currently mainly based on expert opinions, not empirical evidence.

In this paper, we empirically test alternative hypotheses about factors behind international variation in the number of trojan, worm, and virus encounters. We use the Symantec Anti-Virus (AV) telemetry data collected from more than 10 million Symantec customer computers worldwide that we accessed through the Symantec Worldwide Intelligence Environment (WINE) platform. We use regression analysis to test for the effect of computing and monetary resources, web browsing behavior, computer piracy, cyber security expertise, and international relations on international variation in malware encounters.

We find that trojans, worms, and viruses are most prevalent in Sub-Saharan African countries. Many Asian countries also encounter substantial quantities of malware. Our regression analysis reveals that the main factor that explains high malware exposure of these countries is a widespread computer piracy especially when combined with poverty. Our regression analysis also reveals that, surprisingly, web browsing behavior, cyber security expertise, and international relations have no significant effect.

Ghita Mezzour.  2015.  Assessing the Global Cyber and Biological Threat. Electrical and Computer Engineering Department and Institute for Software Research. Doctor of Philosophy

In today’s inter-connected world, threats from anywhere in the world can have serious global repercussions. In particular, two types of threats have a global impact: 1) cyber crime and 2) cyber and biological weapons. If a country’s environment is conducive to cyber criminal activities, cyber criminals will use that country as a basis to attack end-users around the world. Cyber weapons and biological weapons can now allow a small actor to inflict major damage on a major military power. If cyber and biological weapons are used in combination, the damage can be amplified significantly. Given that the cyber and biological threat is global, it is important to identify countries that pose the greatest threat and design action plans to reduce the threat from these countries. However, prior work on cyber crime lacks empirical substantiation for reasons why some countries’ environments are conducive to cyber crime. Prior work on cyber and biological weapon capabilities mainly consists of case studies which only focus on select countries and thus are not generalizeable. To sum up, assessing the global cyber and biological threat currently lacks a systematic empirical approach. In this thesis, I take an empirical and systematic approach towards assessing the global cyber and biological threat. The first part of the thesis focuses on cyber crime. I examine international variation in cyber crime infrastructure hosting and cyber crime exposure. I also empirically test hypotheses about factors behind such variation. In that work, I use Symantec’s telemetry data, collected from 10 million Symantec customer computers worldwide and accessed through the Symantec’s Worldwide Intelligence Network Environment (WINE). I find that addressing corruption in Eastern Europe or computer piracy in Sub-Saharan Africa has the potential to reduce the global cyber crime. The second part of the thesis focuses on cyber and biological weapon capabilities. I develop two computational methodologies: one to assess countries’ biological capabilities and one to assess countries’ cyber capabilities. The methodologies examine all countries in the world and can be used by non-experts that only have access to publicly available data. I validate the biological weapon assessment methodology by comparing the methodology’s assessment to historical data. This work has the potential to proactively reduce the global cyber and biological weapon threat.

Bradley Schmerl, Jeff Gennari, David Garlan.  2015.  An Architecture Style for Android Security Analysis. HotSoS '15 Proceedings of the 2015 Symposium and Bootcamp on the Science of Security.

Modern frameworks are required to be extendable as well as secure. However, these two qualities are often at odds. In this poster we describe an approach that uses a combination of static analysis and run-time management, based on software architecture models, that can improve security while maintaining framework extendability.

Alireza Sadeghi, Hamid Bagheri, Sam Malek.  2015.  Analysis of Android Inter-App Security Vulnerabilities Using COVERT. ICSE '15 Proceedings of the 37th International Conference on Software Engineering. 2

The state-of-the-art in securing mobile software systems are substantially intended to detect and mitigate vulnerabilities in a single app, but fail to identify vulnerabilities that arise due to the interaction of multiple apps, such as collusion attacks and privilege escalation chaining, shown to be quite common in the apps on the market. This paper demonstrates COVERT, a novel approach and accompanying tool-suite that relies on a hybrid static analysis and lightweight formal analysis technique to enable compositional security assessment of complex software. Through static analysis of Android application packages, it extracts relevant security specifications in an analyzable formal specification language, and checks them as a whole for inter-app vulnerabilities. To our knowledge, COVERT is the first formally-precise analysis tool for automated compositional analysis of Android apps. Our study of hundreds of Android apps revealed dozens of inter-app vulnerabilities, many of which were previously unknown. A video highlighting the main features of the tool can be found at: http://youtu.be/bMKk7OW7dGg.

2016-02-10
Cyrus Omar, Chenglong Wang, Jonathan Aldrich.  2015.  Composable and Hygienic Typed Syntax Macros. Symposium on Applied Computing (SAC).

Syntax extension mechanisms are powerful, but reasoning about syntax extensions can be difficult. Recent work on type-specific languages (TSLs) addressed reasoning about composition, hygiene and typing for extensions introducing new literal forms. We supplement TSLs with typed syntax macros (TSMs), which, unlike TSLs, are explicitly invoked to give meaning to delimited segments of arbitrary syntax. To maintain a typing discipline, we describe two avors of term-level TSMs: synthetic TSMs specify the type of term that they generate, while analytic TSMs can generate terms of arbitrary type, but can only be used in positions where the type is otherwise known. At the level of types, we describe a third avor of TSM that generates a type of a specified kind along with its TSL and show interesting use cases where the two mechanisms operate in concert.