Biblio
Security practitioners use the attack surface of software systems to prioritize areas of systems to test and analyze. To date, approaches for predicting which code artifacts are vulnerable have utilized a binary classification of code as vulnerable or not vulnerable. To better understand the strengths and weaknesses of vulnerability prediction approaches, vulnerability datasets with classification and severity data are needed. The goal of this paper is to help researchers and practitioners make security effort prioritization decisions by evaluating which classifications and severities of vulnerabilities are on an attack surface approximated using crash dump stack traces. In this work, we use crash dump stack traces to approximate the attack surface of Mozilla Firefox. We then generate a dataset of 271 vulnerable files in Firefox, classified using the Common Weakness Enumeration (CWE) system. We use these files as an oracle for the evaluation of the attack surface generated using crash data. In the Firefox vulnerability dataset, 14 different classifications of vulnerabilities appeared at least once. In our study, 85.3%
of vulnerable files were on the attack surface generated using crash data. We found no difference between the severity of vulnerabilities found on the attack surface generated using crash data and vulnerabilities not occurring on the attack surface. Additionally, we discuss lessons learned during the development of this vulnerability dataset.
Scientific advancement is fueled by solid fundamental research, followed by replication, meta-analysis, and theory building. To support such advancement, researchers and government agencies have been working towards a "science of security". As in other sciences, security science requires high-quality fundamental research addressing important problems and reporting approaches that capture the information necessary for replication, meta-analysis, and theory building. The goal of this paper is to aid security researchers in establishing a baseline of the state of scientific reporting in security through an analysis of indicators of scientific research as reported in top security conferences, specifically the 2015 ACM CCS and 2016 IEEE S&P proceedings. To conduct this analysis, we employed a series of rubrics to analyze the completeness of information reported in papers relative to the type of evaluation used (e.g. empirical study, proof, discussion). Our findings indicated some important information is often missing from papers, including explicit documentation of research objectives and the threats to validity. Our findings show a relatively small number of replications reported in the literature. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
Proactive security reviews and test efforts are a necessary component of the software development lifecycle. Resource limitations often preclude reviewing the entire code
base. Making informed decisions on what code to review can improve a team’s ability to find and remove vulnerabilities. Risk-based attack surface approximation (RASA) is a technique that uses crash dump stack traces to predict what code may contain exploitable vulnerabilities. The goal of this research is to help software development teams prioritize security efforts by the efficient development of a risk-based attack surface approximation. We explore the use of RASA using Mozilla Firefox and Microsoft Windows stack traces from crash dumps. We create RASA at the file level for Firefox, in which the 15.8% of the files that were part of the approximation contained 73.6% of the vulnerabilities seen for the product. We also explore the effect of random sampling of crashes on the approximation, as it may be impractical for organizations to store and process every crash received. We find that 10-fold random sampling of crashes at a rate of 10% resulted in 3% less vulnerabilities identified than using the entire set of stack traces for Mozilla Firefox. Sampling crashes in Windows 8.1 at a rate of 40% resulted in insignificant differences in vulnerability and file coverage as compared to a rate of 100%.
Android applications pose security and privacy risks for end-users. These risks are often quantified by performing dynamic analysis and permission analysis of the Android applications after release. Prediction of security and privacy risks associated with Android applications at early stages of application development, e.g. when the developer (s) are
writing the code of the application, might help Android application developers in releasing applications to end-users that have less security and privacy risk. The goal of this paper
is to aid Android application developers in assessing the security and privacy risk associated with Android applications by using static code metrics as predictors. In our paper, we consider security and privacy risk of Android application as how susceptible the application is to leaking private information of end-users and to releasing vulnerabilities. We investigate how effectively static code metrics that are extracted from the source code of Android applications, can be used to predict security and privacy risk of Android applications. We collected 21 static code metrics of 1,407 Android applications, and use the collected static code metrics to predict security and privacy risk of the applications. As the oracle of security and privacy risk, we used Androrisk, a tool that quantifies the amount of security and privacy risk of an Android application using analysis of Android permissions and dynamic analysis. To accomplish our goal, we used statistical learners such as, radial-based support vector machine (r-SVM). For r-SVM, we observe a precision of 0.83. Findings from our paper suggest that with proper selection of static code metrics, r-SVM can be used effectively to predict security and privacy risk of Android applications
Social network services enable users to conveniently share personal information. Often, the information shared concerns other people, especially other members of the social network service. In such situations, two or more people can have conflicting privacy preferences; thus, an appropriate sharing policy may not be apparent. We identify such situations as multiuser privacy scenarios. Current approaches propose finding a sharing policy through preference aggregation. However, studies suggest that users feel more confident in their decisions regarding sharing when they know the reasons behind each other's preferences. The goals of this paper are (1) understanding how people decide the appropriate sharing policy in multiuser scenarios where arguments are employed, and (2) developing a computational model to predict an appropriate sharing policy for a given scenario. We report on a study that involved a survey of 988 Amazon MTurk users about a variety of multiuser scenarios and the optimal sharing policy for each scenario. Our evaluation of the participants' responses reveals that contextual factors, user preferences, and arguments influence the optimal sharing policy in a multiuser scenario. We develop and evaluate an inference model that predicts the optimal sharing policy given the three types of features. We analyze the predictions of our inference model to uncover potential scenario types that lead to incorrect predictions, and to enhance our understanding of when multiuser scenarios are more or less prone to dispute.
To appear
The ineffectiveness of phishing warnings has been attributed to users' poor comprehension of the warning. However, the effectiveness of a phishing warning is typically evaluated at the time when users interact with a suspected phishing webpage, which we call the effect with phishing warning. Nevertheless, users' improved phishing detection when the warning is absent—or the effect of the warning—is the ultimate goal to prevent users from falling for phishing scams. We conducted an online study to evaluate the effect with and of several phishing warning variations, varying the point at which the warning was presented and whether procedural knowledge instruction was included in the warning interface. The current Chrome phishing warning was also included as a control. 360 Amazon Mechanical-Turk workers made submission; 500¬ word maximum for symposia) decisions about 10 login webpages (8 authentic, 2 fraudulent) with the aid of warning (first phase). After a short distracting task, the workers made the same decisions about 10 different login webpages (8 authentic, 2 fraudulent) without warning. In phase one, the compliance rates with two proposed warning interfaces (98% and 94%) were similar to those of the Chrome warning (98%), regardless of when the warning was presented. In phase two (without warning), performance was better for the condition in which warning with procedural knowledge instruction was presented before the phishing webpage in phase one, suggesting a better of effect than for the other conditions. With the procedural knowledge of how to determine a webpage’s legitimacy, users identified phishing webpages more accurately even without the warning being presented.
Many previous studies have shown that trust in automation mediates the effectiveness of automation in maintaining performance, and one critical factor that affects trust is the reliability of the automated system. In the cyber domain, automated systems are pervasive, yet the involvement of human trust has not been studied extensively as in other domains such as transportation.
In the current study, we used a phishing email identification task (with a phishing detection automated assistant system) as a testbed to study human trust in automation in the cyber domain. More specifically, we systematically investigated the influence of “description” (i.e., whether the user was informed about the actual reliability of the automated system) and “experience” (i.e., whether the user was provided feedback on their choices), in addition to the reliability level of the automated phishing detection system. These factors were varied in different conditions of response bias (false alarm vs. misses) and task difficulty (easy vs. difficult), which were found may be critical in a pilot study. Measures of user performance and trust were compared across different conditions. The measures of interest were human trust in the warning (a subjective rating of how trustable the warning system is), human reliance on the automated system (an objective measure of whether the participants comply with the system’s warnings), and performance (the overall quality of the decisions made).
Objective: To evaluate the effectiveness of domain highlighting in helping users identify whether webpages are legitimate or spurious.
Background: As a component of the URL, a domain name can be overlooked. Consequently, browsers highlight the domain name to help users identify which website they are visiting. Nevertheless, few studies have assessed the effectiveness of domain highlighting, and the only formal study confounded highlighting with instructions to look at the address bar.
Method: We conducted two phishing detection experiments. Experiment 1 was run online: Participants judged the legitimacy of webpages in two phases. In phase one, participants were to judge the legitimacy based on any information on the webpage, whereas phase two they were to focus on the address bar. Whether the domain was highlighted was also varied. Experiment 2 was conducted similarly but with participants in a laboratory setting, which allowed tracking of fixations.
Results: Participants differentiated the legitimate and fraudulent webpages better than chance. There was some benefit of attending to the address bar, but domain highlighting did not provide effective protection against phishing attacks. Analysis of eye-gaze fixation measures was in agreement with the task performance, but heat-map results revealed that participants’ visual attention was attracted by the highlighted domains.
Conclusion: Failure to detect many fraudulent webpages even when the domain was highlighted implies that users lacked knowledge of webpage security cues or how to use those cues.
The ineffectiveness of phishing warnings has been attributed to users' poor comprehension of the warning. However, the effectiveness of a phishing warning is typically evaluated at the time when users interact with a suspected phishing webpage, which we call the effect with phishing warning. Nevertheless, users' improved phishing detection when the warning is absent—or the effect of the warning—is the ultimate goal to prevent users from falling for phishing scams. We conducted an online study to evaluate the effect with and of several phishing warning variations, varying the point at which the warning was presented and whether procedural knowledge instruction was included in the warning interface. The current Chrome phishing warning was also included as a control. 360 Amazon Mechanical-Turk workers made submission; 500¬ word maximum for symposia) decisions about 10 login webpages (8 authentic, 2 fraudulent) with the aid of warning (first phase). After a short distracting task, the workers made the same decisions about 10 different login webpages (8 authentic, 2 fraudulent) without warning. In phase one, the compliance rates with two proposed warning interfaces (98% and 94%) were similar to those of the Chrome warning (98%), regardless of when the warning was presented. In phase two (without warning), performance was better for the condition in which warning with procedural knowledge instruction was presented before the phishing webpage in phase one, suggesting a better of effect than for the other conditions. With the procedural knowledge of how to determine a webpage’s legitimacy, users identified phishing webpages more accurately even without the warning being presented.
Smartphone users often use private and enterprise data with untrusted third party applications. The fundamental lack of secrecy guarantees in smartphone OSes, such as Android, exposes this data to the risk of unauthorized exfiltration. A natural solution is the integration of secrecy guarantees into the OS. In this paper, we describe the challenges for decentralized information flow control (DIFC) enforcement on Android. We propose context-sensitive DIFC enforcement via lazy polyinstantiation and practical and secure network export through domain declassification. Our DIFC system, Weir, is backwards compatible by design, and incurs less than 4 ms overhead for component startup. With Weir, we demonstrate practical and secure DIFC enforcement on Android.
In a multiagent system, a (social) norm describes what the agents may expect from each other. Norms promote autonomy (an agent need not comply with a norm) and heterogeneity (a norm describes interactions at a high level independent of implementation details). Researchers have studied norm emergence through social learning where the agents interact repeatedly in a graph structure.
In contrast, we consider norm emergence in an open system, where membership can change, and where no predetermined graph structure exists. We propose Silk, a mechanism wherein a generator monitors interactions among member agents and recommends norms to help resolve conflicts. Each member decides on whether to accept or reject a recommended norm. Upon exiting the system, a member passes its experience along to incoming members of the same type. Thus, members develop norms in a hybrid manner to resolve conflicts.
We evaluate Silk via simulation in the traffic domain. Our results show that social norms promoting conflict resolution emerge in both moderate and selfish societies via our hybrid mechanism.
To interact effectively, agents must enter into commitments. What should an agent do when these commitments conflict? We describe Coco, an approach for reasoning about which specific commitments apply to specific parties in light of general types of commitments, specific circumstances, and dominance relations among specific commitments. Coco adapts answer-set programming to identify a maximalsetofnondominatedcommitments. It provides a modeling language and tool geared to support practical applications.
Firewall policies are notorious for having misconfiguration errors which can defeat its intended purpose of protecting hosts in the network from malicious users. We believe this is because today's firewall policies are mostly monolithic. Inspired by ideas from modular programming and code refactoring, in this work we introduce three kinds of modules: primary, auxiliary, and template, which facilitate the refactoring of a firewall policy into smaller, reusable, comprehensible, and more manageable components. We present algorithms for generating each of the three modules for a given legacy firewall policy. We also develop ModFP, an automated tool for converting legacy firewall policies represented in access control list to their modularized format. With the help of ModFP, when examining several real-world policies with sizes ranging from dozens to hundreds of rules, we were able to identify subtle errors.
To help establish a more scientific basis for security science, which will enable the development of fundamental theories and move the field from being primarily reactive to primarily proactive, it is important for research results to be reported in a scientifically rigorous manner. Such reporting will allow for the standard pillars of science, namely replication, meta-analysis, and theory building. In this paper we aim to establish a baseline of the state of scientific work in security through the analysis of indicators of scientific research as reported in the papers from the 2015 IEEE Symposium on Security and Privacy. To conduct this analysis, we developed a series of rubrics to determine the completeness of the papers relative to the type of evaluation used (e.g. case study, experiment, proof). Our findings showed that while papers are generally easy to read, they often do not explicitly document some key information like the research objectives, the process for choosing the cases to include in the studies, and the threats to validity. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
Firewall policies are notorious for having misconfiguration errors which can defeat its intended purpose of protecting hosts in the network from malicious users. We believe this is because today's firewall policies are mostly monolithic. Inspired by ideas from modular programming and code refactoring, in this work we introduce three kinds of modules: primary, auxiliary, and template, which facilitate the refactoring of a firewall policy into smaller, reusable, comprehensible, and more manageable components. We present algorithms for generating each of the three modules for a given legacy firewall policy. We also develop ModFP, an automated tool for converting legacy firewall policies represented in access control list to their modularized format. With the help of ModFP, when examining several real-world policies with sizes ranging from dozens to hundreds of rules, we were able to identify subtle errors.
Eight hundred eighty-seven phishing emails from Arizona State University, Brown University, and Cornell University were assessed by two reviewers for Cialdini’s six principles of persuasion: authority, social proof, liking/similarity, commitment/consistency, scarcity, and reciprocation. A correlational analysis of email characteristics by year revealed that the persuasion principles of commitment/consistency and scarcity have increased over time, while the principles of reciprocation and social proof have decreased over time. Authority and liking/similarity revealed mixed results with certain characteristics increasing and others decreasing. Results from this study can inform user training of phishing emails and help cybersecurity software to become more effective.
Platform as a Service (PaaS) provides middleware resources to cloud customers. As demand for PaaS services increases, so do concerns about the security of PaaS. This paper discusses principal PaaS security and integrity requirements, and vulnerabilities and the corresponding countermeasures. We consider three core cloud elements: multi-tenancy, isolation, and virtualization and how they relate to PaaS services and security trends and concerns such as user and resource isolation, side-channel vulnerabilities in multi-tenant environments, and protection of sensitive data
Signals intelligence analysts play a critical role in the United States government by providing information regarding potential national security threats to government leaders. Analysts perform complex decision-making tasks that involve gathering, sorting, and analyzing information. The current study evaluated how individual differences and training influence performance on an Internet search-based medical diagnosis task designed to simulate a signals analyst task. The implemented training emphasized the extraction and organization of relevant information and deductive reasoning. The individual differences of interest included working memory capacity and previous experience with elements of the task, specifically health literacy, prior experience using the Internet, and prior experience conducting Internet searches. Preliminary results indicated that the implemented training did not significantly affect performance, however, working memory significantly predicted performance on the implemented task. These results support previous research and provide additional evidence that working memory capacity influences performance on cognitively complex decision-making tasks, whereas experience with elements of the task may not. These findings suggest that working memory capacity should be considered when screening individuals for signals intelligence positions. Future research should aim to generalize these findings within a broader sample, and ideally utilize a task that directly replicates those performed by signals analysts.
Growing traffic volumes and the increasing complexity of attacks pose a constant scaling challenge for network intrusion prevention systems (NIPS). In this respect, offloading NIPS processing to compute clusters offers an immediately deployable alternative to expensive hardware upgrades. In practice, however, NIPS offloading is challenging on three fronts in contrast to passive network security functions: (1) NIPS offloading can impact other traffic engineering objectives; (2) NIPS offloading impacts user perceived latency; and (3) NIPS actively change traffic volumes by dropping unwanted traffic. To address these challenges, we present the SNIPS system. We design a formal optimization framework that captures tradeoffs across scalability, network load, and latency. We provide a practical implementation using recent advances in software-defined networking without requiring modifications to NIPS hardware. Our evaluations on realistic topologies show that SNIPS can reduce the maximum load by up to 10× while only increasing the latency by 2%.