Biblio
Requirements analysts can model regulated data practices to identify and reason about risks of noncompliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical expressions of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 30 privacy policies from six domains (shopping, telecommunication, social networks, employment, health, and news.) From this dataset, three semantic and four lexical categories of hyponymy emerged based on category completeness and wordorder. Among these, we identified and empirically evaluated 72 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns match information type hyponyms with an average precision of 0.72 and recall of 0.74.
Information security can benefit from real-time cyber threat indicator sharing, in which companies and government agencies share their knowledge of emerging cyberattacks to benefit their sector and society at large. As attacks become increasingly sophisticated by exploiting behavioral dimensions of human computer operators, there is an increased risk to systems that store personal information. In addition, risk increases as individuals blur the boundaries between workplace and home computing (e.g., using workplace computers for personal reasons). This paper describes an architecture to leverage individual perceptions of privacy risk to compute privacy risk scores over cyber threat indicator data. Unlike security risk, which is a risk to a particular system, privacy risk concerns an individual's personal information being accessed and exploited. The architecture integrates tools to extract information entities from textual threat reports expressed in the STIX format and privacy risk estimates computed using factorial vignettes to survey individual risk perceptions. The architecture aims to optimize for scalability and adaptability to achieve real-time risk scoring.
Choosing how to write natural language scenarios is challenging, because stakeholders may over-generalize their descriptions or overlook or be unaware of alternate scenarios. In security, for example, this can result in weak security constraints that are too general, or missing constraints. Another challenge is that analysts are unclear on where to stop generating new scenarios. In this paper, we introduce the Multifactor Quality Method (MQM) to help requirements analysts to empirically collect system constraints in scenarios based on elicited expert preferences. The method combines quantitative statistical analysis to measure system quality with qualitative coding to extract new requirements. The method is bootstrapped with minimal analyst expertise in the domain affected by the quality area, and then guides an analyst toward selecting expert-recommended requirements to monotonically increase system quality. We report the results of applying the method to security. This include 550 requirements elicited from 69 security experts during a bootstrapping stage, and subsequent evaluation of these results in a verification stage with 45 security experts to measure the overall improvement of the new requirements. Security experts in our studies have an average of 10 years of experience. Our results show that using our method, we detect an increase in the security quality ratings collected in the verification stage. Finally, we discuss how our proposed method helps to improve security requirements elicitation, analysis, and measurement.
Mobile applications frequently access sensitive personal information to meet user or business requirements. Because such information is sensitive in general, regulators increasingly require mobile-app developers to publish privacy policies that describe what information is collected. Furthermore, regulators have fined companies when these policies are inconsistent with the actual data practices of mobile apps. To help mobile-app developers check their privacy policies against their apps' code for consistency, we propose a semi-automated framework that consists of a policy terminology-API method map that links policy phrases to API methods that produce sensitive information, and information flow analysis to detect misalignments. We present an implementation of our framework based on a privacy-policy-phrase ontology and a collection of mappings from API methods to policy phrases. Our empirical evaluation on 477 top Android apps discovered 341 potential privacy policy violations.
Requirements analysts can model regulated data practices to identify and reason about risks of noncompliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical expressions of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 15 privacy policies from three domains (shopping, telecommunication and social networks) to identify all instances of information type hyponymy. From this dataset, three semantic and four syntactic categories of hyponymy emerged based on category completeness and word-order. Among these, we identified and empirically evaluated 26 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns identify information type hypernym-hyponym pairs with an average precision of 0.83 and recall of 0.52 across our dataset of 15 policies.
Ambiguity arises in requirements when astatement is unintentionally or otherwise incomplete, missing information, or when a word or phrase has morethan one possible meaning. For web-based and mobileinformation systems, ambiguity, and vagueness inparticular, undermines the ability of organizations to aligntheir privacy policies with their data practices, which canconfuse or mislead users thus leading to an increase inprivacy risk. In this paper, we introduce a theory ofvagueness for privacy policy statements based on ataxonomy of vague terms derived from an empiricalcontent analysis of 15 privacy policies. The taxonomy wasevaluated in a paired comparison experiment and resultswere analyzed using the Bradley-Terry model to yield arank order of vague terms in both isolation andcomposition. The theory predicts how vague modifiers toinformation actions and information types can becomposed to increase or decrease overall vagueness. Wefurther provide empirical evidence based on factorialvignette surveys to show how increases in vagueness willdecrease users' acceptance of privacy risk and thusdecrease users' willingness to share personal information.
Security analysis requires specialized knowledge to align threats and vulnerabilities in information technology. To identify mitigations, analysts need to understand how threats, vulnerabilities, and mitigations are composed together to yield security requirements. Despite abundant guidance in the form of checklists and controls about how to secure systems, evidence suggests that security experts do not apply these checklists. Instead, they rely on their prior knowledge and experience to identify security vulnerabilities. To better understand the different effects of checklists, design analysis, and expertise, we conducted a series of interviews to capture and encode the decisionmaking process of security experts and novices during three security analysis exercises. Participants were asked to analyze three kinds of artifacts: source code, data flow diagrams, and network diagrams, for vulnerabilities, and then to apply a requirements checklist to demonstrate their ability to mitigate vulnerabilities. We framed our study using Situation Awareness, which is a theory about human perception that was used to elicit interviewee responses. The responses were then analyzed using coding theory and grounded analysis. Our results include decision-making patterns that characterize how analysts perceive, comprehend, and project future threats against a system, and how these patterns relate to selecting security mitigations. Based on this analysis, we discovered new theory to measure how security experts and novices apply attack models and how structured and unstructured analysis enables increasing security requirements coverage. We highlight the role of expertise level and requirements composition in affecting security decision-making and we discuss how our method produced new hypotheses about security analysis and decisionmaking.
As information systems become increasingly interdependent, there is an increased need to share cybersecurity data across government agencies and companies, and within and across industrial sectors. This sharing includes threat, vulnerability and incident reporting data, among other data. For cyberattacks that include sociotechnical vectors, such as phishing or watering hole attacks, this increased sharing could expose customer and employee personal data to increased privacy risk. In the US, privacy risk arises when the government voluntarily receives data from companies without meaningful consent from individuals, or without a lawful procedure that protects an individual's right to due process. In this paper, we describe a study to examine the trade-off between the need for potentially sensitive data, which we call incident data usage, and the perceived privacy risk of sharing that data with the government. The study is comprised of two parts: a data usage estimate built from a survey of 76 security professionals with mean eight years' experience; and a privacy risk estimate that measures privacy risk using an ordinal likelihood scale and nominal data types in factorial vignettes. The privacy risk estimate also factors in data purposes with different levels of societal benefit, including terrorism, imminent threat of death, economic harm, and loss of intellectual property. The results show which data types are high-usage, low-risk versus those that are low-usage, high-risk. We discuss the implications of these results and recommend future work to improve privacy when data must be shared despite the increased risk to privacy.
Privacy policies are used to communicate company data practices to consumers and must be accurate and comprehensive. Each policy author is free to use their own nomenclature when describing data practices, which leads to different ways in which similar information types are described across policies. A formal ontology can help policy authors, users and regulators consistently check how data practice descriptions relate to other interpretations of information types. In this paper, we describe an empirical method for manually constructing an information type ontology from privacy policies. The method consists of seven heuristics that explain how to infer hypernym, meronym and synonym relationships from information type phrases, which we discovered using grounded analysis of five privacy policies. The method was evaluated on 50 mobile privacy policies which produced an ontology consisting of 355 unique information type names. Based on the manual results, we describe an automated technique consisting of 14 reusable semantic rules to extract hypernymy, meronymy, and synonymy relations from information type phrases. The technique was evaluated on the manually constructed ontology to yield .95 precision and .51 recall.
Organizations rely on security experts to improve the security of their systems. These professionals use background knowledge and experience to align known threats and vulnerabilities before selecting mitigation options. The substantial depth of expertise in any one area (e.g., databases, networks, operating systems) precludes the possibility that an expert would have complete knowledge about all threats and vulnerabilities. To begin addressing this problem of distributed knowledge, we investigate the challenge of developing a security requirements rule base that mimics human expert reasoning to enable new decision-support systems. In this paper, we show how to collect relevant information from cyber security experts to enable the generation of: (1) interval type-2 fuzzy sets that capture intra- and inter-expert uncertainty around vulnerability levels; and (2) fuzzy logic rules underpinning the decision-making process within the requirements analysis. The proposed method relies on comparative ratings of security requirements in the context of concrete vignettes, providing a novel, interdisciplinary approach to knowledge generation for fuzzy logic systems. The proposed approach is tested by evaluating 52 scenarios with 13 experts to compare their assessments to those of the fuzzy logic decision support system. The initial results show that the system provides reliable assessments to the security analysts, in particular, generating more conservative assessments in 19% of the test scenarios compared to the experts’ ratings.
Security analysis requires some degree of knowledge to align threats to vulnerabilities in information technology. Despite the abundance of security requirements, the evidence suggests that security experts are not applying these checklists. Instead, they default to their background knowledge to identify security vulnerabilities. To better understand the different effects of security checklists, analysis and expertise, we conducted a series of interviews to capture and encode the decisionmaking process of security experts and novices during three security requirements analysis exercises. Participants were asked to analyze three kinds of artifacts: source code, data flow diagrams, and network diagrams, for vulnerabilities, and then to apply a requirements checklist to demonstrate their ability to mitigate vulnerabilities. We framed our study using Situation Awareness theory to elicit responses that were analyzed using coding theory and grounded analysis. Our results include decision-making patterns that characterize how analysts perceive, comprehend and project future threats, and how these patterns relate to selecting security mitigations. Based on this analysis, we discovered new theory to measure how security experts and novices apply attack models and how structured and unstructured analysis enables increasing security requirements coverage. We discuss suggestions of how our method could be adapted and applied to improve training and education instruments of security analysts.
Security requirements analysis depends on how well-trained analysts perceive security risk, understand the impact of various vulnerabilities, and mitigate threats. When systems are composed of multiple machines, configurations, and software components that interact with each other, risk perception must account for the composition of security requirements. In this paper, we report on how changes to security requirements affect analysts risk perceptions and their decisions about how to modify the requirements to reach adequate security levels. We conducted two user surveys of 174 participants wherein participants assess security levels across 64 factorial vignettes. We analyzed the survey results using multi-level modeling to test for the effect of security requirements composition on participants’ overall security adequacy ratings and on their ratings of individual requirements. We accompanied this analysis with grounded analysis of elicited requirements aimed at lowering the security risk. Our results suggest that requirements composition affects experts’ adequacy ratings on security requirements. In addition, we identified three categories of requirements modifications, called refinements, replacements and reinforcements, and we measured how these categories compare with overall perceived security risk. Finally, we discuss the future impact of our work in security requirements assessment practice.
Mobile and web applications increasingly leverage service-oriented architectures in which developers integrate third-party services into end user applications. This includes identity management, mapping and navigation, cloud storage, and advertising services, among others. While service reuse reduces development time, it introduces new privacy and security risks due to data repurposing and over-collection as data is shared among multiple parties who lack transparency into third-party data practices. To address this challenge, we propose new techniques based on Description Logic (DL) for modeling multi-party data flow requirements and verifying the purpose specification and collection and use limitation principles, which are prominent privacy properties found in international standards and guidelines. We evaluate our techniques in an empirical case study that examines the data practices of the Waze mobile application and three of their service providers: Facebook Login, Amazon Web Services (a cloud storage provider), and Flurry.com (a popular mobile analytics and advertising platform). The study results include detected conflicts and violations of the principles as well as two patterns for balancing privacy and data use flexibility in requirements specifications. Analysis of automation reasoning over the DL models show that reasoning over complex compositions of multi-party systems is feasible within exponential asymptotic timeframes proportional to the policy size, the number of expressed data, and orthogonal to the number of conflicts found.
Mobile and web applications increasingly leverage service-oriented architectures in which developers integrate third-party services into end user applications. This includes identity management, mapping and navigation, cloud storage, and advertising services, among others. While service reuse reduces development time, it introduces new privacy and security risks due to data repurposing and over-collection as data is shared among multiple parties who lack transparency into thirdparty data practices. To address this challenge, we propose new techniques based on Description Logic (DL) for modeling multiparty data flow requirements and verifying the purpose specification and collection and use limitation principles, which are prominent privacy properties found in international standards and guidelines. We evaluate our techniques in an empirical case study that examines the data practices of the Waze mobile application and three of their service providers: Facebook Login, Amazon Web Services (a cloud storage provider), and Flurry.com (a popular mobile analytics and advertising platform). The study results include detected conflicts and violations of the principles as well as two patterns for balancing privacy and data use flexibility in requirements specifications. Analysis of automation reasoning over the DL models show that reasoning over complex compositions of multi-party systems is feasible within exponential asymptotic timeframes proportional to the policy size, the number of expressed data, and orthogonal to the number of conflicts found.
ContextSoftware patterns encapsulate expert knowledge for constructing successful solutions to recurring problems. Although a large collection of software patterns is available in literature, empirical evidence on how well various patterns help in problem solving is limited and inconclusive. The context of these empirical findings is also not well understood, limiting applicability and generalizability of the findings. ObjectiveTo characterize the research design of empirical studies exploring software pattern application involving human participants. MethodWe conducted a systematic mapping study to identify and analyze 30 primary empirical studies on software pattern application, including 24 original studies and 6 replications. We characterize the research design in terms of the questions researchers have explored and the context of empirical research efforts. We also classify the studies in terms of measures used for evaluation, and threats to validity considered during study design and execution. ResultsUse of software patterns in maintenance is the most commonly investigated theme, explored in 16 studies. Object-oriented design patterns are evaluated in 14 studies while 4 studies evaluate architectural patterns. We identified 10 different constructs with 31 associated measures used to evaluate software patterns. Measures for 'efficiency' and 'usability' are commonly used to evaluate the problem solving process. While measures for 'completeness', 'correctness' and 'quality' are commonly used to evaluate the final artifact. Overall, 'time to complete a task' is the most frequently used measure, employed in 15 studies to measure 'efficiency'. For qualitative measures, studies do not report approaches for minimizing biases 27% of the time. Nine studies do not discuss any threats to validity. ConclusionSubtle differences in study design and execution can limit comparison of findings. Establishing baselines for participants' experience level, providing appropriate training, standardizing problem sets, and employing commonly used measures to evaluate performance can support replication and comparison of results across studies.
As information security became an increasing concern for software developers and users, requirements engineering (RE) researchers brought new insight to security requirements. Security requirements aim to address security at the early stages of system design while accommodating the complex needs of different stakeholders. Meanwhile, other research communities, such as usable privacy and security, have also examined these requirements with specialized goal to make security more usable for stakeholders from product owners, to system users and administrators. In this paper we report results from conducting a literature survey to compare security requirements research from RE Conferences with the Symposium on Usable Privacy and Security (SOUPS). We report similarities between the two research areas, such as common goals, technical definitions, research problems, and directions. Further, we clarify the differences between these two communities to understand how they can leverage each other’s insights. From our analysis, we recommend new directions in security requirements research mainly to expand the meaning of security requirements in RE to reflect the technological advancements that the broader field of security is experiencing. These recommendations to encourage crosscollaboration with other communities are not limited to the security requirements area; in fact, we believe they can be generalized to other areas of RE.
Information system developers and administrators often overlook critical security requirements and best practices. This may be due to lack of tools and techniques that allow practitioners to tailor security knowledge to their particular context. In order to explore the impact of new security methods, we must improve our ability to study the impact of security tools and methods on software and system development. In this paper, we present early findings of an experiment to assess the extent to which the number and type of examples used in security training stimuli can impact security problem solving. To motivate this research, we formulate hypotheses from analogical transfer theory in psychology. The independent variables include number of problem surfaces and schemas, and the dependent variable is the answer accuracy. Our study results do not show a statistically significant difference in performance when the number and types of examples are varied. We discuss the limitations, threats to validity and opportunities for future studies in this area.
Research shows that commonly accepted security requirements are not generally applied in practice. Instead of relying on requirements checklists, security experts rely on their expertise and background knowledge to identify security vulnerabilities. To understand the gap between available checklists and practice, we conducted a series of interviews to encode the decision-making process of security experts and novices during security requirements analysis. Participants were asked to analyze two types of artifacts: source code, and network diagrams for vulnerabilities and to apply a requirements checklist to mitigate some of those vulnerabilities. We framed our study using Situation Awareness—a cognitive theory from psychology—to elicit responses that we later analyzed using coding theory and grounded analysis. We report our preliminary results of analyzing two interviews that reveal possible decision- making patterns that could characterize how analysts perceive, comprehend and project future threats which leads them to decide upon requirements and their specifications, in addition, to how experts use assumptions to overcome ambiguity in specifications. Our goal is to build a model that researchers can use to evaluate their security requirements methods against how experts transition through different situation awareness levels in their decision-making process.
Security requirements patterns represent reusable security practices that software engineers can apply to improve security in their system. Reusing best practices that others have employed could have a number of benefits, such as decreasing the time spent in the requirements elicitation process or improving the quality of the product by reducing product failure risk. Pattern selection can be difficult due to the diversity of applicable patterns from which an analyst has to choose. The challenge is that identifying the most appropriate pattern for a situation can be cumbersome and time-consuming. We propose a new method that combines an inquiry-cycle based approach with the feature diagram notation to review only relevant patterns and quickly select the most appropriate patterns for the situation. Similar to patterns themselves, our approach captures expert knowledge to relate patterns based on decisions made by the pattern user. The resulting pattern hierarchies allow users to be guided through these decisions by questions, which introduce related patterns in order to help the pattern user select the most appropriate patterns for their situation, thus resulting in better requirement generation. We evaluate our approach using access control patterns in a pattern user study.
Despite the abundance of information security guidelines, system developers have difficulties implementing technical solutions that are reasonably secure. Security patterns are one possible solution to help developers reuse security knowledge. The challenge is that it takes experts to develop security patterns. To address this challenge, we need a framework to identify and assess patterns and pattern application practices that are accessible to non-experts. In this paper, we narrowly define what we mean by patterns by focusing on requirements patterns and the considerations that may inform how we identify and validate patterns for knowledge reuse. We motivate this discussion using examples from the requirements pattern literature and theory in cognitive psychology.