Biblio
The promise of big data relies on the release and aggregation of data sets. When these data sets contain sensitive information about individuals, it has been scalable and convenient to protect the privacy of these individuals by de-identification. However, studies show that the combination of de-identified data sets with other data sets risks re-identification of some records. Some studies have shown how to measure this risk in specific contexts where certain types of public data sets (such as voter roles) are assumed to be available to attackers. To the extent that it can be accomplished, such analyses enable the threat of compromises to be balanced against the benefits of sharing data. For example, a study that might save lives by enabling medical research may be enabled in light of a sufficiently low probability of compromise from sharing de-identified data. In this paper, we introduce a general probabilistic re-identification framework that can be instantiated in specific contexts to estimate the probability of compromises based on explicit assumptions. We further propose a baseline of such assumptions that enable a first-cut estimate of risk for practical case studies. We refer to the framework with these assumptions as the Naive Re-identification Framework (NRF). As a case study, we show how we can apply NRF to analyze and quantify the risk of re-identification arising from releasing de-identified medical data in the context of publicly-available social media data. The results of this case study show that NRF can be used to obtain meaningful quantification of the re-identification risk, compare the risk of different social media, and assess risks of combinations of various demographic attributes and medical conditions that individuals may voluntarily disclose on social media.
In context-aware applications, user's access privileges rely on both user's identity and context. Access control rules are usually statically defined while contexts and the system state can change dynamically. Changes in contexts can result in service disruptions. To address this issue, this poster proposes a reactive access control system that associates contingency plans with access control rules. Risk scores are also associated with actions part of the contingency plans. Such risks are estimated by using fuzzy inference. Our approach is cast into the XACML reference architecture.
The adoption of the HTTPS - i.e. HTTP over TLS - protocol by the Hellenic websites is studied in this work. Since this protocol constitutes a de-facto standard for secure communications in the web, our aim is to identify whether the underlying TLS protocol in popular websites in Greece is properly configured, so as to avoid known vulnerabilities. To this end, a systematic approach utilizing two well-known TLS scanner tools is adopted to evaluate 241 sites of high popularity. The results illustrate that only about half of the sites seem to be at a satisfactory level and, thus, there is still much room for improvement, mainly due to the fact that obsolete ciphers and/or protocol versions are still supported; there is also a small portion - i.e. about 3% of the sites - that do not implement the HTTPS at all, thus posing very high security risks for their users who provide their credentials via a totally insecure channel. We also examined, using an appropriate online questionnaire, whether the users are actually aware of what the HTTPS means and how they check the security of the websites. The outcome of this research shows that much work needs to be done to increase the knowledge and the security awareness of an average Internet user.
Despite more than a decade of heightened focus on cybersecurity, cyber threats remain an ongoing and growing concern [1]-[3]. Stakeholders often perform cyber risk assessments in order to understand potential mission impacts due to cyber threats. One common approach to cyber risk assessment is event-based analysis which usually considers adverse events, effects, and paths through a system, then estimates the effort/likelihood and mission impact of such attacks. When conducted manually, this type of approach is labor-intensive, subjective, and does not scale well to complex systems. As an alternative, we present an automated capability-based risk assessment approach, compare it to manual event-based analysis approaches, describe its application to a notional space system ground segment, and discuss the results.
Vulnerability exploitation is reportedly one of the main attack vectors against computer systems. Yet, most vulnerabilities remain unexploited by attackers. It is therefore of central importance to identify vulnerabilities that carry a high 'potential for attack'. In this paper we rely on Symantec data on real attacks detected in the wild to identify a trade-off in the Impact and Complexity of a vulnerability in terms of attacks that it generates; exploiting this effect, we devise a readily computable estimator of the vulnerability's Attack Potential that reliably estimates the expected volume of attacks against the vulnerability. We evaluate our estimator performance against standard patching policies by measuring foiled attacks and demanded workload expressed as the number of vulnerabilities entailed to patch. We show that our estimator significantly improves over standard patching policies by ruling out low-risk vulnerabilities, while maintaining invariant levels of coverage against attacks in the wild. Our estimator can be used as a first aid for vulnerability prioritisation to focus assessment efforts on high-potential vulnerabilities.
Circular statistics present a new technique to analyse the time patterns of events in the field of cyber security. We apply this technique to analyse incidents of malware infections detected by network monitoring. In particular we are interested in the daily and weekly variations of these events. Based on "live" data provided by Spamhaus, we examine the hypothesis that attacks on four countries are distributed uniformly over 24 hours. Specifically, we use Rayleigh and Watson tests. While our results are mainly exploratory, we are able to demonstrate that the attacks are not uniformly distributed, nor do they follow a Poisson distribution as reported in other research. Our objective in this is to identify a distribution that can be used to establish risk metrics. Moreover, our approach provides a visual overview of the time patterns' variation, indicating when attacks are most likely. This will assist decision makers in cyber security to allocate resources or estimate the cost of system monitoring during high risk periods. Our results also reveal that the time patterns are influenced by the total number of attacks. Networks subject to a large volume of attacks exhibit bimodality while one case, where attacks were at relatively lower rate, showed a multi-modal daily variation.
Denial-of-Service attacks have rapidly increased in terms of frequency and intensity, steadily becoming one of the biggest threats to Internet stability and reliability. However, a rigorous comprehensive characterization of this phenomenon, and of countermeasures to mitigate the associated risks, faces many infrastructure and analytic challenges. We make progress toward this goal, by introducing and applying a new framework to enable a macroscopic characterization of attacks, attack targets, and DDoS Protection Services (DPSs). Our analysis leverages data from four independent global Internet measurement infrastructures over the last two years: backscatter traffic to a large network telescope; logs from amplification honeypots; a DNS measurement platform covering 60% of the current namespace; and a DNS-based data set focusing on DPS adoption. Our results reveal the massive scale of the DoS problem, including an eye-opening statistic that one-third of all / 24 networks recently estimated to be active on the Internet have suffered at least one DoS attack over the last two years. We also discovered that often targets are simultaneously hit by different types of attacks. In our data, Web servers were the most prominent attack target; an average of 3% of the Web sites in .com, .net, and .org were involved with attacks, daily. Finally, we shed light on factors influencing migration to a DPS.
Nearly all modern software has security flaws–-either known or unknown by the users. However, metrics for evaluating software security (or lack thereof) are noisy at best. Common evaluation methods include counting the past vulnerabilities of the program, or comparing the size of the Trusted Computing Base (TCB), measured in lines of code (LoC) or binary size. Other than deleting large swaths of code from project, it is difficult to assess whether a code change decreased the likelihood of a future security vulnerability. Developers need a practical, constructive way of evaluating security. This position paper argues that we actually have all the tools needed to design a better, empirical method of security evaluation. We discuss related work that estimates the severity and vulnerability of certain attack vectors based on code properties that can be determined via static analysis. This paper proposes a grand, unified model that can predict the risk and severity of vulnerabilities in a program. Our prediction model uses machine learning to correlate these code features of open-source applications with the history of vulnerabilities reported in the CVE (Common Vulnerabilities and Exposures) database. Based on this model, one can incorporate an analysis into the standard development cycle that predicts whether the code is becoming more or less prone to vulnerabilities.
Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and users quickly install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social media posts that mention the vulnerability by name. However, these prior studies share multiple methodological shortcomings that inflate predictive power of these approaches. We replicate key portions of the prior work, compare their approaches, and show how selection of training and test data critically affect the estimated performance of predictive models. The results of this study point to important methodological considerations that should be taken into account so that results reflect real-world utility.
Microdata is collected by companies in order to enhance their quality of service as well as the accuracy of their recommendation systems. These data often become publicly available after they have been sanitized. Recent reidentification attacks on publicly available, sanitized datasets illustrate the privacy risks involved in microdata collections. Currently, users have to trust the provider that their data will be safe in case data is published or if a privacy breach occurs. In this work, we empower users by developing a novel, user-centric tool for privacy measurement and a new lightweight privacy metric. The goal of our tool is to estimate users' privacy level prior to sharing their data with a provider. Hence, users can consciously decide whether to contribute their data. Our tool estimates an individuals' privacy level based on published popularity statistics regarding the items in the provider's database, and the users' microdata. In this work, we describe the architecture of our tool as well as a novel privacy metric, which is necessary for our setting where we do not have access to the provider's database. Our tool is user friendly, relying on smart visual results that raise privacy awareness. We evaluate our tool using three real world datasets, collected from major providers. We demonstrate strong correlations between the average anonymity set per user and the privacy score obtained by our metric. Our results illustrate that our tool which uses minimal information from the provider, estimates users' privacy levels comparably well, as if it had access to the actual database.
Attack graphs provide compact representations of the attack paths an attacker can follow to compromise network resources from the analysis of network vulnerabilities and topology. These representations are a powerful tool for security risk assessment. Bayesian inference on attack graphs enables the estimation of the risk of compromise to the system's components given their vulnerabilities and interconnections and accounts for multi-step attacks spreading through the system. While static analysis considers the risk posture at rest, dynamic analysis also accounts for evidence of compromise, for example, from Security Information and Event Management software or forensic investigation. However, in this context, exact Bayesian inference techniques do not scale well. In this article, we show how Loopy Belief Propagation—an approximate inference technique—can be applied to attack graphs and that it scales linearly in the number of nodes for both static and dynamic analysis, making such analyses viable for larger networks. We experiment with different topologies and network clustering on synthetic Bayesian attack graphs with thousands of nodes to show that the algorithm's accuracy is acceptable and that it converges to a stable solution. We compare sequential and parallel versions of Loopy Belief Propagation with exact inference techniques for both static and dynamic analysis, showing the advantages and gains of approximate inference techniques when scaling to larger attack graphs.
Industrial control systems are cyber-physical systems that are used to operate critical infrastructures such as smart grids, traffic systems, industrial facilities, and water distribution networks. The digitalization of these systems increases their efficiency and decreases their cost of operation, but also makes them more vulnerable to cyber-attacks. In order to protect industrial control systems from cyber-attacks, the installation of multiple layers of security measures is necessary. In this paper, we study how to allocate a large number of security measures under a limited budget, such as to minimize the total risk of cyber-attacks. The security measure allocation problem formulated in this way is a combinatorial optimization problem subject to a knapsack (budget) constraint. The formulated problem is NP-hard, therefore we propose a method to exploit submodularity of the objective function so that polynomial time algorithms can be applied to obtain solutions with guaranteed approximation bounds. The problem formulation requires a preprocessing step in which attack scenarios are selected, and impacts and likelihoods of these scenarios are estimated. We discuss how the proposed method can be applied in practice.
This tutorial provides a thorough review of the main research directions in the field of identity management and identity related security threats in Online Social Networks (OSNs). The continuous increase in the numbers and sophistication levels of fake accounts constitutes a big threat to the privacy and to the security of honest OSN users. Uninformed OSN users could be easily fooled into accepting friendship links with fake accounts, giving them by that access to personal information they intend to exclusively share with their real friends. Moreover, these fake accounts subvert the security of the system by spreading malware, connecting with honest users for nefarious goals such as sexual harassment or child abuse, and make the social computing environment mostly untrustworthy. The tutorial introduces the main available research results available in this area, and presents our work on collaborative identity validation techniques to estimate OSN profiles trustworthiness.
Once organizations have the security incident and breaches, they have to pay tremendous costs. Although visible cost, such as the incident response cost, customer follow-up care, and legal cost are predictable and calculable, it is tough to evaluate and estimate the invisible damage, such as losing customer loyalty, reputation impact, and the damage of branding. This paper proposes a new method, called "Event Study Methodology with Twitter Sentimental Analysis" to evaluate the invisible cost. This method helps to assess the impact of the security breach and the impact on corporate valuation.
While we have long had principles describing how access control enforcement should be implemented, such as the reference monitor concept, imprecision in access control mechanisms and access control policies leads to risks that may enable exploitation. In practice, least privilege access control policies often allow information flows that may enable exploits. In addition, the implementation of access control mechanisms often tries to balance security with ease of use implicitly (e.g., with respect to determining where to place authorization hooks) and approaches to tighten access control, such as accounting for program context, are ad hoc. In this paper, we define four types of risks in access control enforcement and explore possible approaches and challenges in tracking those types of risks. In principle, we advocate runtime tracking to produce risk estimates for each of these types of risk. To better understand the potential of risk estimation for authorization, we propose risk estimate functions for each of the four types of risk, finding that benign program deployments accumulate risks in each of the four areas for ten Android programs examined. As a result, we find that tracking of relative risk may be useful for guiding changes to security choices, such as authorized unsafe operations or placement of authorization checks, when risk differs from that expected.
Users in social network are confronted with the risk of privacy leakage while sharing information with friends whose privacy protection awareness is poor. This paper proposes a security risk estimation framework of social network privacy, aiming at quantifying privacy leakage probability when information is spread to the friends of target users' friends. The privacy leakage probability in information spreading paths comprises Individual Privacy Leakage Probability (IPLP) and Relationship Privacy Leakage Probability (RPLP). IPLP is calculated based on individuals' privacy protection awareness and the trust of protecting others' privacy, while RPLP is derived from relationship strength estimation. Experiments show that the security risk estimation framework can assist users to find vulnerable friends by calculating the average and the maximum privacy leakage probability in all information spreading paths of target user in social network. Besides, three unfriending strategies are applied to decrease risk of privacy leakage and unfriending the maximum degree friend is optimal.
This article presents PrOLoc, a localization system that combines partially homomorphic encryption with a new way of structuring the localization problem to enable emcient and accurate computation of a target's location while preserving the privacy of the observers.