Biblio
Location entropy (LE) is a popular metric for measuring the popularity of various locations (e.g., points-of-interest). Unlike metrics computed only from the number of (unique) visits to a location, such as frequency, LE also captures the diversity of users' visits and is thus a more accurate popularity measure. Current solutions for computing LE require full access to the past visits of users to locations, which poses privacy threats. This paper discusses, for the first time, the problem of perturbing location entropy for a set of locations according to differential privacy. The problem is challenging because removing a single user from the dataset impacts multiple records of the database, namely all the visits made by that user to various locations. Towards this end, we first derive non-trivial, tight bounds for both the local and global sensitivity of LE, and show that satisfying ε-differential privacy requires a large amount of noise, rendering the published results useless. Hence, we propose a thresholding technique that limits the number of each user's visits, which significantly reduces the perturbation error but introduces an approximation error. To achieve better utility, we extend the technique by adopting two weaker notions of privacy: smooth sensitivity (slightly weaker) and crowd-blending (strictly weaker). Extensive experiments on synthetic and real-world datasets show that our proposed techniques preserve the original data distribution without compromising location privacy.
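For concreteness, a minimal sketch of the quantities involved: the entropy of a location's visit distribution, a visit-count cap, and a Laplace release. The function names and the `sensitivity` parameter are our own illustration; the paper's contribution is precisely the tight sensitivity bounds and error analysis, which the sketch takes as given.

```python
import math
import random

def location_entropy(visits):
    """Shannon entropy (natural log) of one location's visit distribution;
    `visits` maps each user to her visit count at this location."""
    total = sum(visits.values())
    return -sum((c / total) * math.log(c / total) for c in visits.values())

def laplace(scale):
    # Difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_location_entropy(visits, epsilon, threshold, sensitivity):
    """Thresholding step plus Laplace release: cap each user's visit count at
    `threshold`, then add noise calibrated to `sensitivity`, an upper bound on
    how much the capped LE can change when one user is removed entirely.
    Deriving tight values for that bound is the paper's contribution; here it
    is an input the caller must supply."""
    capped = {u: min(c, threshold) for u, c in visits.items()}
    return location_entropy(capped) + laplace(sensitivity / epsilon)
```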
Differential privacy is a precise mathematical constraint meant to ensure privacy of individual pieces of information in a database even while queries are being answered about the aggregate. However, one must be careful about what differential privacy does and does not guarantee. For example, the definition prevents a strong adversary who knows all but one entry in the database from inferring much about the remaining entry. This strong-adversary assumption can be overlooked, resulting in misinterpretation of the privacy guarantee of differential privacy. Herein we give an equivalent definition of privacy using mutual information that makes plain some of the subtleties of differential privacy. Mutual-information differential privacy is in fact sandwiched between ε-differential privacy and (ε,δ)-differential privacy in terms of its strength. In contrast to previous works that use unconditional mutual information, we show that differential privacy is fundamentally related to conditional mutual information, accompanied by a maximization over the database distribution. The conceptual advantage of using mutual information, aside from yielding a simpler and more intuitive definition of differential privacy, is that its properties are well understood. Several properties of differential privacy, such as composition theorems, are easily verified for the mutual-information alternative.
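As a hedged sketch of the form such a definition takes (the notation X^n, X_i, X^{-i}, Y is ours, not copied from the paper; exact constants and implications are as derived there):

```latex
% epsilon-MI-DP (sketch): the mechanism output Y may leak at most epsilon
% nats about any single database entry X_i, conditioned on the remaining
% entries X^{-i}, in the worst case over database distributions P_{X^n}:
\sup_{i,\; P_{X^n}} I\left(X_i ; Y \mid X^{-i}\right) \le \varepsilon
```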
Modern Industrial Control Systems (ICS) rely on enterprise-to-plant-floor connectivity. As the size, diversity, and therefore complexity of ICS increase, so do the operational requirements, goals, and challenges defined by users across the various sub-systems. Recent trends in Information Technology (IT) and Operational Technology (OT) convergence may cause operators to lose a comprehensive understanding of end-to-end data flow requirements. This presents a risk to system security and resilience. Sensors were once applied solely for operational process use, but now act as inputs supporting a diverse set of organisational requirements. If these are not fully understood, incomplete risk assessment and inappropriate implementation of security controls could occur. In search of a solution, operators may turn to standards and guidelines. This paper reviews popular standards and guidelines, prior to presenting a case study and conceptual tool that highlight the importance of data flows, critical data processing points, and system-to-user relationships. The proposed approach forms a basis for risk assessment and security control implementation, aiding the evolution of ICS security and resilience.
Nowadays, both the number of cyberattacks and their sophistication have increased considerably, and their prevention concerns many organizations. Cooperation by means of information sharing is a promising strategy to address this problem, but unfortunately it poses many challenges. Indeed, establishing a win-win environment is not straightforward, and organizations are not properly motivated to share information. This work presents a model to analyse the benefits and drawbacks of information sharing among organizations that exhibit a certain level of dependency. The proposed model applies functional dependency network analysis to emulate attack propagation, and game theory for information-sharing management. We present a simulation framework implementing the model that allows for testing different sharing strategies under several network and attack settings. Experiments using simulated environments show how the proposed model provides insights into the conditions and scenarios under which information sharing is beneficial.
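A minimal sketch of the FDNA-style propagation step such a model builds on, assuming the commonly cited two-parameter FDNA formulation (strength and criticality of a dependency); the game-theoretic sharing layer is not shown:

```python
def receiver_operability(feeder_op, alpha, beta):
    """One FDNA dependency link on a 0-100 operability scale. alpha in [0, 1]
    is the strength of the dependency and beta in [0, 100] its criticality;
    both formulas follow the common FDNA formulation and should be treated
    as an assumption here."""
    sod = alpha * feeder_op + 100.0 * (1.0 - alpha)  # strength-of-dependency term
    cod = feeder_op + 100.0 - beta                   # criticality-of-dependency term
    return min(sod, cod)

def propagate_attack(nodes, links, compromised, degraded_op=20.0):
    """Emulate an attack: degrade the compromised node, then push the reduced
    operability through the dependency links (assumed here to be listed in
    topological order, for brevity)."""
    op = {n: 100.0 for n in nodes}
    op[compromised] = degraded_op
    for feeder, receiver, alpha, beta in links:
        op[receiver] = min(op[receiver],
                           receiver_operability(op[feeder], alpha, beta))
    return op
```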
After decades of cyber warfare, it is well known that the static and predictable behavior of cyber configurations gives adversaries a great advantage in planning and launching their attacks successfully. At the same time, as cyber attacks become highly stealthy and more sophisticated, their detection and mitigation become much harder and more expensive. We developed a new foundation for moving target defense (MTD) based on cyber mutation, a new concept in cybersecurity that reverses this asymmetry in cyber warfare by embedding agility into cyber systems. Cyber mutation enables cyber systems to automatically change their configuration parameters in an unpredictable, safe, and adaptive manner in order to proactively achieve one or more of the following MTD goals: (1) deceiving attackers to keep them from reaching their goals, (2) disrupting their plans by changing adversarial behaviors, and (3) deterring adversaries by prohibitively increasing the attack effort and cost. In this talk, we will present the formal foundations, metrics, and framework for developing effective cyber mutation techniques. The talk will also review several examples of developed techniques, including Random Host Mutation, Random Route Mutation, fingerprinting mutation, and mutable virtual networks, and will address the evaluation and lessons learned for advancing future research in this area.
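As one concrete illustration, a toy version of Random Host Mutation, one of the techniques mentioned; the function and address-pool abstraction are ours, and a real deployment additionally preserves active sessions and keeps name resolution consistent:

```python
import random

def mutate_hosts(hosts, address_pool, rng=None):
    """Toy Random Host Mutation: rebind every protected host to a fresh
    address sampled without replacement from an unused pool, so reconnaissance
    based on previously observed addresses goes stale."""
    rng = rng or random.Random()
    fresh = rng.sample(address_pool, len(hosts))
    return dict(zip(hosts, fresh))

# Example: remap three hosts at each mutation interval.
# mapping = mutate_hosts(["web", "db", "hmi"],
#                        [f"10.0.9.{i}" for i in range(2, 250)])
```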
We study the value of data privacy in a game-theoretic model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The private data of each individual represents her knowledge about an underlying state, which is the information that the data collector desires to learn. Unlike most existing work on privacy-aware surveys, our model does not assume the data collector to be trustworthy. Each individual therefore takes full control of her own data privacy and reports only a privacy-preserving version of her data. In this paper, the value of ε units of privacy is measured by the minimum payment over all nonnegative payment mechanisms under which an individual's best response at a Nash equilibrium is to report the data with a privacy level of ε. The higher ε is, the less private the reported data is. We derive lower and upper bounds on the value of privacy which are asymptotically tight as the number of data subjects becomes large. Specifically, the lower bound shows that no smaller payment can buy ε units of privacy, and the upper bound is given by an achievable payment mechanism that we design. Based on these fundamental limits, we further derive lower and upper bounds on the minimum total payment for the data collector to achieve a given learning accuracy target, and show that the total payment of the designed mechanism is at most one individual's payment away from the minimum.
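For intuition about "reporting data with a privacy level of ε", here is a standard ε-(locally-)differentially-private report for a single bit (randomized response); this is illustrative only and is not the paper's reporting strategy or payment mechanism:

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    Larger epsilon means the report is more truthful and hence less private,
    matching the abstract's reading of the privacy level eps."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit
```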
We introduce a model for differentially private analysis of weighted graphs in which the graph topology (V, E) is assumed to be public and the private information consists only of the edge weights w : E → R+. This can express, for example, hiding congestion patterns in a known system of roads. Differential privacy requires that the output of an algorithm provides little advantage, measured by privacy parameters ε and δ, for distinguishing between neighboring inputs, which are thought of as inputs that differ in the contribution of one individual. In our model, two weight functions w, w′ are considered neighboring if they have l1 distance at most one. We study the problems of privately releasing a short path between a pair of vertices and of privately releasing approximate distances between all pairs of vertices. We are concerned with the approximation error: the difference between the length of the released path or released distance and the length of the shortest path or actual distance. For the problem of privately releasing a short path between a pair of vertices, we prove a lower bound of Ω(|V|) on the additive approximation error for fixed privacy parameters ε, δ. We provide a differentially private algorithm that matches this error bound up to a logarithmic factor and releases paths between all pairs of vertices, not just a single pair. The approximation error achieved by our algorithm can be bounded by the number of edges on the shortest path, so we achieve better accuracy than the worst-case bound for pairs of vertices connected by a low-weight path consisting of o(|V|) vertices. For the problem of privately releasing all-pairs distances, we show that for trees we can release all-pairs distances with approximation error O(log^{2.5} |V|) for fixed privacy parameters, and that for arbitrary bounded-weight graphs with edge weights in [0, M] we can release all distances with approximation error Õ(√(|V|·M)).
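A natural baseline in this model, sketched below under our own naming: add Laplace(1/ε) noise to every edge weight (the vector Laplace mechanism for l1 sensitivity 1) and treat any computation on the noisy weights as post-processing. Its error grows with the number of edges on a path, which is exactly why it does not match the paper's bounds; it only fixes the setup.

```python
import heapq
import math
import random

def laplace(scale):
    # Difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_shortest_path(adj, src, dst, epsilon):
    """Release all edge weights with Laplace(1/eps) noise: since neighboring
    weight functions differ by at most 1 in l1 distance, the noisy weight
    vector is eps-DP, and the path computed from it is post-processing.
    adj: {u: [(v, weight), ...]}; assumes dst is reachable from src."""
    noisy = {u: [(v, max(w + laplace(1 / epsilon), 0.0)) for v, w in nbrs]
             for u, nbrs in adj.items()}
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:  # plain Dijkstra on the noisy weights
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):
            continue
        for v, w in noisy.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```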
The rising prevalence of algorithmic interfaces, such as curated feeds in online news, raises new questions for designers, scholars, and critics of media. This work focuses on how transparent design of algorithmic interfaces can promote awareness and foster trust. A two-stage process of how transparency affects trust was hypothesized, drawing on theories of information processing and procedural justice. In an online field experiment, three levels of system transparency were tested in the high-stakes context of peer assessment. Individuals whose expectations were violated (by receiving a lower grade than expected) trusted the system less, unless the grading algorithm was made more transparent through explanation. However, providing too much information eroded this trust. Attitudes of individuals whose expectations were met did not vary with transparency. Results are discussed in terms of a dual-process model of attitude change and the depth of justification of perceived inconsistency. Designing for trust requires balanced interface transparency: not too little and not too much.
Recently, PIN-TRUST, a method for predicting future trust relationships between users, was proposed. PIN-TRUST outperforms existing trust prediction methods by exploiting all types of interactions between users and the reciprocation of those interactions. In this paper, we validate whether its consideration of the reciprocation of interactions is really effective in trust prediction. Furthermore, we examine a new concept, the "uncertainty" of untrustworthy users, devised to reflect the difficulty of modeling the activities of untrustworthy users in PIN-TRUST. We then also validate the effectiveness of this uncertainty concept. Through the validation, we reveal that considering the reciprocation of interactions is effective for trust prediction with PIN-TRUST, and that it is necessary to treat the uncertainty of untrustworthy users the same as that of other users.
In this paper, we analyze the performance and cost trade-off of selecting between two representations of nodes when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rule set, which contains almost 26k patterns. Our methodology consists of code profiling and analysis, followed by the selection of a parameter to maximize a metric that combines clock-cycle count and memory usage. The parameter determines which of the two types of nodes is selected for each trie node. We show that it is possible to select the parameter to optimize the metric, resulting in an improvement of up to 12× compared with the single node-type case.
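A sketch of the trade-off being tuned, with class and parameter names of our own: a dense 256-slot transition table buys one-lookup transitions at a large memory cost, a hash map is compact but costs more cycles per lookup, and a per-node fan-out threshold selects between them:

```python
FANOUT_THRESHOLD = 8  # the tunable parameter; the paper selects it by profiling

class DenseNode:
    """Full 256-slot transition table: one array access per input byte,
    at the cost of 256 slots even for nodes with few outgoing edges."""
    __slots__ = ("next", "fail", "out")
    def __init__(self):
        self.next = [None] * 256
        self.fail = None
        self.out = []

class SparseNode:
    """Hash-map transitions: memory proportional to actual fan-out,
    but each lookup costs more clock cycles than an array index."""
    __slots__ = ("next", "fail", "out")
    def __init__(self):
        self.next = {}
        self.fail = None
        self.out = []

def make_node(fanout):
    # Per-node choice driven by fan-out; the paper optimizes the threshold
    # against a combined cycles-and-memory metric on the Snort rule set.
    return DenseNode() if fanout > FANOUT_THRESHOLD else SparseNode()
```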
While cloud computing has become an attractive platform for supporting data-intensive applications, a major obstacle to the adoption of cloud computing in sectors such as health and defense is the privacy risk associated with releasing datasets to third parties in the cloud for analysis. A widely adopted technique for data privacy preservation is to anonymize data via local recoding. However, most existing local-recoding techniques are either serial or distributed without directly optimizing scalability, rendering them unsuitable for big data applications. In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH). Specifically, a novel semantic distance metric is presented for use with LSH to measure the similarity between two data records. LSH with the MinHash function family is then employed to divide datasets into multiple partitions for use with MapReduce to parallelize computation while preserving similarity. Using our efficient LSH-based scheme, we anonymize each partition with a recursive agglomerative k-member clustering algorithm. Extensive experiments on real-life datasets show that our approach significantly improves the scalability and time-efficiency of local-recoding anonymization by orders of magnitude over existing approaches.
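A minimal sketch of the MinHash partitioning step, with hash choices and parameter names of our own; the paper's semantic distance metric is assumed to have already tokenized each record:

```python
import hashlib

def minhash_signature(tokens, num_hashes=16):
    """MinHash signature of a record's token set: the probability that two
    signatures agree in one slot equals the sets' Jaccard similarity."""
    return tuple(
        min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        for i in range(num_hashes))

def partition_key(tokens, rows=4):
    """Key a record by one signature band (its first `rows` slots), so records
    with highly similar token sets tend to land in the same MapReduce
    partition. Full LSH uses several bands; one band keeps the sketch to a
    single shuffle key."""
    return minhash_signature(tokens, rows)
```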
Most cyber network attacks begin with an adversary gaining a foothold within the network and proceed with lateral movement until a desired goal is achieved. The mechanism by which lateral movement occurs varies, but the basic signature of hopping between hosts by exploiting vulnerabilities is the same. Because of the nature of the vulnerabilities typically exploited, lateral movement is very difficult to detect and defend against. In this paper, we define a dynamic reachability graph model of the network to discover the possible paths an adversary could take using different vulnerabilities, and how those paths evolve over time. We use this reachability graph to develop dynamic machine-level and network-level impact scores. Lateral movement mitigation strategies that make use of our impact scores are also discussed, and we detail an example using a freely available data set.
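A static, simplified sketch of the idea behind the impact scores (the names and the asset-weight scheme are ours; the paper's scores are dynamic and track how the graph evolves over time):

```python
def reachable(adj, start):
    """Hosts an adversary with a foothold at `start` can reach by chaining
    exploitable vulnerabilities; adj[u] lists hosts exploitable from u."""
    seen, stack = {start}, [start]
    while stack:
        for v in adj.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def machine_impact(adj, host, asset_value):
    """A static stand-in for a machine-level impact score: the total value of
    assets an adversary could reach from a foothold on `host`."""
    return sum(asset_value.get(v, 1.0)
               for v in reachable(adj, host) if v != host)
```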
Physical Unclonable Functions (PUFs) measure manufacturing variations inside integrated circuits to derive internal secrets at run-time, avoiding the permanent storage of secrets in non-volatile memory. PUF responses are noisy, so they require error correction to generate reliable cryptographic keys. To date, a single key is reproduced in the field when needed and is always used, regardless of its reliability. In this work, we compute online reliability information for a reproduced key and perform multiple PUF readout and error correction steps in case of an unreliable result. This permits choosing the most reliable key among multiple derived key candidates with different corrected error patterns. With this approach, we achieve the same average key error probability from fewer PUF response bits. Our proof-of-concept design for a popular reference scenario uses Differential Sequence Coding (DSC) and a Viterbi decoder with reliability output information. It requires 39% fewer PUF response bits and 16% fewer helper data bits than the regular approach without the option for multiple readouts.
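A sketch of the selection logic, abstracting the hardware readout and the DSC/Viterbi decoder behind callables of our own naming; `score` stands for the decoder's reliability output:

```python
def most_reliable_key(readout, decode, threshold, max_readouts=3):
    """Perform a PUF readout and decode it to (key, score), where `score` is
    the decoder's reliability output (e.g., a Viterbi path-metric margin).
    Re-read only while the best score so far stays below `threshold`, up to
    `max_readouts` times, then return the most reliable candidate seen."""
    best_key, best_score = None, float("-inf")
    for _ in range(max_readouts):
        key, score = decode(readout())
        if score > best_score:
            best_key, best_score = key, score
        if best_score >= threshold:  # reliable enough: no further readout
            break
    return best_key
```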