Biblio
The increasing publication of large amounts of data, theoretically anonymous, can lead to a number of attacks on the privacy of people. The publication of sensitive data without exposing the data owners is generally not part of the software developers concerns. The regulations for the data privacy-preserving create an appropriate scenario to focus on privacy from the perspective of the use or data exploration that takes place in an organization. The increasing number of sanctions for privacy violations motivates the systematic comparison of three known machine learning algorithms in order to measure the usefulness of the data privacy preserving. The scope of the evaluation is extended by comparing them with a known privacy preservation metric. Different parameter scenarios and privacy levels are used. The use of publicly available implementations, the presentation of the methodology, explanation of the experiments and the analysis allow providing a framework of work on the problem of the preservation of privacy. Problems are shown in the measurement of the usefulness of the data and its relationship with the privacy preserving. The findings motivate the need to create optimized metrics on the privacy preferences of the owners of the data since the risks of predicting sensitive attributes by means of machine learning techniques are not usually eliminated. In addition, it is shown that there may be a hundred percent, but it cannot be measured. As well as ensuring adequate performance of machine learning models that are of interest to the organization that data publisher.
As trust becomes increasingly important in the software domain. Due to its complex composite concept, people face great challenges, especially in today's dynamic and constantly changing internet technology. In addition, measuring the software trustworthiness correctly and effectively plays a significant role in gaining users trust in choosing different software. In the context of security, trust is previously measured based on the vulnerability time occurrence to predict the total number of vulnerabilities or their future occurrence time. In this study, we proposed a new unified index called "loss speed index" that integrates the most important variables of software security such as vulnerability occurrence time, number and severity loss, which are used to evaluate the overall software trust measurement. Based on this new definition, a new model called software trustworthy security growth model (STSGM) has been proposed. This paper also aims at filling the gap by addressing the severity of vulnerabilities and proposed a vulnerability severity prediction model, the results are further evaluated by STSGM to estimate the future loss speed index. Our work has several features such as: (1) It is used to predict the vulnerability severity/type in future, (2) Unlike traditional evaluation methods like expert scoring, our model uses historical data to predict the future loss speed of software, (3) The loss metric value is used to evaluate the risk associated with different software, which has a direct impact on software trustworthiness. Experiments performed on real software vulnerability datasets and its results are analyzed to check the correctness and effectiveness of the proposed model.
Smart Spaces are composed of heterogeneous sensors and devices that collect and share information. This information may contain personal information of the users. Thus, securing the data and preserving the privacy are of paramount importance. In this paper, we propose techniques for information security and privacy protection for Smart Spaces based on the Smart-M3 platform. We propose a) a security framework, and b) a context-aware role-based access control scheme. We model our access control scheme using ontological techniques and Web Ontology Language (OWL), and implement it via CLIPS rules. To evaluate the efficiency of our access control scheme, we measure the time it takes to check the access rights of the access requests. The results demonstrate that the highest response time is approximately 0.2 seconds in a set of 100000 triples. We conclude that the proposed access control scheme produces low overhead and is therefore, an efficient approach for Smart Spaces.
We introduce a Markov-model-based framework for Moving Target Defense (MTD) analysis. The framework allows modeling of a broad range of MTD strategies, provides general theorems about how the probability of a successful adversary defeating an MTD strategy is related to the amount of time/cost spent by the adversary, and shows how a multilevel composition of MTD strategies can be analyzed by a straightforward combination of the analysis for each one of these strategies. Within the proposed framework we define the concept of security capacity which measures the strength or effectiveness of an MTD strategy: the security capacity depends on MTD specific parameters and more general system parameters. We apply our framework to two concrete MTD strategies.
We study the value of data privacy in a game-theoretic model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The private data of each individual represents her knowledge about an underlying state, which is the information that the data collector desires to learn. Different from most of the existing work on privacy-aware surveys, our model does not assume the data collector to be trustworthy. Then, an individual takes full control of its own data privacy and reports only a privacy-preserving version of her data. In this paper, the value of ε units of privacy is measured by the minimum payment of all nonnegative payment mechanisms, under which an individual's best response at a Nash equilibrium is to report the data with a privacy level of ε. The higher ε is, the less private the reported data is. We derive lower and upper bounds on the value of privacy which are asymptotically tight as the number of data subjects becomes large. Specifically, the lower bound assures that it is impossible to use less amount of payment to buy ε units of privacy, and the upper bound is given by an achievable payment mechanism that we designed. Based on these fundamental limits, we further derive lower and upper bounds on the minimum total payment for the data collector to achieve a given learning accuracy target, and show that the total payment of the designed mechanism is at most one individual's payment away from the minimum.
Maintaining and updating signature databases is a tedious task that normally requires a large amount of user effort. The problem becomes harder when features can be distorted by observation noise, which we call volatility. To address this issue, we propose algorithms and models to automatically generate signatures in the presence of noise, with a focus on stack fingerprinting, which is a research area that aims to discover the operating system (OS) of remote hosts using TCP/IP packets. Armed with this framework, we construct a database with 420 network stacks, label the signatures, develop a robust classifier for this database, and fingerprint 66M visible webservers on the Internet.
In this paper, we introduce Entropy/IP: a system that discovers Internet address structure based on analyses of a subset of IPv6 addresses known to be active, i.e., training data, gleaned by readily available passive and active means. The system is completely automated and employs a combination of information-theoretic and machine learning techniques to probabilistically model IPv6 addresses. We present results showing that our system is effective in exposing structural characteristics of portions of the active IPv6 Internet address space, populated by clients, services, and routers. In addition to visualizing the address structure for exploration, the system uses its models to generate candidate addresses for scanning. For each of 15 evaluated datasets, we train on 1K addresses and generate 1M candidates for scanning. We achieve some success in 14 datasets, finding up to 40% of the generated addresses to be active. In 11 of these datasets, we find active network identifiers (e.g., /64 prefixes or "subnets") not seen in training. Thus, we provide the first evidence that it is practical to discover subnets and hosts by scanning probabilistically selected areas of the IPv6 address space not known to contain active hosts a priori.
We introduce a model for differentially private analysis of weighted graphs in which the graph topology (υ,ε) is assumed to be public and the private information consists only of the edge weights ω : ε → R+. This can express hiding congestion patterns in a known system of roads. Differential privacy requires that the output of an algorithm provides little advantage, measured by privacy parameters ε and δ, for distinguishing between neighboring inputs, which are thought of as inputs that differ on the contribution of one individual. In our model, two weight functions w,w' are considered to be neighboring if they have l1 distance at most one. We study the problems of privately releasing a short path between a pair of vertices and of privately releasing approximate distances between all pairs of vertices. We are concerned with the approximation error, the difference between the length of the released path or released distance and the length of the shortest path or actual distance. For the problem of privately releasing a short path between a pair of vertices, we prove a lower bound of Ω(textbarυtextbar) on the additive approximation error for fixed privacy parameters ε,δ. We provide a differentially private algorithm that matches this error bound up to a logarithmic factor and releases paths between all pairs of vertices, not just a single pair. The approximation error achieved by our algorithm can be bounded by the number of edges on the shortest path, so we achieve better accuracy than the worst-case bound for pairs of vertices that are connected by a low-weight path consisting of o(textbarυtextbar) vertices. For the problem of privately releasing all-pairs distances, we show that for trees we can release all-pairs distances with approximation error \$O(log2.5textbarυtextbar) for fixed privacy parameters. For arbitrary bounded-weight graphs with edge weights in [0,M] we can brelease all distances with approximation error Õ(√textgreater(textbarυtextbarM).
Randomness is a vital resource for modern-day information processing, especially for cryptography. A wide range of applications critically rely on abundant, high-quality random numbers generated securely. Here, we show how to expand a random seed at an exponential rate without trusting the underlying quantum devices. Our approach is secure against the most general adversaries, and has the following new features: cryptographic level of security, tolerating a constant level of imprecision in devices, requiring only unit size quantum memory (for each device component) in an honest implementation, and allowing a large natural class of constructions for the protocol. In conjunction with a recent work by Chung et al. [2014], it also leads to robust unbounded expansion using just 2 multipart devices. When adapted for distributing cryptographic keys, our method achieves, for the first time, exponential expansion combined with cryptographic security and noise tolerance. The proof proceeds by showing that the Rényi divergence of the outputs of the protocol (for a specific bounding operator) decreases linearly as the protocol iterates. At the heart of the proof are a new uncertainty principle on quantum measurements and a method for simulating trusted measurements with untrusted devices.
Tracking and maintaining satisfactory QoE for video streaming services is becoming a greater challenge for mobile network operators than ever before. Downloading and watching video content on mobile devices is currently a growing trend among users, that is causing a demand for higher bandwidth and better provisioning throughout the network infrastructure. At the same time, popular demand for privacy has led many online streaming services to adopt end-to-end encryption, leaving providers with only a handful of indicators for identifying QoE issues. In order to address these challenges, we propose a novel methodology for detecting video streaming QoE issues from encrypted traffic. We develop predictive models for detecting different levels of QoE degradation that is caused by three key influence factors, i.e. stalling, the average video quality and the quality variations. The models are then evaluated on the production network of a large scale mobile operator, where we show that despite encryption our methodology is able to accurately detect QoE problems with 72\textbackslash%-92\textbackslash% accuracy, while even higher performance is achieved when dealing with cleartext traffic
Given a set D of tuples defined on a domain Omega, we study differentially private algorithms for constructing a histogram over Omega to approximate the tuple distribution in D. Existing solutions for the problem mostly adopt a hierarchical decomposition approach, which recursively splits Omega into sub-domains and computes a noisy tuple count for each sub-domain, until all noisy counts are below a certain threshold. This approach, however, requires that we (i) impose a limit h on the recursion depth in the splitting of Omega and (ii) set the noise in each count to be proportional to h. The choice of h is a serious dilemma: a small h makes the resulting histogram too coarse-grained, while a large h leads to excessive noise in the tuple counts used in deciding whether sub-domains should be split. Furthermore, h cannot be directly tuned based on D; otherwise, the choice of h itself reveals private information and violates differential privacy. To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined h. The core of PrivTree is a novel mechanism that (i) exploits a new analysis on the Laplace distribution and (ii) enables us to use only a constant amount of noise in deciding whether a sub-domain should be split, without worrying about the recursion depth of splitting. We demonstrate the application of PrivTree in modelling spatial data, and show that it can be extended to handle sequence data (where the decision in sub-domain splitting is not based on tuple counts but a more sophisticated measure). Our experiments on a variety of real datasets show that PrivTree considerably outperforms the states of the art in terms of data utility.
Anonymous messaging applications have recently gained popularity as a means for sharing opinions without fear of judgment or repercussion. These messages propagate anonymously over a network, typically de ned by social connections or physical proximity. However, recent advances in rumor source detection show that the source of such an anonymous message can be inferred by certain statistical inference attacks. Adaptive di usion was recently proposed as a solution that achieves optimal source obfuscation over regular trees. However, in real social networks, the degrees difer from node to node, and adaptive di usion can be signicantly sub-optimal. This gap increases as the degrees become more irregular.
In order to quantify this gap, we model the underlying network as coming from standard branching processes with i.i.d. degree distributions. Building upon the analysis techniques from branching processes, we give an analytical characterization of the dependence of the probability of detection achieved by adaptive di usion on the degree distribution. Further, this analysis provides a key insight: passing a rumor to a friend who has many friends makes the source more ambiguous. This leads to a new family of protocols that we call Preferential Attachment Adaptive Di usion (PAAD). When messages are propagated according to PAAD, we give both the MAP estimator for nding the source and also an analysis of the probability of detection achieved by this adversary. The analytical results are not directly comparable, since the adversary's observed information has a di erent distribution under adaptive di usion than under PAAD. Instead, we present results from numerical experiments that suggest that PAAD achieves a lower probability of detection, at the cost of increased communication for coordination.