Bibliography
We present OpenFace, our new open-source face recognition system that approaches state-of-the-art accuracy. Integrating OpenFace with inter-frame tracking, we build RTFace, a mechanism for denaturing video streams that selectively blurs faces according to specified policies at full frame rates. This enables privacy management for live video analytics while providing a secure approach for handling retrospective policy exceptions. Finally, we present a scalable, privacy-aware architecture for large camera networks using RTFace.
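A minimal sketch of the per-frame selective blurring idea described above, not the actual OpenFace/RTFace implementation: detected face regions are blurred unless a policy callback allows them to remain visible. Detection here uses an OpenCV Haar cascade as a stand-in for OpenFace's recognition pipeline and inter-frame tracking.

```python
# Illustrative sketch only: detect faces in a frame and blur every region the
# policy does not explicitly allow. The Haar cascade is a placeholder for the
# actual face-recognition and tracking components of RTFace.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def denature_frame(frame, is_allowed=lambda face_roi: False):
    """Blur every detected face unless the policy allows it to stay visible."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        roi = frame[y:y + h, x:x + w]
        if not is_allowed(roi):                      # per-face policy decision
            frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame
```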
In machine learning, feature engineering has been a pivotal stage in building a high-quality predictor. In particular, this work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature map and its variants. However, finding the right subset of kernels for the mKDCA feature map can be challenging. We therefore consider the problem of kernel selection and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as it is theoretically supported by mutual information and Fisher's discriminant analysis, while incremental forward search removes redundancy among kernels. Finally, we illustrate the potential of the method via an application in privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature maps can enhance utility classification performance while successfully preserving data privacy. Specifically, the results show that the proposed DMI forward search method can perform better than the state of the art and, at much smaller computational cost, can perform as well as the optimal, yet computationally expensive, exhaustive search.
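A schematic sketch of the incremental forward-search loop described above. The scoring function `dmi` stands in for the Differential Mutual Information estimator and is a placeholder assumption, not the paper's exact criterion.

```python
# Greedy forward kernel selection: repeatedly add the kernel with the largest
# marginal DMI gain, stopping when no kernel improves the score (redundancy).
def forward_kernel_selection(candidate_kernels, dmi, budget):
    selected = []
    while len(selected) < budget and candidate_kernels:
        base = dmi(selected)
        best, best_gain = None, float("-inf")
        for k in candidate_kernels:
            gain = dmi(selected + [k]) - base    # marginal contribution of k
            if gain > best_gain:
                best, best_gain = k, gain
        if best_gain <= 0:                        # remaining kernels are redundant
            break
        selected.append(best)
        candidate_kernels = [k for k in candidate_kernels if k is not best]
    return selected
```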
In this paper, we focus on developing a novel mechanism to preserve differential privacy in deep neural networks, such that: (1) the privacy budget consumption is totally independent of the number of training steps; (2) noise is adaptively injected into features based on the contribution of each feature to the output; and (3) the mechanism can be applied to a variety of deep neural networks. To achieve this, we perturb the affine transformations of neurons and the loss functions used in deep neural networks. In addition, our mechanism intentionally adds more noise to features that are less relevant to the model output, and vice versa. Our theoretical analysis further derives the sensitivities and error bounds of our mechanism. Rigorous experiments conducted on the MNIST and CIFAR-10 datasets show that our mechanism is highly effective and outperforms existing solutions.
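A hedged sketch of the relevance-adaptive perturbation idea: features that contribute less to the output receive proportionally more Laplace noise. The relevance vector is assumed to be given (e.g., computed by a relevance-propagation method), and the budget-splitting rule here is illustrative rather than the paper's exact mechanism.

```python
# Split the privacy budget across features in proportion to their relevance;
# a smaller per-feature budget means a larger Laplace noise scale.
import numpy as np

def adaptive_laplace_perturb(features, relevance, epsilon, sensitivity=1.0):
    rel = np.abs(relevance) / np.abs(relevance).sum()    # normalized relevance
    eps_per_feature = epsilon * rel                      # more budget where relevant
    scale = sensitivity / np.maximum(eps_per_feature, 1e-12)
    return features + np.random.laplace(0.0, scale)      # less relevant -> more noise
```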
We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data. This framework is designed to aggregate multiple classifiers updated locally using private data and to ensure that no private information about the data is exposed during and after the learning procedure. We utilize a homomorphic cryptosystem that can aggregate the local classifiers while they are encrypted and thus kept secret. To overcome the high computational cost of homomorphic encryption of high-dimensional classifiers, we (1) impose sparsity constraints on local classifier updates and (2) propose a novel efficient encryption scheme named doubly-permuted homomorphic encryption (DPHE), which is tailored to sparse high-dimensional data. DPHE (i) decomposes sparse data into its constituent non-zero values and their corresponding support indices, (ii) applies homomorphic encryption only to the non-zero values, and (iii) employs double permutations on the support indices to keep them secret. Our experimental evaluation on several public datasets shows that the proposed approach achieves performance comparable to state-of-the-art visual recognition methods while preserving privacy, and significantly outperforms other privacy-preserving methods.
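A toy sketch of the encoding step (i)-(iii) described above, heavily simplified: the sparse vector is split into non-zero values and support indices, only the values are encrypted, and the support is hidden behind a permutation. The `encrypt` callable is a stand-in for an additively homomorphic scheme (e.g., Paillier), and the full double-permutation bookkeeping of DPHE is not reproduced here.

```python
# Simplified DPHE-style encoding of one sparse update vector.
import numpy as np

def dphe_encode(sparse_vec, encrypt, rng=np.random.default_rng()):
    support = np.flatnonzero(sparse_vec)             # indices of non-zero entries
    values = sparse_vec[support]
    perm = rng.permutation(len(support))             # secret permutation of the support
    enc_values = [encrypt(v) for v in values[perm]]  # encrypt only the non-zero values
    permuted_support = support[perm]                 # DPHE permutes/masks this a second time
    return enc_values, permuted_support
```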
In this paper, we propose a variant of searchable public-key encryption named hidden-token searchable public-key encryption, with two new security properties: token anonymity and one-token-per-trapdoor. Under the former notion, the client can obtain the search token from the data owner without revealing any information about the underlying keyword. Under the latter notion, the client cannot derive more than one token from a single trapdoor generated by the data owner. Furthermore, we present a concrete hidden-token searchable public-key encryption scheme together with security proofs in the random oracle model.
In assessing privacy on online social networks, it is important to investigate their vulnerability to reconnaissance strategies, in which attackers lure targets into being their friends by exploiting the social graph in order to extract victims' sensitive information. As the network topology is only partially revealed after each successful friend request, attackers need to employ an adaptive strategy. Existing work only considered a simple strategy in which attackers sequentially acquire one friend at a time, which causes tremendous delay in waiting for responses before sending the next request and lacks the ability to retry failed requests after the network has changed. In contrast, we investigate an adaptive and parallel strategy in which attackers simultaneously send multiple friend requests in batch and recover from failed requests by retrying after topology changes, thereby significantly reducing the time to reach the targets and greatly improving robustness. We cast this approach as an optimization problem, Max-Crawling, and show that it is inapproximable within $(1 - 1/e + ε)$. We first design our core algorithm PM-AReST, which has an approximation ratio of $(1 - e^{-(1-1/e)})$ using adaptive monotonic submodular properties. We next tighten our algorithm to provide a near-optimal solution, i.e. one with a ratio of $(1 - 1/e)$, via a two-stage stochastic programming approach. We further establish a gap bound of $(1 - e^{-(1-1/e)^2})$ between batch strategies and the optimal sequential one. We experimentally validate our theoretical results, finding that our algorithm performs near-optimally in practice and that this is robust under a variety of problem settings.
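A schematic adaptive batch-greedy loop in the spirit of the batch strategy analyzed above (not PM-AReST itself): in each round a batch of friend requests is chosen greedily by marginal gain, the successful ones are observed, and unsuccessful candidates remain available for retry in later rounds. The `marginal_gain` and `send_requests` oracles are placeholder assumptions.

```python
# Greedy batch selection with adaptive feedback between rounds.
def adaptive_batch_crawl(candidates, marginal_gain, send_requests,
                         batch_size, rounds):
    accepted = set()
    for _ in range(rounds):
        pool = [c for c in candidates if c not in accepted]
        batch = []
        for _ in range(min(batch_size, len(pool))):
            best = max(pool, key=lambda c: marginal_gain(c, accepted, batch))
            batch.append(best)
            pool.remove(best)
        accepted |= send_requests(batch)   # send_requests returns the accepted subset
    return accepted
```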
Microdata are collected by companies in order to enhance the quality of their services as well as the accuracy of their recommendation systems. These data often become publicly available after they have been sanitized. Recent re-identification attacks on publicly available, sanitized datasets illustrate the privacy risks involved in microdata collections. Currently, users have to trust the provider that their data will be safe in case the data are published or a privacy breach occurs. In this work, we empower users by developing a novel, user-centric tool for privacy measurement and a new lightweight privacy metric. The goal of our tool is to estimate users' privacy level prior to sharing their data with a provider, so that users can consciously decide whether to contribute their data. Our tool estimates an individual's privacy level based on published popularity statistics for the items in the provider's database and the user's own microdata. In this work, we describe the architecture of our tool as well as a novel privacy metric, which is necessary in our setting where we do not have access to the provider's database. Our tool is user friendly, relying on smart visual results that raise privacy awareness. We evaluate our tool using three real-world datasets collected from major providers. We demonstrate strong correlations between the average anonymity set per user and the privacy score obtained by our metric. Our results illustrate that our tool, which uses minimal information from the provider, estimates users' privacy levels nearly as well as if it had access to the actual database.
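An illustrative metric in the same spirit, not the paper's actual formula: a user's items are scored by how identifying they are, using only published item-popularity statistics from the provider.

```python
# Toy popularity-based privacy score: rare items push the score strongly
# negative, signalling a small anonymity set and hence weaker privacy.
import math

def privacy_score(user_items, item_popularity, total_users):
    score = 0.0
    for item in user_items:
        p = item_popularity.get(item, 1) / total_users   # fraction of users holding the item
        score += math.log2(max(p, 1.0 / total_users))    # rare items contribute large negative terms
    return score
```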
The Trusted Platform Module (TPM) is an international standard for a security chip that can be used for the management of cryptographic keys and for remote attestation. The specification of the most recent TPM 2.0 interfaces for direct anonymous attestation unfortunately has a number of severe shortcomings. First of all, they do not allow for security proofs (indeed, the published proofs are incorrect). Second, they provide a Diffie-Hellman oracle w.r.t. the secret key of the TPM, weakening the security and preventing forward anonymity of attestations. Fixes to these problems have been proposed, but they create new issues: they enable a fraudulent TPM to encode information into an attestation signature, which could be used to break anonymity or to leak the secret key. Furthermore, all proposed ways to remove the Diffie-Hellman oracle either strongly limit the functionality of the TPM or would require significant changes to the TPM 2.0 interfaces. In this paper we provide a better specification of the TPM 2.0 interfaces that addresses these problems and requires only minimal changes to the current TPM 2.0 commands. We then show how to use the revised interfaces to build q-SDH- and LRSW-based anonymous attestation schemes, and prove their security. We finally discuss how to obtain other schemes addressing different use cases such as key-binding for U-Prove and e-cash.
Data accessibility anytime and anywhere is nowadays a key feature of information technology, enabled by ubiquitous networking across a huge range of applications. However, security and privacy are perceived as primary obstacles to its wide adoption in end-user applications. When sharing sensitive information, protection of personal data is the paramount security and privacy requirement for ensuring the trustworthiness of the service provider. To this end, this paper proposes a communication security protocol that achieves data protection when a user sends sensitive data to the network through a gateway. We design the ciphertext construction and key-exchange computation process. Finally, a performance analysis of the proposed scheme is given; the scheme ensures the honesty of the gateway service provider, since the user can control who has access to his data by issuing cryptographic access credentials to data users.
In this paper, we focus on the security issues and challenges in the smart grid. Smart grid security features must address not only expected deliberate attacks, but also inadvertent compromises of the information infrastructure due to user errors, equipment failures, and natural disasters. An important component of the smart grid is the advanced metering infrastructure (AMI), which is critical to support two-way communication of real-time information for better electricity generation, distribution and consumption. These factors make security especially important for AMI. In recent times, attacks on the smart grid have been modelled using attack trees. The attack tree has been extensively used as an efficient and effective tool to model security threats and vulnerabilities in systems where the ultimate goal of an attacker can be divided into a set of multiple concrete or atomic sub-goals. The sub-goals are related to each other as either AND-siblings or OR-siblings, which depicts whether some or all of the sub-goals must be attained for the attacker to reach the goal. On the other hand, a security professional needs to find the most effective way to address the security issues in the system under consideration. Each attack-prevention strategy incurs some cost, and the utility company will always look to minimize it. We present a cost-effective mechanism to identify the minimum number of potential atomic attacks in an attack tree.
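A sketch of one standard attack-tree computation consistent with the setting described above (not necessarily the paper's exact mechanism): the minimum set of atomic (leaf) attacks the defender must address to keep the attacker from the root goal. At an OR node every sub-goal must be blocked; at an AND node blocking any one sub-goal suffices.

```python
# Nodes are ("AND", [children]), ("OR", [children]), or ("LEAF", name).
def min_prevention_set(node):
    kind, payload = node
    if kind == "LEAF":
        return {payload}                               # block this atomic attack
    child_sets = [min_prevention_set(c) for c in payload]
    if kind == "AND":                                  # blocking any one sub-goal suffices
        return min(child_sets, key=len)
    blocked = set()                                    # OR: all sub-goals must be blocked
    for s in child_sets:
        blocked |= s
    return blocked

# Example (hypothetical AMI tree):
# root = ("OR", [("LEAF", "tamper meter"),
#                ("AND", [("LEAF", "spoof AMI head-end"), ("LEAF", "jam link")])])
# min_prevention_set(root) -> {"tamper meter", "spoof AMI head-end"}
```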
The increasing complexity and ubiquity of user connectivity, computing environments, information content, and software, mobile, and web applications transfer the responsibility of privacy management to individuals, making it extremely difficult for users to maintain the intelligent and targeted level of privacy protection that they need and desire while simultaneously maintaining their ability to function optimally. Thus, there is a critical need to develop intelligent, automated, and adaptable privacy management systems that can assist users in managing and protecting their sensitive data in the increasingly complex situations and environments in which they find themselves. This work is a first step in exploring the development of such a system, specifically how user personality traits and other characteristics can be used to help automate the determination of user sharing preferences for a variety of user data and situations. The Big-Five personality traits of openness, conscientiousness, extroversion, agreeableness, and neuroticism are examined and used as inputs into several popular machine learning algorithms in order to assess their ability to elicit and predict user privacy preferences. Our results show that the Big-Five personality traits can be used to significantly improve the prediction of user privacy preferences in a number of contexts and situations, and so using machine learning approaches to automate the setting of user privacy preferences has the potential to greatly reduce the burden on users while simultaneously improving the accuracy of their privacy preferences and security.
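A minimal sketch of the kind of experiment described: predicting a binary sharing preference from Big-Five scores with an off-the-shelf classifier. The random-forest choice and the placeholder data are illustrative assumptions, not the paper's setup.

```python
# Predict a share / don't-share preference from the five personality scores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# columns: openness, conscientiousness, extroversion, agreeableness, neuroticism
X = np.random.rand(200, 5)                 # placeholder trait scores in [0, 1]
y = np.random.randint(0, 2, size=200)      # placeholder sharing-preference labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validated accuracy estimate
```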
Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls, and employees are usually permitted access to all information about the customers of the enterprise, including sensitive and private information. Though it is important to be able to identify useful patterns in one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative: a framework that allows privacy-preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms: SparkLAP, which provides Laplacian perturbation based on a user's query, and SparkSAM, which uses the contents of the database itself to calculate the perturbation. We show that the performance of Shade is substantially better than that of earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed the performance of an identical non-private Spark query.
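A sketch of the query-calibrated Laplace mechanism that a component like SparkLAP applies to an aggregate result: the noise scale depends on the query's sensitivity and the privacy budget. Spark plumbing is omitted; this operates on an already computed aggregate and is not Shade's actual code.

```python
# Standard Laplace mechanism on a scalar query answer.
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    return true_answer + np.random.laplace(0.0, sensitivity / epsilon)

# e.g., a COUNT query has sensitivity 1:
noisy_count = laplace_mechanism(true_answer=1423, sensitivity=1.0, epsilon=0.5)
```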
In the age of Big Data, we are witnessing a huge proliferation of digital data capturing our lives and our surroundings. Data privacy is a critical barrier to data analytics, and privacy-preserving data disclosure has become a key aspect of leveraging large-scale data analytics due to serious privacy risks. Traditional privacy-preserving data publishing solutions have focused on protecting individuals' private information while considering all aggregate information about individuals as safe for disclosure. This paper presents a new privacy-aware data disclosure scheme that considers group privacy requirements of individuals in bipartite association graph datasets (e.g., graphs that represent associations between entities such as customers and products bought from a pharmacy store), where even aggregate information about groups of individuals may be sensitive and need protection. We propose the notion of $ε_g$-Group Differential Privacy, which protects sensitive information of groups of individuals at various defined group protection levels, enabling data users to obtain the level of information entitled to them. Based on this notion of group privacy, we develop a suite of differentially private mechanisms that protect group privacy in bipartite association graphs at different group privacy levels based on specialization hierarchies. We evaluate our proposed techniques through extensive experiments on three real-world association graph datasets, and our results demonstrate that the proposed techniques are effective, efficient, and provide the required guarantees on group privacy.
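A simplified sketch of group-level perturbation over a bipartite association graph: edges are aggregated to the group level chosen from a specialization hierarchy and each group count is released with Laplace noise under that group's budget. The hierarchy lookup is a placeholder, and sensitivity 1 assumes each individual contributes at most one edge per group; the paper's mechanisms are more general.

```python
# Release noisy per-group edge counts for a customer-product association graph.
import numpy as np
from collections import Counter

def noisy_group_counts(edges, group_of, eps_g):
    """edges: iterable of (customer, product); group_of maps a product to its group."""
    counts = Counter(group_of(p) for _, p in edges)
    return {g: c + np.random.laplace(0.0, 1.0 / eps_g)   # sensitivity-1 assumption
            for g, c in counts.items()}
```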
With the development of the modern logistics industry, railway freight enterprises, as the main traditional logistics enterprises, face many problems in their service model. In the era of big data, coordinated development and sharing of information resources have become requirements of the times for railway freight enterprises, while how to protect the privacy of citizens has become one of the public's central concerns. To prevent the disclosure or abuse of citizens' private information, their privacy needs to be preserved in the process of information opening and sharing. However, most existing privacy-preserving models cannot resist attacks based on continuously growing background knowledge. This paper presents a method that applies differential privacy to protect associated data, which can then be shared as railway freight service association information. First, the original service data are sliced using an optimal shard length; then a differential privacy mechanism combined with the Apriori algorithm is used to add Laplace noise to the candidate itemsets. Thus, citizens' private information is protected even if the attacker has strong background knowledge. Finally, the associated data are shared with railway information resource partners. The steps and usefulness of the discussed privacy-preservation method are illustrated by an example.
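A sketch of the noisy-candidate-count step only: size-k candidate itemsets are generated in the Apriori style and each support count is released with Laplace noise before sharing. The slicing by optimal shard length is omitted, and the sensitivity-1 scale is illustrative (a single transaction can affect several itemset counts, so the paper's calibration may differ).

```python
# Count size-k candidate itemsets and perturb each count with Laplace noise.
import numpy as np
from itertools import combinations
from collections import Counter

def noisy_candidate_counts(transactions, k, epsilon):
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), k):   # size-k candidate itemsets
            counts[itemset] += 1
    scale = 1.0 / epsilon                            # illustrative sensitivity-1 scale
    return {s: c + np.random.laplace(0.0, scale) for s, c in counts.items()}
```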
In recent years, we have observed the development of many connected and mobile devices intended for daily use. This development has come with many risks that might not be perceived by users. These threats become compromising when an unauthorized entity gains access to the private big data generated by users' objects in the Internet of Things. In the literature, many solutions have been proposed to protect big data, but security remains a challenging issue. This work aims to provide a solution for controlling access to big data and for securing the localization of the objects that generate it. The proposed models are based on Attribute-Based Encryption, the CHORD protocol and $μ$TESLA. Through simulations, we compare our solutions to competing protocols and show their efficiency in terms of the relevant criteria.
Location privacy has become a significant challenge of big data. In particular, with big data handling tools readily available, an adversary can easily manage and process huge amounts of location data to obtain users' private information from Location-Based Services (LBS). So far, many methods have been proposed to preserve user location privacy for these services. Among them, dummy-based methods have various advantages in terms of implementation and low computation costs. However, they suffer from spatiotemporal correlation issues when users submit consecutive requests. To solve this problem, a practical hybrid location privacy protection scheme is presented in this paper. The proposed method filters out correlated fake location data (dummies) before submission, so that the adversary cannot identify the user's real location. Evaluations and experiments show that our proposed filtering technique significantly improves the performance of existing dummy-based methods and enables them to effectively protect the user's location privacy in a big-data environment.
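A sketch of one simple correlation-aware filter of the kind described above: before submission, any candidate dummy that could not plausibly have been reached from the location reported in the previous request is dropped, using a maximum-speed reachability check. The actual filtering criteria in the paper may be richer.

```python
# Keep only dummies within max_speed * dt of the previously reported point.
import math

def filter_dummies(prev_point, candidates, dt_seconds, max_speed_mps=30.0):
    px, py = prev_point
    reachable = max_speed_mps * dt_seconds
    return [(x, y) for (x, y) in candidates
            if math.hypot(x - px, y - py) <= reachable]
```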
The main issue with big data in the cloud is that it always needs to be processed or used by a third party. It is very important for data owners or clients to be able to trust, and to have guarantees of privacy for, the information stored in the cloud or analyzed as big data. Privacy models studied in previous research show that privacy infringement for big data happens because of limitations in the privacy guarantee rate or the dissemination of accurate data obtainable from the dataset. In addition, there are various privacy models, and further research and study is necessary to determine the best and most appropriate model to apply in the future while guaranteeing big data privacy. In what follows, we survey several privacy models in order to determine the advantages and disadvantages of each for privacy assurance of big data in the cloud. The present study also proposes a combined Diff-Anonym algorithm (k-anonymity and differential privacy models) that provides data anonymity while guaranteeing a balance between the ambiguity of private data and the clarity of general data.
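A sketch of the two ingredients such a combined algorithm draws on: a k-anonymity check over generalized quasi-identifiers and a Laplace-noised aggregate release. How Diff-Anonym actually interleaves the two is not spelled out here; this only illustrates the building blocks.

```python
# k-anonymity check plus a differentially private count as separate building blocks.
import numpy as np
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(size >= k for size in groups.values())

def noisy_count(records, predicate, epsilon):
    """Laplace-perturbed count of records satisfying the predicate (sensitivity 1)."""
    return sum(predicate(r) for r in records) + np.random.laplace(0.0, 1.0 / epsilon)
```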
Interconnected everyday objects, whether via public or private networks, are gradually becoming a reality in modern life, a development often referred to as the Internet of Things (IoT) or Cyber-Physical Systems (CPS). One stand-out example is systems based on Unmanned Aerial Vehicles (UAVs). Fleets of such vehicles (drones) are expected to assume multiple roles, from mundane to highly sensitive applications, such as prompt pizza or shopping deliveries to the home, or deployment for battlefield and combat missions. Drones, which we refer to as UAVs in this paper, can operate either individually (solo missions) or as part of a fleet (group missions), with or without a constant connection to a base station. The base station acts as the command centre to manage the drones' activities; however, independent, localised and effective fleet control, potentially based on swarm intelligence, is necessary for several reasons: 1) the increase in the number of drone fleets; 2) fleet sizes that might reach tens of UAVs; 3) the need for such fleets to make time-critical decisions in the wild; 4) potential communication congestion and latency; and 5) in some cases, operation in challenging terrain that hinders communication with a control centre or mandates limited communication, e.g. operations spanning long periods of time or military use of fleets in enemy territory. Such a self-aware, mission-focused and independent fleet of drones may utilise swarm intelligence for a) air-traffic or flight-control management, b) obstacle avoidance, c) self-preservation (while maintaining the mission criteria), d) autonomous collaboration with other fleets in the wild, and e) assuring the security, privacy and safety of physical (the drones themselves) and virtual (data, software) assets. In this paper, we investigate the challenges faced by fleets of drones and propose a potential course of action for overcoming them.