Bibliography
The smart grid is a complex cyber-physical system (CPS) that poses challenges related to scale, integration, interoperability, processes, governance, and human elements. The US National Institute of Standards and Technology (NIST) and its government, university, and industry collaborators developed an approach, called the CPS Framework, for reasoning about CPS across multiple levels of concern and competency, including trustworthiness, privacy, reliability, and regulation. The approach uses ontology and reasoning techniques to achieve a greater understanding of the interdependencies among the elements of the CPS Framework model as applied to use cases. This paper demonstrates that the approach extends naturally to automated and manual decision-making for smart grids: we apply it to smart grid use cases and illustrate how it can be used to analyze grid topologies and address concerns about the smart grid. Smart grid stakeholders whose decision-making may be assisted by this approach include planners, designers, and operators.
Privacy preservation has become a predominant concern in big data analytics and cloud computing. Every organization collects personal data from users, actively or passively, and publishing this data for research and other analytics without removing Personally Identifiable Information (PII) can lead to privacy breaches. Existing anonymization techniques fail to maintain the balance between data privacy and data utility. To provide a trade-off between user privacy and data utility, a Mondrian-based k-anonymity approach is proposed, and to protect the privacy of high-dimensional data, a Deep Neural Network (DNN) based framework is proposed. Experimental results show that the proposed approach mitigates information loss without compromising privacy.
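As a rough illustration of the Mondrian partitioning idea behind such k-anonymity schemes, the sketch below recursively splits records on the quasi-identifier with the widest span until no split keeps at least k records on each side, then generalizes each group to its value ranges. The table layout, column roles, and k are illustrative assumptions, not taken from the paper.

```python
# Minimal Mondrian-style k-anonymity sketch over numeric quasi-identifiers.
from statistics import median

def mondrian_anonymize(records, qi, k):
    """Partition `records` (tuples) into groups of >= k rows, then replace
    each quasi-identifier in `qi` with its group's (min, max) range."""
    def recurse(part):
        if len(part) >= 2 * k:
            # Split on the quasi-identifier with the widest span.
            dim = max(qi, key=lambda d: max(r[d] for r in part) - min(r[d] for r in part))
            m = median(r[dim] for r in part)
            left = [r for r in part if r[dim] <= m]
            right = [r for r in part if r[dim] > m]
            if len(left) >= k and len(right) >= k:
                return recurse(left) + recurse(right)
        return [part]  # cannot split without violating k-anonymity

    out = []
    for group in recurse(list(records)):
        span = {d: (min(r[d] for r in group), max(r[d] for r in group)) for d in qi}
        out += [tuple(span.get(d, v) for d, v in enumerate(r)) for r in group]
    return out

# Toy table of (age, zip_prefix, diagnosis) rows with k = 2.
rows = [(34, 4, "flu"), (36, 4, "cold"), (58, 7, "flu"), (61, 7, "asthma")]
print(mondrian_anonymize(rows, qi=[0, 1], k=2))
```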
Online Social Networks (OSNs) play a vital role in our day-to-day lives. The most popular social network, Facebook, alone currently counts 2.23 billion users worldwide. OSN users are aware of the various security risks that exist in this setting, including privacy violations, and they use the privacy settings provided by OSN providers to keep their data safe. But most of them are unaware of the risk that remains after deletion of their data, which is not really deleted from the OSN server. Self-destruction of data is one of the prime recommended methods to achieve assured deletion. Numerous techniques have been developed for self-destruction of data, and this paper discusses and evaluates these techniques along with the various privacy risks faced by an OSN user in this web-centered world.
The disclosure of an important yet sensitive link may cause a serious privacy crisis between two users of a social graph. Merely deleting the sensitive link, referred to as the target link and often the attack target of adversaries, is not enough, because adversarial link prediction can still infer the existence of the missing target link. Thus, to defend against a specific adversarial link prediction, a budget-limited number of other non-target links should be optimally removed. We first propose a path-based dissimilarity function as the optimization objective and prove that greedy link deletion to preserve target link privacy, referred to as GLD2Privacy, which has monotonicity and submodularity properties, can achieve a near-optimal solution. However, enumerating all length-limited paths between any pair of nodes for the GLD2Privacy mechanism is impossible in large-scale social graphs. Second, we propose a Walk2Privacy mechanism that uses self-avoiding random walks, which run efficiently in large-scale graphs, to sample paths of given lengths between the two ends of any missing target link; based on the sampled paths, we select the alternative non-target links to be deleted for privacy purposes. Finally, we conduct experiments to demonstrate that the Walk2Privacy algorithm remarkably reduces time consumption and achieves a solution very close to that of GLD2Privacy.
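A minimal sketch of the path-sampling idea behind Walk2Privacy: run self-avoiding random walks from one endpoint of the missing target link and keep walks that reach the other endpoint within a length budget. The graph, walk count, and length bound below are toy assumptions.

```python
# Sample length-limited simple paths between the two ends of a candidate link.
import random

def self_avoiding_paths(adj, u, v, max_len, n_walks=1000):
    """Sample simple u-v paths of at most `max_len` edges via
    self-avoiding random walks; returns the list of paths found."""
    paths = []
    for _ in range(n_walks):
        path = [u]
        while len(path) - 1 < max_len:          # edges used so far
            choices = [w for w in adj[path[-1]] if w not in path]
            if not choices:
                break                           # walk trapped: discard it
            path.append(random.choice(choices))
            if path[-1] == v:
                paths.append(tuple(path))
                break
    return paths

# Toy graph; edges appearing on many sampled short 0-3 paths would be the
# candidates for deletion to weaken link prediction for the pair (0, 3).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(self_avoiding_paths(adj, 0, 3, max_len=3, n_walks=200)[:5])
```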
This paper studies the deletion propagation problem in terms of minimizing view side-effect. The problem is fundamental to data lineage and quality management and can be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem: given a source database D, a set of key-preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination removes all the tuples in a given set of view deletions ΔV while preserving all other results. The complexity of this problem has been well studied for the case of a single query, with dichotomies, even trichotomies, developed for different settings. However, no results were given for multiple queries, which is the more realistic case. We study the complexity and approximability of optimizing the side-effect on the views, i.e., finding T to minimize the additional damage to V after removing all the tuples of ΔV. We focus on the class of key-preserving conjunctive queries, for which a dichotomy is known in the single-query case. Surprisingly, beyond the single-query case, this problem is NP-hard to approximate within any constant factor in terms of view side-effect, even for a non-trivial set of multiple project-free conjunctive queries. We propose an algorithm that approximates the problem within a bound depending on the number of tuples in both V and ΔV. We identify a class of polynomially tractable inputs and provide a dynamic programming algorithm to solve the problem. Besides data lineage, the study of this problem also provides important foundations for computational issues in data repairing. Furthermore, we introduce related applications of this problem, especially query-feedback-based data cleaning.
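For intuition about the optimization problem itself, here is a brute-force sketch: search for a source-tuple set T whose removal deletes all of ΔV while damaging as few surviving view tuples as possible. It is exponential in |D| and purely illustrative; the query and database are toy assumptions, not the paper's algorithm.

```python
# Exhaustive deletion propagation with minimum view side-effect (toy scale only).
from itertools import combinations

def min_side_effect_deletion(D, query, delta_v):
    """query: function mapping a set of source tuples to a set of view tuples."""
    V = query(D)
    best = None
    for r in range(len(D) + 1):                  # try smaller T first
        for T in combinations(D, r):
            remaining = query(D - set(T))
            if delta_v & remaining:
                continue                          # some tuple slated for deletion survives
            side_effect = len((V - delta_v) - remaining)
            if best is None or side_effect < best[1]:
                best = (set(T), side_effect)
    return best

# Toy join view Q(x, z) :- R(x, y), S(y, z) over a tagged source database.
D = {("R", "a", 1), ("R", "b", 1), ("S", 1, "c")}
def q(tuples):
    r = {t[1:] for t in tuples if t[0] == "R"}
    s = {t[1:] for t in tuples if t[0] == "S"}
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

# Deleting R(a,1) removes view tuple (a,c) with zero side-effect, whereas
# deleting S(1,c) would also destroy (b,c).
print(min_side_effect_deletion(D, q, {("a", "c")}))
```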
File update operations generate many invalid flash pages in Solid State Drives (SSDs) because of the out-of-place update feature. If these invalid flash pages are not securely deleted, they are left in a "missing" state, resulting in leakage of sensitive information. However, deleting these invalid pages in real time greatly reduces SSD performance. In this paper, we propose a Per-File Secure Deletion (PSD) scheme for SSDs to achieve non-real-time secure deletion. PSD assigns a globally unique identifier (GUID) to each file to quickly locate its invalid data blocks and uses the Security-TRIM command to securely delete them. Moreover, we propose a PSD-MLC scheme for Multi-Level Cell (MLC) flash memory. PSD-MLC distributes the data blocks of a file in pairs of pages to avoid the influence of programming crosstalk between paired pages. We evaluate our schemes on different hardware platforms of flash media, and the results prove that PSD and PSD-MLC have only a small impact on SSD performance. With the cache disabled and enabled, respectively, compared with a system without secure deletion, PSD decreases SSD throughput by 1.3% and 1.8%, and PSD-MLC by 9.5% and 10.0%.
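A minimal sketch of the per-file bookkeeping such a scheme needs: invalidated pages are tracked per file GUID so they can later be erased in one batched, non-real-time Security-TRIM-style pass. All names are illustrative; real firmware would track physical page addresses inside the flash translation layer.

```python
# Toy per-file secure-deletion map in the spirit of PSD.
import uuid
from collections import defaultdict

class PerFileDeletionMap:
    def __init__(self, trim):
        self.trim = trim                      # callback that securely erases pages
        self.guid_of = {}                     # file path -> GUID
        self.invalid = defaultdict(set)       # GUID -> invalidated page numbers

    def create(self, path):
        self.guid_of[path] = uuid.uuid4()

    def update(self, path, old_pages):
        # Out-of-place update: old pages become invalid, remembered per GUID.
        self.invalid[self.guid_of[path]].update(old_pages)

    def secure_delete(self, path):
        # Non-real-time: erase every invalid page of this file in one batch.
        guid = self.guid_of.pop(path)
        self.trim(sorted(self.invalid.pop(guid)))

fdm = PerFileDeletionMap(trim=lambda pages: print("Security-TRIM pages:", pages))
fdm.create("secret.doc")
fdm.update("secret.doc", {17, 18})
fdm.update("secret.doc", {42})
fdm.secure_delete("secret.doc")   # -> Security-TRIM pages: [17, 18, 42]
```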
Blockchain is a database technology that provides integrity and trust: as an append-only distributed ledger, it does not allow arbitrary modifications or deletions. That is, blockchain follows not a modify/delete model but a CRAB (Create-Retrieve-Append-Burn) model, in which data can be read and written according to a legitimate user's access rights (for example, the owner's private key). However, data once created can never be deleted, which causes problems such as privacy breaches. In this paper, we propose an on-off-blockchain Hybrid Blockchain system that keeps the data separate and saves the connection history to the blockchain. In addition, state is kept in a distributed database separate from the ledger record and is changed by generating an arbitrary injection in XOR form, so that the history of modification/deletion in the off-blockchain store can be efficiently retrieved.
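One plausible reading of the XOR-form state change, sketched below: the off-chain record is masked by XOR-ing it with a random injection, and only references to the injection history go on-chain. Discarding the injection "burns" the record. This is an illustrative assumption; the paper's exact construction may differ.

```python
# Toy XOR masking for off-chain state with on-chain auditability.
import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

record = b"patient:alice;diagnosis:flu"
injection = os.urandom(len(record))          # random mask, referenced on-chain
masked = xor_bytes(record, injection)        # stored off-chain in masked form

# "Burn": discard the injection and the original record is unrecoverable;
# holding the injection, an auditor can invert the mask and verify history.
assert xor_bytes(masked, injection) == record
```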
Existing secure deletion approaches are inefficient at erasing data permanently because file systems have no knowledge of the data layout on the storage device, nor is the storage device aware of file information within the file systems. This inefficiency is exacerbated on the emerging shingled magnetic recording (SMR) drive due to its inherent sequential-write constraint. On SMR drives, secure deletion requests may lead to serious write amplification and performance degradation if the data layout is not properly configured. This observation motivates us to propose a file-oriented fast secure deletion (FFSD) strategy to alleviate the negative impacts of SMR drives' sequential-write constraint and improve the efficiency of secure deletion operations on SMR drives. A series of experiments demonstrates the capability of the proposed strategy to improve the efficiency of secure deletion on SMR drives.
We present a novel, use-case-agnostic method of identifying and circumventing private data exposure across distributed and high-dimensional data repositories. Examples of distributed high-dimensional data repositories include medical research and treatment data, where oftentimes more than 300 describing attributes appear. As such, providing strong guarantees of data anonymity in these repositories is a hard constraint in adhering to privacy legislation. Yet, when applied to distributed high-dimensional data, existing anonymisation algorithms incur high levels of information loss and do not guarantee privacy, defeating the purpose of anonymisation. In this paper, we address this issue by using Bayesian networks to handle data transformation for anonymisation. By evaluating every attribute combination to determine the privacy exposure risk, the conditional probability linking attribute pairs is computed. Pairs with a high conditional probability expose a risk of de-anonymisation similar to quasi-identifiers and can be separated instead of deleted, as in previous algorithms. Attribute separation removes the risk of privacy exposure, and deletion avoidance results in a significant reduction in information loss. In other words, assimilating the conditional probability of outliers directly into the adjacency matrix in a greedy fashion is quick and thwarts de-anonymisation. Since identifying every privacy-violating attribute combination is a W[2]-complete problem, we optimise the procedure with a multigrid solver method by evaluating the conditional probabilities between attribute pairs and aggregating the state-space explosion of attribute pairs through manifold learning. Finally, incremental processing of new data is achieved through inexpensive, continuous (delta) learning.
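A minimal sketch of the pairwise risk scoring described above: estimate how strongly one attribute pins down another via conditional probability, and flag high-scoring pairs as quasi-identifier-like. The dataset, columns, and any threshold are illustrative assumptions.

```python
# Score attribute pairs by worst-case conditional probability.
from collections import Counter

def max_conditional_probability(rows, i, j):
    """max over values a of P(attr_j = b | attr_i = a): how strongly
    attribute i determines attribute j in this sample."""
    pair = Counter((r[i], r[j]) for r in rows)
    single = Counter(r[i] for r in rows)
    return max(c / single[a] for (a, _), c in pair.items())

rows = [("m", "engineer"), ("m", "engineer"), ("f", "doctor"), ("f", "lawyer")]
risk = max_conditional_probability(rows, 0, 1)
print(f"P(occupation | sex) up to {risk:.2f}")
# Pairs scoring above a chosen threshold would be separated into different
# releases rather than deleted, which is what reduces information loss.
```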
As an efficient deletion method, unlinking is widely used in cloud storage. However, unlinking is a form of incomplete deletion: "deleted" data remains on the cloud and can be recovered. To make deleted data unrecoverable, overwriting is an effective method in the cloud. Users lose control over their data in the cloud once it is deleted, so it is difficult for them to confirm overwriting. Facing this crucial problem, we propose a Provable and Traceable Assured Deletion (PTAD) scheme for cloud storage based on blockchain. The PTAD scheme relies on overwriting to achieve assured deletion. We draw on the idea of data integrity checking and design algorithms to verify whether the cloud has properly overwritten the original blocks with specific patterns. We utilize smart contracts on the blockchain to automatically execute verification and keep transactions in the ledger for tracking. The whole scheme is divided into three stages, unlinking, overwriting, and verification, and we design one specific algorithm for each stage. For evaluation, we implement the PTAD scheme on the cloud and construct a consortium chain with Hyperledger Fabric. The performance results show that the PTAD scheme is effective and feasible.
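A minimal sketch of pattern-based overwrite verification in this spirit: the verifier derives the expected overwrite pattern for each block from a seed and checks a digest reported by the storage side; on-chain, a smart contract would perform this comparison and log the result. The pattern derivation and names are illustrative assumptions, not the paper's algorithms.

```python
# Verify that a block was overwritten with an agreed pseudorandom pattern.
import hashlib, os

BLOCK = 4096

def expected_pattern(seed: bytes, block_id: int) -> bytes:
    out, counter = b"", 0
    while len(out) < BLOCK:
        out += hashlib.sha256(seed + block_id.to_bytes(8, "big") +
                              counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:BLOCK]

def verify_overwrite(seed, block_id, reported_digest):
    return hashlib.sha256(expected_pattern(seed, block_id)).hexdigest() == reported_digest

# An honest cloud overwrites block 7 with the agreed pattern and reports its hash.
seed = os.urandom(16)
honest = hashlib.sha256(expected_pattern(seed, 7)).hexdigest()
print(verify_overwrite(seed, 7, honest))         # True
print(verify_overwrite(seed, 7, "0" * 64))       # False: block not overwritten
```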
Crowdsensing, driven by the proliferation of sensor-rich mobile devices, has emerged as a promising data sensing and aggregation paradigm. Despite their usefulness, traditional crowdsensing systems typically rely on a centralized third-party platform for data collection and processing, which raises concerns such as a single point of failure and lack of operational transparency. Such centralization hinders the wide adoption of crowdsensing by wary participants. We therefore explore an alternative design space: building crowdsensing systems atop the emerging decentralized blockchain technology. While enjoying the benefits brought by the public blockchain, we endeavor to achieve a consolidated set of desirable security properties through a proper choreography of the latest techniques and our customized designs. We allow data providers to safely contribute data to the transparent blockchain with a confidentiality guarantee on individual data and differential privacy on the aggregation result. Meanwhile, we ensure the service correctness of data aggregation and sanitization by carefully employing a hardware-assisted transparent enclave. Furthermore, we maintain the robustness of our system against faulty data providers that submit invalid data with a customized zero-knowledge range proof scheme. The experimental results demonstrate the high efficiency of our designs on both the mobile client and the SGX-enabled server, as well as the reasonable on-chain monetary cost of running our task contract on Ethereum.
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this work, we explore word representations in hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dx-privacy, then define a probability distribution in hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high-dimensional hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protection against an authorship attribution algorithm, while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe >20x greater guarantees on expected privacy against comparable worst-case statistics.
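For intuition, the sketch below shows the generic perturb-then-snap mechanism that metric (dx-) privacy applies to word embeddings: add metric-calibrated noise to a word's vector, then return the nearest vocabulary word. It uses planar Laplace noise in 2D Euclidean space for brevity, whereas the paper samples noise in high-dimensional hyperbolic space; the embeddings and epsilon are toy values.

```python
# Toy metric-DP word perturbation: noise the embedding, snap to nearest word.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"good": np.array([1.0, 0.0]), "great": np.array([0.9, 0.1]),
         "bad": np.array([-1.0, 0.0])}

def dx_private_word(word, epsilon):
    v = vocab[word]
    # Planar Laplace noise: uniform direction, radius ~ Gamma(2, 1/epsilon).
    theta = rng.uniform(0, 2 * np.pi)
    radius = rng.gamma(shape=2, scale=1.0 / epsilon)
    noisy = v + radius * np.array([np.cos(theta), np.sin(theta)])
    # Snap back to the nearest vocabulary word (semantic generalization).
    return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - noisy))

print([dx_private_word("good", epsilon=5.0) for _ in range(5)])
```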
Machine learning is a major area of artificial intelligence that enables computers to learn without being explicitly programmed. As machine learning is widely used for automated decision-making, attackers have strong incentives to manipulate the predictions generated by machine learning models. In this paper we study the different types of attacks on machine learning models and their countermeasures. Our survey finds security threats in various algorithms, including the k-nearest-neighbors (KNN) classifier, random forest, AdaBoost, support vector machine (SVM), and decision tree; we revisit existing security threats and examine possible countermeasures during the training and prediction phases of a machine learning model. There are two types of attacks on machine learning models: causative attacks, which occur during the training phase, and exploratory attacks, which occur during the prediction phase. We also discuss countermeasures for machine learning models, namely data sanitization, algorithm robustness enhancement, and privacy-preserving techniques.
The Scala programming language combines object-oriented and functional programming in one concise, high-level language, and it supports static types that help avoid bugs in complex programs. This paper proposes a dynamic taint analyzer called ScalaTaint for Scala applications. The analyzer traces the propagation of malicious inputs from untrusted sources to sensitive sink methods in programs that could be exploited by adversaries. In this work, we evaluated the accuracy of ScalaTaint with a security benchmark suite comprising 7 Scala projects. Our analyzer reported 49 vulnerabilities within 753,372 lines of code. Moreover, our performance measurement on ScalaBench shows 67% runtime overhead, demonstrating the usefulness and efficiency of our technique in comparison with similar tools.
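The core mechanism of dynamic taint tracking, sketched here in Python for brevity rather than Scala: values from untrusted sources carry a taint mark that propagates through operations and triggers a report when they reach a sensitive sink. All names are illustrative; ScalaTaint instruments real programs rather than using a wrapper type like this.

```python
# Toy dynamic taint tracking from source to sink.
class Tainted(str):
    """A string subclass marking data from untrusted sources."""

def source(user_input):
    return Tainted(user_input)              # everything from outside is tainted

def concat(a, b):
    out = a + b                             # plain str result
    # Taint propagates: the result is tainted if any operand was.
    return Tainted(out) if isinstance(a, Tainted) or isinstance(b, Tainted) else out

def sql_sink(query):
    if isinstance(query, Tainted):
        print("ALERT: tainted data reaches sensitive sink:", query)
    else:
        print("executing safe query:", query)

q = concat("SELECT * FROM users WHERE name = ", source("'x' OR 1=1 --"))
sql_sink(q)   # ALERT: the taint propagated from source to sink
```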
In order to protect individuals' privacy, data have to be "well-sanitized" before being shared, i.e., any personal information has to be removed. However, it is not always clear when data should be deemed well-sanitized. In this paper, we argue that the evaluation of sanitized data should be based on whether the data allows the inference of sensitive information that is specific to an individual, instead of being centered on the concept of re-identification. We propose a framework to evaluate the effectiveness of different sanitization techniques on a given dataset by measuring how much an individual's record in the sanitized dataset influences the inference of his or her own sensitive attribute. Our intent is not to accurately predict any sensitive attribute but rather to measure the impact of a single record on the inference of sensitive information. We demonstrate our approach by sanitizing two real datasets under different privacy models and evaluating/comparing each sanitized dataset in our framework.
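A minimal sketch of that record-influence measurement: compare a classifier's confidence in an individual's sensitive attribute when trained with versus without that individual's sanitized record. The model choice, synthetic data, and influence metric are illustrative assumptions, not the paper's exact framework.

```python
# Leave-one-record-out influence on inferring a record's own sensitive attribute.
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                                     # sanitized quasi-identifiers
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)   # sensitive attribute

def influence(i):
    keep = np.arange(len(X)) != i
    with_rec = LogisticRegression().fit(X, y)
    without = LogisticRegression().fit(X[keep], y[keep])
    p_with = with_rec.predict_proba(X[i:i+1])[0, y[i]]
    p_without = without.predict_proba(X[i:i+1])[0, y[i]]
    return p_with - p_without    # a large gap means the record leaks its own secret

print(f"influence of record 0: {influence(0):+.4f}")
```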
Event logs that originate from information systems enable comprehensive analysis of business processes, e.g., by process model discovery. However, logs potentially contain sensitive information about individual employees involved in process execution that is only partially hidden by an obfuscation of the event data. In this paper, we therefore address the risk of privacy-disclosure attacks on event logs with pseudonymized employee information. To this end, we introduce PRETSA, a novel algorithm for event log sanitization that provides privacy guarantees in terms of k-anonymity and t-closeness. It thereby avoids disclosure of employee identities, their membership in the event log, and their characterization based on sensitive attributes, such as performance information. Through step-wise transformations of a prefix-tree representation of an event log, we maintain its high utility for the discovery of a performance-annotated process model. Experiments with real-world data demonstrate that sanitization with PRETSA yields event logs of higher utility than methods that rely on frequency-based filtering, while providing the same privacy guarantees.
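A minimal sketch of prefix-tree-based log sanitization in this spirit: merge traces into their shared prefixes and truncate any continuation followed by fewer than k cases, so every released trace variant is k-anonymous. The trace data and k are illustrative; PRETSA additionally enforces t-closeness on sensitive annotations, which this sketch omits.

```python
# Truncate rare trace continuations so each released prefix has >= k cases.
from collections import Counter

def k_anonymous_traces(traces, k):
    prefix_count = Counter()
    for t in traces:
        for i in range(1, len(t) + 1):
            prefix_count[t[:i]] += 1
    out = []
    for t in traces:
        kept = ()
        for i in range(1, len(t) + 1):
            if prefix_count[t[:i]] < k:
                break                       # truncate the rare continuation
            kept = t[:i]
        out.append(kept)
    return out

log = [("register", "check", "approve")] * 3 + [("register", "check", "reject")]
print(k_anonymous_traces(log, k=2))
# The single 'reject' case is truncated to ('register', 'check'), so no
# released trace variant corresponds to fewer than 2 cases.
```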
Because cloud storage services are broadly used in enterprises for online sharing and collaboration, sensitive information in images or documents may easily leak outside the trusted enterprise premises through such cloud services. Existing solutions to this problem have not fully explored the tradeoffs among application performance, service scalability, and user data privacy. Therefore, we propose CloudDLP, a generic approach for enterprises to automatically sanitize sensitive data in images and documents in browser-based cloud storage. To the best of our knowledge, CloudDLP is the first system that automatically and transparently detects and sanitizes both sensitive images and textual documents without compromising user experience or application functionality on browser-based cloud storage. To prevent sensitive information from escaping the premises, CloudDLP utilizes deep learning methods to detect sensitive information in both images and textual documents. We have evaluated the proposed method on a number of typical cloud applications. Our experimental results show that it can achieve transparent and automatic data sanitization on cloud storage services with relatively low overheads, while preserving most application functionality.
As data security has become one of the most crucial issues in modern storage system and application designs, data sanitization techniques are regarded as a promising solution for 3D NAND flash-memory-based devices. Many excellent works have been proposed that exploit in-place reprogramming, erasure, and encryption techniques to implement sanitization functionality. However, existing sanitization approaches can incur performance and disturbance overheads, or even be deciphered. Different from existing works, this work explores an instantaneous data sanitization scheme that takes advantage of programming disturbance properties. Our proposed design not only achieves instantaneous data sanitization by properly exploiting programming disturbance and error correction codes, but also enhances performance with a recycling programming design. The feasibility and capability of our proposed design are evaluated by a series of experiments on 3D NAND flash memory chips, with very encouraging results: the proposed design achieves instantaneous data sanitization with low overhead, improving the average response time by up to 86.8% and reducing the block erase count by up to 88.8%.
Witnessing the increasingly pervasive deployment of security video surveillance systems (VSS), more and more individuals have become concerned about privacy violations. While the majority of the public holds a favorable view of surveillance in terms of crime deterrence, individuals do not accept invasive monitoring of their private lives. To date, however, there is no lightweight and secure privacy-preserving solution for video surveillance systems. The recent success of blockchain (BC) technologies and their applications in the Internet of Things (IoT) sheds light on this challenging issue. In this paper, we propose a Lightweight, Blockchain-based Privacy protection (Lib-Pri) scheme for surveillance cameras at the edge. It enables the VSS to perform surveillance without compromising the privacy of people captured in the videos. The Lib-Pri system transforms the deployed VSS into a federated blockchain network capable of carrying out integrity checking, blurring-key management, feature sharing, and video access sanctioning. Policy-based enforcement of privacy measures is carried out at the edge devices for real-time video analytics without cluttering the network.
Signal processing in the encrypted domain has become an important means of protecting privacy in untrusted network environments. Due to the limitations of the underlying encryption methods, many useful but sophisticated algorithms are not well implemented. Considering that QR decomposition is widely used in many fields, in this paper we propose to implement QR decomposition in the homomorphic encrypted domain. We first realize some necessary primitive operations in the homomorphic encrypted domain, including division and square root. The Gram-Schmidt process is then studied in the encrypted domain. We propose an implementation of QR decomposition in the encrypted domain built on the secure implementation of the Gram-Schmidt process. We conduct experiments to demonstrate the effectiveness and analyze the performance of the proposed outsourced QR decomposition.
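For reference, here is a plaintext sketch of QR decomposition via the classical Gram-Schmidt process, the algorithm the paper ports to the encrypted domain; there, each inner product, division, and square root below would be replaced by its homomorphic counterpart.

```python
# Classical Gram-Schmidt QR decomposition (plaintext reference).
import numpy as np

def gram_schmidt_qr(A):
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for j in range(m):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # inner product (homomorphic mult/add)
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.sqrt(v @ v)            # square root primitive
        Q[:, j] = v / R[j, j]               # division primitive
    return Q, R

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))  # True True
```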
In the past few years, visual information collection and transmission has increased significantly across various applications. Smart vehicles, service robotic platforms, and surveillance cameras for smart city applications collect large amounts of visual data. Preserving the privacy of people present in this data is an important factor in the storage, processing, sharing, and transmission of visual data across the Internet of Robotic Things (IoRT). In this paper, a novel anonymisation method for information security and privacy preservation in visual data in the sharing layer of the Web of Robotic Things (WoRT) is proposed. The proposed framework uses deep neural network based semantic segmentation to preserve privacy in video data according to the access level of applications and users. The data is anonymised for applications with a lower access level, while applications with a higher legal access level can analyse and annotate the complete data. The experimental results show that the proposed method, while giving authorities the required access for legal smart city surveillance applications, is capable of preserving the privacy of the people present in the data.
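A minimal sketch of access-level-dependent anonymisation with a person-class segmentation mask: low-access consumers receive frames with person pixels pixelated, while high-access consumers receive the raw frame. The mask source (a segmentation DNN in the paper) is mocked here, and the access levels and blur factor are illustrative assumptions.

```python
# Blur only the person-class pixels, conditioned on consumer access level.
import numpy as np

PERSON = 1

def anonymise(frame, mask, access_level, blur=8):
    if access_level >= 2:                   # legally authorised consumer
        return frame
    out = frame.copy()
    # Crude pixelation: downsample then re-expand to frame size.
    coarse = frame[::blur, ::blur].repeat(blur, 0).repeat(blur, 1)
    person = mask == PERSON
    out[person] = coarse[:frame.shape[0], :frame.shape[1]][person]
    return out

frame = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = PERSON                 # pretend the DNN found a person here
print(np.array_equal(anonymise(frame, mask, 2), frame))   # True: full access
print(np.array_equal(anonymise(frame, mask, 1), frame))   # False: person pixelated
```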