Biblio
The promise of big data relies on the release and aggregation of data sets. When these data sets contain sensitive information about individuals, it has been scalable and convenient to protect the privacy of these individuals by de-identification. However, studies show that the combination of de-identified data sets with other data sets risks re-identification of some records. Some studies have shown how to measure this risk in specific contexts where certain types of public data sets (such as voter roles) are assumed to be available to attackers. To the extent that it can be accomplished, such analyses enable the threat of compromises to be balanced against the benefits of sharing data. For example, a study that might save lives by enabling medical research may be enabled in light of a sufficiently low probability of compromise from sharing de-identified data. In this paper, we introduce a general probabilistic re-identification framework that can be instantiated in specific contexts to estimate the probability of compromises based on explicit assumptions. We further propose a baseline of such assumptions that enable a first-cut estimate of risk for practical case studies. We refer to the framework with these assumptions as the Naive Re-identification Framework (NRF). As a case study, we show how we can apply NRF to analyze and quantify the risk of re-identification arising from releasing de-identified medical data in the context of publicly-available social media data. The results of this case study show that NRF can be used to obtain meaningful quantification of the re-identification risk, compare the risk of different social media, and assess risks of combinations of various demographic attributes and medical conditions that individuals may voluntarily disclose on social media.
A blockchain powered Health information ecosystem can solve a frequently discussed problem of the lifelong recorded patient health data, which seriously could hurdle the privacy of the patients and the growing data hunger of the research and policy maker institutions. On one side the general availability of the data is vital in emergency situations and supports heavily the different research, population health management and development activities, on the other side using the same data can lead to serious social and ethical problems caused by malicious actors. Currently, the regulation of the privacy data varies all over the world, however underlying principles are always defensive and protective towards patient privacy against general availability. The protective principles cause a defensive, data hiding attitude of the health system developers to avoid breaching the overall law regulations. It makes the policy makers and different - primarily drug - developers to find ways to treat data such a way that lead to ethical and political debates. In our paper we introduce how the blockchain technology can help solving the problem of secure data storing and ensuring data availability at the same time. We use the basic principles of the American HIPAA regulation, which defines the public availability criteria of health data, however the different local regulations may differ significantly. Blockchain's decentralized, intermediary-free, cryptographically secured attributes offer a new way of storing patient data securely and at the same time publicly available in a regulated way, where a well-designed distributed peer-to-peer network incentivize the smooth operation of a full-featured EHR system.
Differential privacy is an approach that preserves patient privacy while permitting researchers access to medical data. This paper presents mechanisms proposed to satisfy differential privacy while answering a given workload of range queries. Representing input data as a vector of counts, these methods partition the vector according to relationships between the data and the ranges of the given queries. After partitioning the vector into buckets, the counts of each bucket are estimated privately and split among the bucket's positions to answer the given query set. The performance of the proposed method was evaluated using different workloads over several attributes. The results show that partitioning the vector based on the data can produce more accurate answers, while partitioning the vector based on the given workload improves privacy. This paper's two main contributions are: (1) improving earlier work on partitioning mechanisms by building a greedy algorithm to partition the counts' vector efficiently, and (2) its adaptive algorithm considers the sensitivity of the given queries before providing results.
Wearable medical devices are playing more and more important roles in healthcare. Unlike the wired connection, the wireless connection between wearable devices and the remote servers are exceptionally vulnerable to malicious attacks, and poses threats to the safety and privacy of the patient health data. Therefore, wearable medical devices require the implementation of reliable measures to secure the wireless network communication. However, those devices usually have limited computational power that is not comparable with the desktop computer and thus, it is difficult to adopt the full-fledged security algorithm in software. In this study, we have developed an efficient authentication and encryption protocol for internetconnected wearable devices using the recognized standards of AES and SHA that can provide two-way authentication between wearable device and remote server and protection of patient privacy against various network threats. We have tested the feasibility of this protocol on the TI CC3200 Launchpad, an evaluation board of the CC3200, which is a Wi-Fi capable microcontroller designed for wearable devices and includes a hardware accelerated cryptography module for the implementation of the encryption algorithm. The microcontroller serves as the wearable device client and a Linux computer serves as the server. The embedded client software was written in ANSI C and the server software was written in Python.