Biblio
The promise of big data relies on the release and aggregation of data sets. When these data sets contain sensitive information about individuals, de-identification has been a scalable and convenient way to protect those individuals' privacy. However, studies show that combining de-identified data sets with other data sets risks re-identifying some records. Some studies have shown how to measure this risk in specific contexts where certain types of public data sets (such as voter rolls) are assumed to be available to attackers. To the extent that such analyses can be carried out, they allow the threat of compromise to be balanced against the benefits of sharing data. For example, a study that might save lives by enabling medical research may be justified if the probability of compromise from sharing de-identified data is sufficiently low. In this paper, we introduce a general probabilistic re-identification framework that can be instantiated in specific contexts to estimate the probability of compromise based on explicit assumptions. We further propose a baseline set of such assumptions that enables a first-cut estimate of risk for practical case studies. We refer to the framework with these assumptions as the Naive Re-identification Framework (NRF). As a case study, we show how NRF can be applied to analyze and quantify the risk of re-identification arising from releasing de-identified medical data in the context of publicly available social media data. The results of this case study show that NRF can be used to obtain a meaningful quantification of re-identification risk, compare the risks posed by different social media platforms, and assess the risks of combinations of various demographic attributes and medical conditions that individuals may voluntarily disclose on social media.
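The abstract does not spell out NRF's actual assumptions, so the following is only a hypothetical back-of-the-envelope sketch of the kind of estimate such a framework produces, not the paper's method. It assumes that each disclosed attribute (for example sex, birth year, or a medical condition) occurs independently with a known population frequency, that the target is in the population, and that an attacker guesses uniformly among everyone who matches. The function names `expected_matches` and `naive_reid_probability` and the example frequencies are illustrative inventions.

```python
import math


def expected_matches(population: int, attribute_probs: list[float]) -> float:
    """Expected number of people who share the target's attribute values.

    Hypothetical assumptions (not from the paper): attribute values occur
    independently with the given population frequencies, and the target is
    a member of the population, so it always matches itself.
    """
    if population < 1:
        raise ValueError("population must contain at least the target")
    if any(not 0.0 <= p <= 1.0 for p in attribute_probs):
        raise ValueError("attribute frequencies must lie in [0, 1]")
    # Probability that one *other* person matches on every attribute.
    p_match = math.prod(attribute_probs)
    # The target itself plus the expected number of coincidental matches.
    return 1 + (population - 1) * p_match


def naive_reid_probability(population: int, attribute_probs: list[float]) -> float:
    """Rough chance that an attacker guessing uniformly among matching
    people picks the target, approximated as 1 / E[matches]."""
    return 1.0 / expected_matches(population, attribute_probs)


# Hypothetical example: three disclosed attributes with made-up frequencies.
risk = naive_reid_probability(1_000_001, [0.5, 0.01, 0.001])
print(f"{risk:.3f}")  # expected matches are about 6, so this prints 0.167
```

Using 1 / E[matches] keeps the sketch to simple arithmetic, but it only approximates the true expected guessing success E[1 / matches]; a fuller treatment would model the distribution of matches and, as the abstract notes, make every such assumption explicit.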
Side-channel risks of Intel SGX have recently attracted great attention. Under the spotlight is the newly discovered page-fault attack, in which an OS-level adversary induces page faults to observe the page-level access patterns of a protected process running in an SGX enclave. With almost all proposed defenses focusing on this attack, little is known about whether such efforts indeed raise the bar for the adversary or whether a simple variation of the attack renders all protection ineffective, let alone about other attack surfaces in the SGX system. In this paper, we report the first step toward systematic analyses of the side-channel threats that SGX faces, focusing on the risks associated with its memory management. Our research identifies eight potential attack vectors, ranging from the TLB to DRAM modules. More importantly, we highlight common misunderstandings about SGX memory side channels, demonstrating that high-frequency AEXs (asynchronous enclave exits) can be avoided when recovering an EdDSA secret key through a new page channel, and that fine-grained monitoring of enclave programs (at 64-byte granularity) can be achieved by combining cache and cross-enclave DRAM channels. Our findings reveal the gap between ongoing security research on SGX and its side-channel weaknesses, redefine the side-channel threat model for secure enclaves, and can prompt a discussion on when to use such a system and how to use it securely.
From pencils to commercial aircraft, every man-made object must be designed and manufactured. When it is cheaper or easier to steal a design or a manufacturing process specification than to invent one's own, the incentive for theft is present. As more and more manufacturing data comes online, incidents of such theft are increasing. In this paper, we present a side-channel attack on manufacturing equipment that reveals both the form of a product and its manufacturing process, i.e., exactly how it is made. In the attack, a person deliberately or accidentally places an attack-enabled phone close to the equipment, or makes or receives a call on any nearby phone. The phone executing the attack records audio and, optionally, magnetometer data. We present a method of reconstructing the product's form and manufacturing process from the captured data, based on machine learning, signal processing, and human assistance. We demonstrate the attack on a 3D printer and a CNC mill, each with its own acoustic signature, and discuss the commonalities in the sensor data captured for these two different machines. We compare the quality of the data captured with a variety of smartphone models. Capturing data from the 3D printer, we reproduce the form and process information of objects previously unknown to the reconstructors. On average, our accuracy is within 1 mm in reconstructing the length of a line segment in a fabricated object's shape and within 1 degree in determining an angle in a fabricated object's shape. We conclude with recommendations for defending against these attacks.