Biblio
Skyline queries are increasingly popular, with broad applicability across many domains. Given the trend toward outsourcing databases, and the sensitive nature of the data (e.g., in healthcare), it is essential to evaluate skylines on encrypted datasets. Prior research has acknowledged the importance of secure skyline computation, but existing solutions suffer from several shortcomings: (i) they provide only ad-hoc security; (ii) they are prohibitively expensive; or (iii) they rely on assumptions such as the presence of multiple non-colluding parties in the protocol. Inspired by solutions for secure nearest-neighbor queries, we conjecture that a secure and efficient way to compute skylines is through result materialization. However, materialization is much more challenging for skyline queries due to their large space requirements. We show that pre-computing skyline results while minimizing storage overhead is NP-hard, and we provide heuristics that solve the problem efficiently while keeping storage at reasonable levels. Our algorithms are novel and also applicable to regular (plaintext) skyline computation, but we focus on the encrypted setting, where materialization reduces the response time of skyline queries from hours to seconds. Extensive experiments show that our approach clearly outperforms existing work in terms of performance, and our security analysis proves that it incurs only small (and quantifiable) data leakage.
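For readers unfamiliar with the query itself, the following is a minimal plaintext sketch of skyline computation using the classic block-nested-loop idea, assuming that smaller values are better in every dimension. It only illustrates what a skyline is; it is not the materialization-based or encrypted method described above, and the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q: p is no worse than q in every
    dimension and strictly better in at least one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep a window of points that no other
    point seen so far dominates."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already in the window
        # p survives: evict window points that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


if __name__ == "__main__":
    # Hypothetical hotels described by (price, distance to the beach)
    hotels = [(50, 8), (60, 2), (70, 5), (40, 9), (55, 8)]
    print(skyline(hotels))
```

Here (70, 5) is dropped because (60, 2) is both cheaper and closer, and (55, 8) is dropped because of (50, 8); the remaining three hotels each represent a different price/distance trade-off.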
Given that an increasingly large part of an organization's activity takes place online, especially in the current situation caused by the COVID-19 pandemic, network log data collected by organizations provide an accurate picture of daily activity patterns. In some scenarios, it may be useful to share such data with other parties to improve collaboration, or to address situations such as cyber-security incidents that affect multiple organizations. However, doing so raises serious privacy concerns. Analyzing an organization's network logs can uncover a great deal of sensitive information, ranging from confidential business interests to personal details of individual employees (e.g., medical conditions, political orientation). Our objective is to enable organizations to share information about their network logs while preserving data privacy. Specifically, we focus on enabling encrypted search at network flow granularity. We consider several state-of-the-art searchable encryption flavors for this purpose (including hidden vector encryption and inner product encryption), and we propose customized encoding techniques for network flow information that reduce the overhead of applying these schemes, which are notoriously expensive.
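To make the notion of "encoding network flow information" for hidden vector encryption more concrete, the sketch below evaluates only the plaintext predicate that HVE supports: a query pattern matches an attribute vector if every non-wildcard position agrees. It assumes a simple, hypothetical encoding in which an IPv4 address becomes a 32-bit vector and a subnet query fixes the prefix bits and leaves the rest as wildcards. No encryption is performed, and this is not the customized encoding proposed in the work above.

```python
import ipaddress
from typing import List, Optional

# Hypothetical marker for a "don't care" position in a query pattern
WILDCARD = None


def encode_ip(addr: str) -> List[int]:
    """Encode an IPv4 address as a 32-element bit vector, most significant bit first."""
    value = int(ipaddress.IPv4Address(addr))
    return [(value >> (31 - i)) & 1 for i in range(32)]


def prefix_query(cidr: str) -> List[Optional[int]]:
    """Encode a subnet such as '10.1.2.0/24' as a pattern: the prefix bits are
    fixed and the remaining host bits are wildcards."""
    net = ipaddress.IPv4Network(cidr)
    bits = encode_ip(str(net.network_address))
    return [b if i < net.prefixlen else WILDCARD for i, b in enumerate(bits)]


def hve_match(vector: List[int], pattern: List[Optional[int]]) -> bool:
    """Plaintext HVE predicate: every non-wildcard position must be equal."""
    return all(p is WILDCARD or p == v for v, p in zip(vector, pattern))


if __name__ == "__main__":
    query = prefix_query("10.1.2.0/24")
    print(hve_match(encode_ip("10.1.2.77"), query))  # source inside the subnet
    print(hve_match(encode_ip("10.1.3.5"), query))   # source outside the subnet
```

In an actual HVE deployment the attribute vector would be encrypted and the pattern turned into a search token; the encoding choice (how many positions, how many of them are fixed per query) is where flow-specific optimizations would come into play.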
Provenance workflows capture the movement and transformation of data in complex environments, such as document management in large organizations, content generation and sharing in social media, and scientific computations. Sharing and processing provenance workflows brings numerous benefits, e.g., improving productivity in an organization or understanding social media interaction patterns. However, directly sharing provenance may also disclose sensitive information, such as confidential business practices or private details about participants in a social network. We propose an algorithm that privately extracts sequential association rules from provenance workflow datasets. Finding such rules has numerous practical applications, such as capacity planning or identifying hot-spots in provenance graphs. Our approach provides good accuracy and strong privacy by leveraging the exponential mechanism of differential privacy. We propose a heuristic that identifies promising candidate rules and makes judicious use of the privacy budget. Experimental results show that our approach is fast and accurate, and clearly outperforms the state of the art. We also identify factors that influence accuracy, which helps in choosing promising directions for future improvement.
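The exponential mechanism mentioned above selects a candidate with probability proportional to exp(epsilon * u / (2 * sensitivity)), where u is a utility score. The sketch below applies it to pick one sequential rule, assuming (hypothetically) that workflows are lists of activity labels, that a candidate rule is a pair (a, b), and that its utility is the number of workflows in which a is immediately followed by b. Adding or removing one workflow changes such a count by at most 1, so the sensitivity is 1. This is a generic illustration of the mechanism, not the candidate-generation heuristic proposed in the work above.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Rule = Tuple[str, str]


def support(rule: Rule, workflows: List[List[str]]) -> int:
    """Hypothetical utility: number of workflows where rule[0] is
    immediately followed by rule[1]."""
    a, b = rule
    return sum(any(x == a and y == b for x, y in zip(wf, wf[1:])) for wf in workflows)


def exponential_mechanism(candidates: Sequence[Rule],
                          utility: Callable[[Rule], float],
                          epsilon: float,
                          sensitivity: float,
                          rng: Optional[random.Random] = None) -> Rule:
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    rng = rng or random.Random()
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)
    # Subtracting the maximum score avoids overflow; the distribution is unchanged
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


if __name__ == "__main__":
    workflows = [
        ["create", "edit", "review", "publish"],
        ["create", "edit", "publish"],
        ["create", "review", "publish"],
        ["edit", "review", "publish"],
    ]
    candidates = [("create", "edit"), ("edit", "review"),
                  ("review", "publish"), ("edit", "publish")]
    chosen = exponential_mechanism(candidates, lambda r: support(r, workflows),
                                   epsilon=1.0, sensitivity=1.0)
    print(chosen)
```

A larger epsilon concentrates the probability on the highest-support rule (here ("review", "publish"), with support 3), while a smaller epsilon spreads it more evenly, which is why spending the privacy budget judiciously across candidates matters.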