Title | Single-Setup Privacy Enforcement for Heterogeneous Data Ecosystems |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Buenrostro, Issac; Tiwari, Abhishek; Rajamani, Vasanth; Pattuk, Erman; Chen, Zhixiong |
Conference Name | Proceedings of the 27th ACM International Conference on Information and Knowledge Management |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-6014-2 |
Keywords | apache gobblin, compositionality, Data Sanitization, hive, Human Behavior, offline processing, policy enforcement, privacy, pubcrawl, resilience, wherehows |
Abstract | Strong member privacy in technology enterprises involves, among other objectives, deleting or anonymizing various kinds of data that a company controls. Those requirements are complicated in a heterogeneous data ecosystem where data is stored in multiple stores with different semantics: different indexing or update capabilities require processes specific to a store or even schema. In this demo we showcase a method to enforce record controls of arbitrary data stores via a three-step process: generate an offline snapshot, run a policy mechanism to select rows to delete/update, and apply the changes to the original store. The first and third steps work on any store by leveraging Apache Gobblin, an open-source data integration framework. The policy computation step runs as a batch Gobblin job where each table can be customized via a dataset metadata tracking system and SQL expressions providing table-specific business logic. This setup allows enforcement of highly customizable privacy requirements in a variety of systems from hosted databases to third-party data storage systems. |
URL | http://doi.acm.org/10.1145/3269206.3269228 |
DOI | 10.1145/3269206.3269228 |
Citation Key | buenrostro_single-setup_2018 |