Single-Setup Privacy Enforcement for Heterogeneous Data Ecosystems

Title: Single-Setup Privacy Enforcement for Heterogeneous Data Ecosystems
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Buenrostro, Issac; Tiwari, Abhishek; Rajamani, Vasanth; Pattuk, Erman; Chen, Zhixiong
Conference Name: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6014-2
Keywords: apache gobblin, compositionality, Data Sanitization, hive, Human Behavior, offline processing, policy enforcement, privacy, pubcrawl, resilience, wherehows
Abstract: Strong member privacy in technology enterprises involves, among other objectives, deleting or anonymizing various kinds of data that a company controls. These requirements are complicated in a heterogeneous data ecosystem where data is kept in multiple stores with different semantics: different indexing or update capabilities require processes specific to a store or even a schema. In this demo we showcase a method to enforce record controls on arbitrary data stores via a three-step process: generate an offline snapshot, run a policy mechanism to select rows to delete or update, and apply the changes to the original store. The first and third steps work on any store by leveraging Apache Gobblin, an open-source data integration framework. The policy computation step runs as a batch Gobblin job where each table can be customized via a dataset metadata tracking system and SQL expressions providing table-specific business logic. This setup allows enforcement of highly customizable privacy requirements in a variety of systems, from hosted databases to third-party data storage systems.
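The three-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Apache Gobblin or its API: it assumes Python's standard `sqlite3` module as a stand-in for both the original store and the offline snapshot. The table name `members`, the column `deletion_requested`, the `POLICIES` mapping, and all function names are hypothetical, chosen only to show how a per-table SQL predicate might select rows for deletion.

```python
import sqlite3

# Hypothetical per-table policy: a SQL predicate that selects rows to delete.
# In the paper this role is played by table-specific SQL expressions; the
# table and column names here are illustrative assumptions.
POLICIES = {"members": "deletion_requested = 1"}

SCHEMA = "(id INTEGER PRIMARY KEY, name TEXT, deletion_requested INTEGER)"


def snapshot(store, table):
    """Step 1: copy the table into a separate, offline database."""
    snap = sqlite3.connect(":memory:")
    rows = store.execute(f"SELECT * FROM {table}").fetchall()
    snap.execute(f"CREATE TABLE {table} {SCHEMA}")
    snap.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return snap


def select_ids(snap, table):
    """Step 2: evaluate the table's policy predicate against the snapshot."""
    predicate = POLICIES[table]
    query = f"SELECT id FROM {table} WHERE {predicate} ORDER BY id"
    return [row[0] for row in snap.execute(query)]


def apply_deletes(store, table, ids):
    """Step 3: apply the selected deletions to the original store."""
    store.executemany(f"DELETE FROM {table} WHERE id = ?", [(i,) for i in ids])
    store.commit()


def enforce(store, table):
    """Run all three steps and return the ids that were deleted."""
    snap = snapshot(store, table)
    ids = select_ids(snap, table)
    apply_deletes(store, table, ids)
    return ids


def demo():
    """Build a toy store, enforce the policy, and report the outcome."""
    store = sqlite3.connect(":memory:")
    store.execute(f"CREATE TABLE members {SCHEMA}")
    store.executemany(
        "INSERT INTO members VALUES (?, ?, ?)",
        [(1, "a", 0), (2, "b", 1), (3, "c", 1)],
    )
    store.commit()
    deleted = enforce(store, "members")
    remaining = [row[0] for row in store.execute("SELECT id FROM members ORDER BY id")]
    return deleted, remaining
```

Calling `demo()` runs the policy against a three-row toy table. Keeping the policy evaluation on the snapshot rather than the live store mirrors the separation the abstract describes, where only the first and last steps touch the original system.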
URL: http://doi.acm.org/10.1145/3269206.3269228
DOI: 10.1145/3269206.3269228
Citation Key: buenrostro_single-setup_2018