Almost all domains of life, including medicine, government, and business, have data recorded on them at an unprecedented rate by many independent parties. To realize insights from these fractured datasets, data scientists often set up a data federation in which multiple autonomous databases are united to appear as a single engine for querying. In many settings this is challenging due to privacy concerns and regulatory requirements. This project investigates a private data federation that runs queries (in SQL) over the combined secret records of multiple sources such that the only information revealed from a query's execution is that which can be deduced from its output. The federation uses secure multi-party computation to protect its inputs. These cryptographic protocols run obliviously among the data providers such that their execution is independent of a query's secret inputs. The findings from the project, which include the open-source private data federation prototype, could empower database users to gain actionable insights from fractured datasets without specialized training.
This project identifies, formalizes, and exploits opportunities to run private data federation queries efficiently while upholding their privacy guarantees. It uses synergies between the relational model and secure computation to create new algorithms for oblivious database operators that minimize their use of heavyweight cryptographic protocols. The investigators design and evaluate an algebra that leverages properties of the relational model, such as key definitions and integrity constraints, to bound the cardinality of a query's intermediate results thereby reducing the complexity of its secure circuits. The project evaluates the real-world impact of this technology with electronic health records from multiple hospitals for clinical data research.
|