Biblio
Increasingly organizations are collecting ever larger amounts of data to build complex data analytics, machine learning and AI models. Furthermore, the data needed for building such models may be unstructured (e.g., text, image, and video). Hence such data may be stored in different data management systems ranging from relational databases to newer NoSQL databases tailored for storing unstructured data. Furthermore, data scientists are increasingly using programming languages such as Python, R etc. to process data using many existing libraries. In some cases, the developed code will be automatically executed by the NoSQL system on the stored data. These developments indicate the need for a data security and privacy solution that can uniformly protect data stored in many different data management systems and enforce security policies even if sensitive data is processed using a data scientist submitted complex program. In this paper, we introduce our vision for building such a solution for protecting big data. Specifically, our proposed system system allows organizations to 1) enforce policies that control access to sensitive data, 2) keep necessary audit logs automatically for data governance and regulatory compliance, 3) sanitize and redact sensitive data on-the-fly based on the data sensitivity and AI model needs, 4) detect potentially unauthorized or anomalous access to sensitive data, 5) automatically create attribute-based access control policies based on data sensitivity and data type.