Biblio
Collecting and analyzing personal data is important in modern information applications. Though the privacy of data providers should be protected, some adversarial users may behave badly under circumstances where they are not identified. However, the privacy of honest users should not be infringed. Thus, detecting anomalies without revealing normal users-identities is quite important for operating information systems using personal data. Though various methods of statistics and machine learning have been developed for detecting anomalies, it is difficult to know in advance what anomaly will come up. Thus, it would be useful to provide a "general" framework that can employ any anomaly detection method regardless of the type of data and the nature of the abnormality. In this paper, we propose a privacy preserving anomaly detection framework that allows an authority to detect adversarial users while other honest users are kept anonymous. By using cryptographic techniques, group signatures with message-dependent opening (GS-MDO) and public key encryption with non-interactive opening (PKENO), we provide a correspondence table that links a user and data in a secure way, and we can employ any anonymization technique and any anomaly detection method. It is particularly worth noting that no big brother exists, meaning that no single entity can identify users, while bad behaviors are always traceable. We also show the result of implementing our framework. Briefly, the overhead of our framework is on the order of dozens of milliseconds.
Let us consider a scenario that a data holder (e.g., a hospital) encrypts a data (e.g., a medical record) which relates a keyword (e.g., a disease name), and sends its ciphertext to a server. We here suppose not only the data but also the keyword should be kept private. A receiver sends a query to the server (e.g., average of body weights of cancer patients). Then, the server performs the homomorphic operation to the ciphertexts of the corresponding medical records, and returns the resultant ciphertext. In this scenario, the server should NOT be allowed to perform the homomorphic operation against ciphertexts associated with different keywords. If such a mis-operation happens, then medical records of different diseases are unexpectedly mixed. However, in the conventional homomorphic encryption, there is no way to prevent such an unexpected homomorphic operation, and this fact may become visible after decrypting a ciphertext, or as the most serious case it might be never detected. To circumvent this problem, in this paper, we propose mis-operation resistant homomorphic encryption, where even if one performs the homomorphic operations against ciphertexts associated with keywords ω' and ω, where ω -ω', the evaluation algorithm detects this fact. Moreover, even if one (intentionally or accidentally) performs the homomorphic operations against such ciphertexts, a ciphertext associated with a random keyword is generated, and the decryption algorithm rejects it. So, the receiver can recognize such a mis-operation happens in the evaluation phase. In addition to mis-operation resistance, we additionally adopt secure search functionality for keywords since it is desirable when one would like to delegate homomorphic operations to a third party. So, we call the proposed primitive mis-operation resistant searchable homomorphic encryption (MR-SHE). We also give our implementation result of inner products of encrypted vectors. In the case when both vectors are encrypted, the running time of the receiver is millisecond order for relatively small-dimensional (e.g., 26) vectors. In the case when one vector is encrypted, the running time of the receiver is approximately 5 msec even for relatively high-dimensional (e.g., 213) vectors.
Logistic regression is a powerful machine learning tool to classify data. When dealing with sensitive data such as private or medical information, cares are necessary. In this paper, we propose a secure system for protecting the training data in logistic regression via homomorphic encryption. Perhaps surprisingly, despite the non-polynomial tasks of training in logistic regression, we show that only additively homomorphic encryption is needed to build our system. Our system is secure and scalable with the dataset size.