Visible to the public Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets

TitleFast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets
Publication TypeConference Paper
Year of Publication2016
AuthorsQiu, Shuo, Wang, Boyang, Li, Ming, Victors, Jesse, Liu, Jiqiang, Shi, Yanfeng, Wang, Wei
Conference NameProceedings of the 4th ACM International Workshop on Security in Cloud Computing
PublisherACM
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-4285-8
Keywordscloud, Metrics, minhashing, Privacy-preserving, pubcrawl, Resiliency, Scalability, similarity computation, user privacy, user privacy in the cloud, verifiability
Abstract

Computing similarity, especially Jaccard Similarity, between two datasets is a fundamental building block in big data analytics, and extensive applications including genome matching, plagiarism detection, social networking, etc. The increasing user privacy concerns over the release of has sensitive data have made it desirable and necessary for two users to evaluate Jaccard Similarity over their datasets in a privacy-preserving manner. In this paper, we propose two efficient and secure protocols to compute the Jaccard Similarity of two users' private sets with the help of an unfully-trusted server. Specifically, in order to boost the efficiency, we leverage Minhashing algorithm on encrypted data, where the output of our protocols is guaranteed to be a close approximation of the exact value. In both protocols, only an approximate similarity result is leaked to the server and users. The first protocol is secure against a semi-honest server, while the second protocol, with a novel consistency-check mechanism, further achieves result verifiability against a malicious server who cheats in the executions. Experimental results show that our first protocol computes an approximate Jaccard Similarity of two billion-element sets within only 6 minutes (under 256-bit security in parallel mode). To the best of our knowledge, our consistency-check mechanism represents the very first work to realize an efficient verification particularly on approximate similarity computation.

URLhttp://doi.acm.org/10.1145/2898445.2898453
DOI10.1145/2898445.2898453
Citation Keyqiu_fast_2016