Elxa: Scalable Privacy-Preserving Plagiarism Detection

Submitted by grigby1 on Tue, 05/30/2017 - 12:56pm

Title	Elxa: Scalable Privacy-Preserving Plagiarism Detection
Publication Type	Conference Paper
Year of Publication	2016
Authors	Unger, Nik; Thandra, Sahithi; Goldberg, Ian
Conference Name	Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-4569-9
Keywords	Applied Cryptography, Collaboration, composability, Human Behavior, Metrics, peer to peer security, plagiarism detection, privacy preservation, private record linkage, pubcrawl, Resiliency, Scalability, secure multi-party computation
Abstract	One of the most challenging issues facing academic conferences and educational institutions today is plagiarism detection. Typically, these entities wish to ensure that the work products submitted to them have not been plagiarized from another source (e.g., authors submitting identical papers to multiple journals). Assembling large centralized databases of documents dramatically improves the effectiveness of plagiarism detection techniques, but introduces a number of privacy and legal issues: all document contents must be completely revealed to the database operator, making it an attractive target for abuse or attack. Moreover, this content aggregation involves the disclosure of potentially sensitive private content, and in some cases this disclosure may be prohibited by law. In this work, we introduce Elxa, the first scalable centralized plagiarism detection system that protects the privacy of the submissions. Elxa incorporates techniques from the current state of the art in plagiarism detection, as evaluated by the information retrieval community. Our system is designed to be operated on existing cloud computing infrastructure, and to provide incentives for the untrusted database operator to maintain the availability of the network. Elxa can be used to detect plagiarism in student work, duplicate paper submissions (and their associated peer reviews), similarities between confidential reports (e.g., malware summaries), or any approximate text reuse within a network of private documents. We implement a prototype using the Hadoop MapReduce framework, and demonstrate that it is feasible to achieve competitive detection effectiveness in the private setting.
URL	http://doi.acm.org/10.1145/2994620.2994633
DOI	10.1145/2994620.2994633
Citation Key	unger_elxa:_2016
