M-sanit: Computing misusability score and effective sanitization of big data using Amazon elastic MapReduce

Title: M-sanit: Computing misusability score and effective sanitization of big data using Amazon elastic MapReduce
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Nagaratna, M., Sowmya, Y.
Conference Name: 2017 International Conference on Computation of Power, Energy Information and Communication (ICCPEIC)
Keywords: Amazon Elastic Compute Cloud, Amazon Elastic MapReduce, Big Data, big data processing, cloud computing, composability, data mining, data privacy, Data Sanitization, Distributed databases, EC2, EMR, Human Behavior, human factors, MapReduce, MapReduce programming paradigm, misusability measure, misusability score, outsourced data, parallel programming, privacy, privacy preserving data mining, privacy preserving data publishing, Programming, pubcrawl, Publishing, resilience, Resiliency, sanitization, sensitive data, voluminous data
Abstract: The advent of distributed programming frameworks such as Hadoop paved the way for processing voluminous data, known as big data. Due to the exponential growth of data, enterprises began to exploit cloud infrastructure for storing and processing big data. Insider attacks on outsourced data cause leakage of sensitive data. It is therefore essential to sanitize data so as to preserve privacy and prevent the disclosure of sensitive data. Privacy Preserving Data Publishing (PPDP) and Privacy Preserving Data Mining (PPDM) are the areas in which data sanitization plays a vital role in preserving privacy. Existing anonymization techniques for MapReduce programming lack a misusability measure for determining the level of sanitization to be applied to big data. To overcome this limitation, we propose a framework known as M-Sanit, which has mechanisms to compute and exploit the misusability score of big data prior to performing sanitization using the MapReduce programming paradigm. Our empirical study on a real-world cloud ecosystem, Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR), reveals the effectiveness of misusability-score-based sanitization of big data prior to publishing or mining it.
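The abstract does not say how M-Sanit computes its misusability score or maps that score to a sanitization level. Purely as an illustration of the general idea (score first, then sanitize, both expressed as map/reduce steps), the following Python sketch simulates the map, shuffle, and reduce stages in a single process. Everything specific in it is a hypothetical assumption, not the paper's method: the per-attribute weights in `SENSITIVITY`, the averaged and normalized score, the thresholds 0.3 and 0.6, and the generalization/suppression rules. On EMR the map and reduce functions would run as distributed tasks rather than local function calls.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]

# HYPOTHETICAL per-attribute sensitivity weights (not taken from the paper).
SENSITIVITY: Dict[str, float] = {"disease": 1.0, "salary": 0.6, "zip": 0.3, "age": 0.2}


def map_score(record: Record) -> Tuple[str, float]:
    """Map step: emit (key, record sensitivity), counting only non-null attributes."""
    weight = sum(SENSITIVITY.get(attr, 0.0) for attr, value in record.items() if value is not None)
    return ("dataset", weight)


def reduce_score(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Shuffle + reduce step: group by key and average the record sensitivities."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def choose_level(score: float) -> str:
    """Map a normalized score in [0, 1] to a sanitization level (HYPOTHETICAL thresholds)."""
    if score < 0.3:
        return "none"
    if score < 0.6:
        return "generalize"
    return "suppress"


def sanitize(record: Record, level: str) -> Record:
    """Map-only step: apply the chosen sanitization level to a single record."""
    out = dict(record)
    if level in ("generalize", "suppress"):
        age = out.get("age")
        if isinstance(age, int):
            low = age // 10 * 10
            out["age"] = f"{low}-{low + 9}"  # generalize age to a decade range
        zip_code = out.get("zip")
        if isinstance(zip_code, str):
            out["zip"] = zip_code[:3] + "**"  # keep only the zip prefix
    if level == "suppress":
        for attr in ("disease", "salary"):
            out.pop(attr, None)  # drop the most sensitive attributes entirely
    return out


def m_sanit_sketch(records: List[Record]) -> Tuple[float, str, List[Record]]:
    """Score the whole dataset first, then sanitize every record at the chosen level."""
    max_weight = sum(SENSITIVITY.values())
    raw = reduce_score(map(map_score, records)).get("dataset", 0.0)
    score = raw / max_weight  # normalize to [0, 1]
    level = choose_level(score)
    return score, level, [sanitize(r, level) for r in records]


if __name__ == "__main__":
    data = [
        {"age": 34, "zip": "53703", "disease": "flu", "salary": 52000},
        {"age": 47, "zip": "53711", "disease": None, "salary": 61000},
    ]
    print(m_sanit_sketch(data))
```

Computing the score in its own pass before any record is touched mirrors the ordering the abstract describes: the misusability score decides how aggressively the subsequent map-only sanitization pass transforms the data.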
DOI: 10.1109/ICCPEIC.2017.8290334
Citation Key: nagaratna_m-sanit:_2017