Title | M-sanit: Computing misusability score and effective sanitization of big data using Amazon elastic MapReduce |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Nagaratna, M., Sowmya, Y. |
Conference Name | 2017 International Conference on Computation of Power, Energy Information and Communication (ICCPEIC) |
Keywords | Amazon Elastic Cloud Compute, Amazon Elastic MapReduce, Big Data, big data processing, cloud computing, composability, data mining, data privacy, Data Sanitization, Distributed databases, EC2, EMR, Human Behavior, human factors, MapReduce, MapReduce programming paradigm, misusability measure, misusability score, outsourced data, parallel programming, privacy, privacy preserving data mining, privacy preserving data publishing, Programming, pubcrawl, Publishing, resilience, Resiliency, sanitization, sensitive data, voluminous data |
Abstract | The advent of distributed programming frameworks like Hadoop paved the way for processing voluminous data, known as big data. Due to the exponential growth of data, enterprises started to exploit the availability of cloud infrastructure for storing and processing big data. Insider attacks on outsourced data cause leakage of sensitive data. Therefore, it is essential to sanitize data so as to preserve privacy or non-disclosure of sensitive data. Privacy Preserving Data Publishing (PPDP) and Privacy Preserving Data Mining (PPDM) are the areas in which data sanitization plays a vital role in preserving privacy. The existing anonymization techniques for MapReduce programming can be improved to have a misusability measure for determining the level of sanitization to be applied to big data. To overcome this limitation, we propose a framework known as M-Sanit, which has mechanisms to exploit the misusability score of big data prior to performing sanitization using the MapReduce programming paradigm. Our empirical study using a real-world cloud ecosystem, namely Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR), reveals the effectiveness of misusability-score-based sanitization of big data prior to publishing or mining it. |
DOI | 10.1109/ICCPEIC.2017.8290334 |
Citation Key | nagaratna_m-sanit:_2017 |