Biblio
Hadoop is developed as a distributed data processing platform for analyzing big data. Enterprises can analyze big data containing users' sensitive information by using Hadoop and utilize them for their marketing. Therefore, researches on data encryption have been widely done to protect the leakage of sensitive data stored in Hadoop. However, the existing researches support only the AES international standard data encryption algorithm. Meanwhile, the Korean government selected ARIA algorithm as a standard data encryption scheme for domestic usages. In this paper, we propose a HDFS data encryption scheme which supports both ARIA and AES algorithms on Hadoop. First, the proposed scheme provides a HDFS block-splitting component that performs ARIA/AES encryption and decryption under the Hadoop distributed computing environment. Second, the proposed scheme provides a variable-length data processing component that can perform encryption and decryption by adding dummy data, in case when the last data block does not contains 128-bit data. Finally, we show from performance analysis that our proposed scheme is efficient for various applications, such as word counting, sorting, k-Means, and hierarchical clustering.
With the growth of cloud computing, database outsourcing has attracted much interests. Due to the serious privacy threats in cloud computing, databases needs to be encrypted before being outsourced to the cloud. Therefore, various Top-k query processing algorithms have been studied for encrypted databases. However, existing algorithms are either insecure or inefficient. Therefore, in this paper we propose an efficient and secure Top-k query processing algorithm. Our algorithm guarantees the confidentiality of both the data and a user query while hiding data access patterns. Our algorithm also enables the query issuer not to participate in the query processing. To achieve a high level of query processing efficiency, we use new secure protocols using Yao's garbled circuit and a data packing technique. A performance analysis shows that the proposed algorithm outperforms the existing works in terms of query processing costs.