Biblio

Found 478 results

Filters: Keyword is Big Data

2022-07-15

Luo, Yun, Chen, Yuling, Li, Tao, Wang, Yilei, Yang, Yixian. 2021. Using information entropy to analyze secure multi-party computation protocol. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :312–318.

Secure multi-party computation (SMPC) is an important research field in cryptography with a wide range of practical applications. Accordingly, information security issues have arisen. Addressing security issues in SMPC, we consider that semi-honest participants may perform malicious operations such as collusion during information exchange, gaining an information advantage over honest parties and thereby undermining the security of the protocol. To solve this problem, we use information entropy to propose an n-round information exchange protocol, in which each participant broadcasts a relevant information value in each round without revealing additional information. Because the uncertainty of the correct result value changes in each round of interaction, no participant can determine the correct result value before the end of the protocol. Security analysis shows that our protocol guarantees the security of the output obtained by the participants after the completion of the protocol.
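The abstract does not spell out the protocol, so the following is only a minimal sketch of the underlying measure: Shannon entropy as a way to quantify how uncertain a participant still is about the correct result value. The function name and the example distributions are assumptions made for illustration, not part of the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical beliefs of one participant about four candidate result values.
before_round = [0.25, 0.25, 0.25, 0.25]  # nothing learned yet
after_round = [0.5, 0.5, 0.0, 0.0]       # two candidates ruled out

print(shannon_entropy(before_round))
print(shannon_entropy(after_round))
```

As long as the entropy stays above zero, the participant cannot single out the correct value with certainty.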

Tao, Jing, Chen, A, Liu, Kai, Chen, Kailiang, Li, Fengyuan, Fu, Peng. 2021. Recommendation Method of Honeynet Trapping Component Based on LSTM. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :952–957.

With the advancement of network physical social systems (NPSS), large amounts of private data have become targets of hacker attacks. Due to the complex and changeable attack methods of hackers, network security threats are becoming increasingly severe. As an important type of active defense, honeypots use the NPSS as a carrier to ensure the security of the NPSS. However, traditional honeynet structures are relatively fixed, and it is difficult to trap hackers in a targeted manner. To bridge this gap, this paper proposes a recommendation method for trapping components based on LSTM prediction with an attention mechanism. Its characteristic lies in the ability to predict hackers' attack interest, which increases the active trapping ability of honeynets. The experimental results show that the proposed prediction method can quickly and effectively predict the attacking behavior of hackers and promptly provide the trapping components that hackers are interested in.

Giesser, Patrick, Stechschulte, Gabriel, Costa Vaz, Anna da, Kaufmann, Michael. 2021. Implementing Efficient and Scalable In-Database Linear Regression in SQL. 2021 IEEE International Conference on Big Data (Big Data). :5125–5132.

Relational database management systems not only support larger-than-memory data processing and very advanced query optimization, but also offer the benefits of data security, privacy, and consistency. When machine learning on large data sets is processed directly on an existing SQL database server, the data does not need to be exported and transferred to a separate big data processing platform. To achieve this, we implement a linear regression algorithm using SQL code generation such that the computation can be performed server-side and directly in the RDBMS. Our method and its implementation, programmed in Python, solve linear regression (LR) using the ordinary least squares (OLS) method directly in the RDBMS using SQL code generation, leaving most of the processing in the database. Only the matrix of the system of equations, whose size is equal to the number of variables squared, is transferred from the SQL server to the Python client to be solved for OLS regression. For evaluation purposes, our LR implementation was tested with artificially generated datasets and compared to an existing Python library (Scikit Learn). We found that our implementation consistently solves OLS regression faster than Scikit Learn for datasets with more than 10,000 input rows, and if the number of columns is less than 64. Moreover, under the same test conditions where the computation is larger than memory, our implementation showed a fast result, while Scikit returned an out-of-memory error. We conclude that SQL is a promising tool for in-database processing of large-volume, low-dimensional data sets with a particular class of machine learning algorithms, namely those that can be efficiently solved with map-reduce queries such as OLS regression.
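The idea of leaving the heavy aggregation in the database and solving only a small system on the client can be sketched for the single-variable case. This is not the authors' generated SQL: it assumes SQLite as a stand-in RDBMS, and the table name `points`, the column names, and the helper `ols_in_sql` are hypothetical.

```python
import sqlite3

def ols_in_sql(conn, table, x_col, y_col):
    """Fit y = intercept + slope * x; SQL computes the normal-equation sums."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    query = (
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {x_col}), SUM({x_col} * {y_col}) FROM {table}"
    )
    n, sx, sy, sxx, sxy = conn.execute(query).fetchone()
    # Only five numbers reach the client; solve the 2x2 system here.
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy * sxx - sx * sxy) / det
    return intercept, slope

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(float(x), 2.0 * x + 1.0) for x in range(10)],  # synthetic data on y = 2x + 1
)
intercept, slope = ols_in_sql(conn, "points", "x", "y")
print(intercept, slope)
```

With k predictors the same pattern would return on the order of k squared aggregates, which matches the abstract's remark that only the matrix of the system of equations is transferred.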

Yuan, Rui, Wang, Xinna, Xu, Jiangmin, Meng, Shunmei. 2021. A Differential-Privacy-based hybrid collaborative recommendation method with factorization and regression. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :389–396.

Recommender systems have proved to be effective techniques for providing users with better experiences. However, when a recommender knows users' preference characteristics or obtains their sensitive information, a series of privacy concerns arises. A number of solutions have been proposed in the literature to enhance the degree of privacy protection of recommender systems. Although the existing solutions have enhanced the protection, they have simultaneously decreased recommendation accuracy. In this paper, we propose a security-aware hybrid recommendation method that combines factorization and regression techniques. Specifically, the differential privacy mechanism is integrated into data pre-processing for data encryption. First, data are perturbed to satisfy differential privacy and transported to the recommender. Then the recommender calculates the aggregated data. However, applying differential privacy raises the utility issue of low recommendation accuracy, while the use of a single model may cause overfitting. To tackle this challenge, we adopt a fusion prediction model that combines linear regression (LR) and matrix factorization (MF) for collaborative recommendation. With the MovieLens dataset, we evaluate the recommendation accuracy and regression of our recommender system and demonstrate that our system performs better than the existing recommender system under privacy requirement.
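The perturbation step can be illustrated with the standard Laplace mechanism. The abstract does not say which mechanism or parameters the authors use, so the rating range of 1 to 5, the value of epsilon, and the function names below are all assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_ratings(ratings, epsilon, low=1.0, high=5.0, seed=None):
    """Add Laplace noise to each rating, then clip back into [low, high]."""
    rng = random.Random(seed)
    scale = (high - low) / epsilon  # sensitivity of one rating divided by epsilon
    noisy = (r + laplace_noise(scale, rng) for r in ratings)
    return [min(high, max(low, value)) for value in noisy]

# Hypothetical ratings from one user; a smaller epsilon means more noise.
print(perturb_ratings([4.0, 2.0, 5.0], epsilon=1.0, seed=0))
```

Clipping is post-processing and does not weaken the privacy guarantee, but the added noise is exactly what lowers accuracy, which is the utility problem the fusion model is meant to offset.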

2022-07-05

Park, Ho-rim, Hwang, Kyu-hong, Ha, Young-guk. 2021. An Object Detection Model Robust to Out-of-Distribution Data. 2021 IEEE International Conference on Big Data and Smart Computing (BigComp). :275–278.

Most studies of existing object detection models aim to better detect the objects to be detected. The problem of false detection of objects that should not be detected is not considered. When an object detection model that does not take this problem into account is applied to an industrial field close to humans, false detection can lead to a dangerous situation that greatly interferes with human life. To solve this false detection problem, this paper proposes a method of fine-tuning the backbone neural network model of the object detection model using the Outlier Exposure method and applying the class-specific uncertainty constant to the confidence score to detect the object.

2022-06-15

Tatar, Ekin Ecem, Dener, Murat. 2021. Anomaly Detection on Bitcoin Values. 2021 6th International Conference on Computer Science and Engineering (UBMK). :249–253.

Bitcoin has received a lot of attention from investors, researchers, regulators, and the media. It is a known fact that the Bitcoin price usually fluctuates greatly. However, not enough scientific research has been done on these fluctuations. In this study, long short-term memory (LSTM) modeling from Recurrent Neural Networks, which is one of the deep learning methods, was applied to Bitcoin values. As a result of this application, anomaly detection was carried out on the values from the data set. With the LSTM network, a time-dependent representation of the Bitcoin price can be captured, and anomalies can be selected. The factors that play a role in forming the model used for anomaly detection were evaluated using the experimental results.

2022-06-14

Zuech, Richard, Hancock, John, Khoshgoftaar, Taghi M.. 2021. Feature Popularity Between Different Web Attacks with Supervised Feature Selection Rankers. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :30–37.

We introduce the novel concept of feature popularity with three different web attacks and big data from the CSE-CIC-IDS2018 dataset: Brute Force, SQL Injection, and XSS web attacks. Feature popularity is based upon ensemble Feature Selection Techniques (FSTs) and allows us to more easily understand common important features between different cyberattacks, for two main reasons. First, feature popularity lists can be generated to provide an easy comprehension of important features across different attacks. Second, the Jaccard similarity metric can provide a quantitative score for how similar feature subsets are between different attacks. Both of these approaches not only provide more explainable and easier-to-understand models, but they can also reduce the complexity of implementing models in real-world systems. Four supervised learning-based FSTs are used to generate feature subsets for each of our three different web attack datasets, and then our feature popularity frameworks are applied. For these three web attacks, the XSS and SQL Injection feature subsets are the most similar per the Jaccard similarity. The most popular features across all three web attacks are: Flow_Bytes_s, Flow_IAT_Max, and Flow_Packets_s. While this introductory study is only a simple example using only three web attacks, this feature popularity concept can be easily extended, allowing an automated framework to more easily determine the most popular features across a very large number of attacks and features.
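Both measures named in the abstract are simple to compute once each attack has a selected feature subset. The sketch below uses made-up subsets: only the three Flow_* names come from the abstract, while `feature_A`, `feature_B`, and `feature_C` are placeholders, and the function names are assumptions rather than the authors' framework.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two feature collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

def feature_popularity(subsets):
    """Count how many attack-specific subsets contain each feature."""
    counts = Counter()
    for features in subsets.values():
        counts.update(set(features))
    return counts.most_common()

# Hypothetical feature subsets, for illustration only.
subsets = {
    "Brute Force": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "feature_A"},
    "SQL Injection": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "feature_B"},
    "XSS": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "feature_B", "feature_C"},
}

print(jaccard(subsets["SQL Injection"], subsets["XSS"]))
print(feature_popularity(subsets))
```

A popularity list ranks features by how many attacks select them, while the Jaccard score compares two attacks at a time.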

Kawanishi, Yasuyuki, Nishihara, Hideaki, Yoshida, Hirotaka, Hata, Yoichi. 2021. A Study of The Risk Quantification Method focusing on Direct-Access Attacks in Cyber-Physical Systems. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :298–305.

Direct-access attacks were initially considered unrealistic threats in cyber security because the attacker can more easily mount other non-computerized attacks like cutting a brake line. In recent years, some research into direct-access attacks has been conducted, especially in the automotive field, for example, research on an attack method that makes the ECU stop functioning via the CAN bus. The problem with existing risk quantification methods is that direct-access attacks seem not to be recognized as serious threats. To solve this problem, we propose a new risk quantification method by applying vulnerability evaluation criteria and by setting metrics. We also confirm that direct-access attacks not recognized by conventional methods can be evaluated appropriately, using the case study of an automotive system as an example of a cyber-physical system.

Singh, A K, Goyal, Navneet. 2021. Detection of Malicious Webpages Using Deep Learning. 2021 IEEE International Conference on Big Data (Big Data). :3370–3379.

Malicious Webpages have been a serious threat on the Internet for the past few years. As per the latest Google Transparency reports, they continue to be top ranked amongst online threats. Various techniques have been used to date to identify malicious sites, including Static Heuristics, Honey Clients, Machine Learning, etc. Recently, with the rapid rise of Deep Learning, interest has arisen in exploring Deep Learning techniques for detecting Malicious Webpages. In this paper, Deep Learning has been utilized for such classification. The model proposed in this research uses a Deep Neural Network (DNN) with two hidden layers to distinguish between Malicious and Benign Webpages. This DNN model gave a high accuracy of 99.81% with very low False Positives (FP) and False Negatives (FN), and with near real-time response on the test sample. The model outperformed earlier machine learning solutions in accuracy, precision, recall and time performance metrics.

Schneider, Madeleine, Aspinall, David, Bastian, Nathaniel D.. 2021. Evaluating Model Robustness to Adversarial Samples in Network Intrusion Detection. 2021 IEEE International Conference on Big Data (Big Data). :3343–3352.

Adversarial machine learning, a technique which seeks to deceive machine learning (ML) models, threatens the utility and reliability of ML systems. This is particularly relevant in critical ML implementations such as those found in Network Intrusion Detection Systems (NIDS). This paper considers the impact of adversarial influence on NIDS and proposes ways to improve ML based systems. Specifically, we consider five feature robustness metrics to determine which features in a model are most vulnerable, and four defense methods. These methods are tested on six ML models with four adversarial sample generation techniques. Our results show that across different models and adversarial generation techniques, there is limited consistency in vulnerable features or in effectiveness of defense method.

Kim, Seongsoo, Chen, Lei, Kim, Jongyeop. 2021. Intrusion Prediction using Long Short-Term Memory Deep Learning with UNSW-NB15. 2021 IEEE/ACIS 6th International Conference on Big Data, Cloud Computing, and Data Science (BCD). :53–59.

This study shows the effectiveness of an anomaly-based IDS using long short-term memory (LSTM) on the newly developed UNSW-NB15 dataset, with root mean square error and mean absolute error as evaluation metrics for accuracy. For each attack, 80% and 90% of samples were used as LSTM inputs to train the model while increasing epoch values. Furthermore, the model predicted attack points from test data and produced possible attack points for each attack at the 3rd time frame against the actual attack point. However, in the case of an Exploit attack, where consecutive overlapping attacks happen, there was ambiguity in the interpretation of the numerical values calculated by the LSTM. We presented a methodology for training data with binary values using LSTM and evaluation with RMSE metrics throughout this study.
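The two evaluation metrics named in the abstract can be written out directly. This is a generic sketch of RMSE and MAE, not the authors' evaluation code; the sample values are made up, using binary labels (1 marks an attack point) to mirror the binary training values the abstract mentions.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical binary attack labels and model outputs.
actual = [0, 0, 1, 1]
predicted = [0.1, 0.2, 0.7, 0.9]
print(rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalises large individual errors more heavily than MAE, so reporting both gives a sense of whether the error is spread evenly or concentrated in a few time steps.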

Hancock, John, Khoshgoftaar, Taghi M., Leevy, Joffrey L.. 2021. Detecting SSH and FTP Brute Force Attacks in Big Data. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :760–765.

We present a simple approach for detecting brute force attacks in the CSE-CIC-IDS2018 Big Data dataset. We show our approach is preferable to more complex approaches since it is simpler, and yields stronger classification performance. Our contribution is to show that it is possible to train and test simple Decision Tree models with two independent variables to classify CSE-CIC-IDS2018 data with better results than reported in previous research, where more complex Deep Learning models are employed. Moreover, we show that Decision Tree models trained on data with two independent variables perform similarly to Decision Tree models trained on a larger number of independent variables. Our experiments reveal that simple models, with AUC and AUPRC scores greater than 0.99, are capable of detecting brute force attacks in CSE-CIC-IDS2018. To the best of our knowledge, these are the strongest performance metrics published for the machine learning task of detecting these types of attacks. Furthermore, the simplicity of our approach, combined with its strong performance, makes it an appealing technique.

Yasa, Ray Novita, Buana, I Komang Setia, Girinoto, Setiawan, Hermawan, Hadiprakoso, Raden Budiarto. 2021. Modified RNP Privacy Protection Data Mining Method as Big Data Security. 2021 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS). :30–34.

Privacy-Preserving Data Mining (PPDM) has become an exciting topic in recent decades due to the growing interest in big data and data mining. It is a technique for securing data while still preserving the privacy within it. This paper provides an alternative perturbation-based PPDM technique, obtained by modifying the RNP algorithm. The novelty of this paper lies in modifying some steps of the method for specific purposes. The first modification narrows the selection of the disturbance value, so that the number of attributes replaced in each record is only as many as the attributes in the original data, no more and with no need to repeat. The second derives the perturbation function from the cumulative distribution function and uses it to find the probability distribution function, so that the selection of replacement data has a clear basis. The experimental results on twenty-five perturbed data show that the modified RNP algorithm balances data utility and security level by selecting the appropriate disturbance value and perturbation value. The level of security is measured using privacy metrics in the form of value difference, average transformation of data, and percentage of retains. The method presented in this paper is an interesting candidate for application to actual data that requires privacy preservation.

Qureshi, Hifza, Sagar, Anil Kumar, Astya, Rani, Shrivastava, Gulshan. 2021. Big Data Analytics for Smart Education. 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA). :650–658.

The existing education system, which incorporates school assessments, has some flaws. Conventional teaching methods give students no immediate feedback, make teachers spend hours grading repetitive assignments, are not very constructive in showing students how to improve academically, and fail to take advantage of digital opportunities that can improve learning outcomes. In addition, since a single teacher has to manage a class of students, it is difficult to focus on each and every student in the class. Furthermore, with the help of a management system for better learning, educational organizations can now implement administrative analytics and execute new business intelligence using big data. This data visualization aids in the evaluation of teaching, management, and study success metrics. This paper discusses how Data Mining and Data Analytics can help make the experience of both learning and teaching easier and more accountable. It also discusses the numerous challenges education organizations have faced in terms of effective and efficient teaching and student performance. In addition, development and inadequate data storage, processing, and analysis are discussed. The research applies the Python programming language to big education data. In addition, the research adopted an exploratory research design to identify the complexities and requirements of big data in the education field.

2022-06-13

Santos, Nelson, Younis, Waleed, Ghita, Bogdan, Masala, Giovanni. 2021. Enhancing Medical Data Security on Public Cloud. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :103–108.

Cloud computing, supported by advancements in virtualisation and distributed computing, has become the default option for implementing the IT infrastructure of organisations. Medical data and in particular medical images have increasing storage space and remote access requirements. Cloud computing satisfies these requirements, but unclear safeguards on data security can expose sensitive data to possible attacks. Furthermore, recent changes in legislation imposed additional security constraints in technology to ensure the privacy of individuals and the integrity of data when stored in the cloud. In contrast with this trend, current data security methods, based on encryption, create an additional overhead to the performance, and often they are not allowed in public cloud servers. Hence, this paper proposes a mechanism that combines data fragmentation to protect medical images on the public cloud servers, and a NoSQL database to secure an efficient organisation of such data. Results of this paper indicate that the latency of the proposed method is significantly lower than that of AES, one of the most adopted data encryption mechanisms. Therefore, the proposed method is an optimal trade-off in environments with low latency requirements or limited resources.

Priyanka, V S, Satheesh Kumar, S, Jinu Kumar, S V. 2021. A Forensic Methodology for the Analysis of Cloud-Based Android Apps. 2021 International Conference on Forensics, Analytics, Big Data, Security (FABS). 1:1–5.

The widespread use of smartphones has made the gadget a prime source of evidence for crime investigators. The cloud-based applications on mobile devices store a rich set of evidence in the cloud servers. The physical acquisition of Android devices reveals only minimal data of cloud-based apps. However, the artifacts collected from mobile devices can be used for data acquisition from cloud servers. This paper focuses on the forensic acquisition and analysis of cloud data of Google apps on Android devices. The proposed methodology uses the tokens extracted from the Android devices to get authenticated to the Google server bypassing the two-factor authentication scheme and access the cloud data for further analysis. Based on the investigation, we have also developed a tool to acquire, preserve and analyze cloud data in a forensically sound manner.

Gupta, B. B., Gaurav, Akshat, Peraković, Dragan. 2021. A Big Data and Deep Learning based Approach for DDoS Detection in Cloud Computing Environment. 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE). :287–290.

Recently, as a result of the COVID-19 pandemic, the internet service has seen an upsurge in use. As a result, the usage of cloud computing apps, which offer services to end users on a subscription basis, rises in this situation. However, the availability and efficiency of cloud computing resources are impacted by DDoS attacks, which are designed to disrupt the availability and processing power of cloud computing services. Because there is no effective way for detecting or filtering DDoS attacks, they are a dependable weapon for cyber-attackers. Recently, researchers have been experimenting with machine learning (ML) methods in order to create efficient machine learning-based strategies for detecting DDoS assaults. In this context, we propose a technique for detecting DDoS attacks in a cloud computing environment using big data and deep learning algorithms. The proposed technique utilises big data spark technology to analyse a large number of incoming packets and a deep learning machine learning algorithm to filter malicious packets. The KDDCUP99 dataset was used for training and testing, and an accuracy of 99.73% was achieved.

Syed, Saba, Anu, Vaibhav. 2021. Digital Evidence Data Collection: Cloud Challenges. 2021 IEEE International Conference on Big Data (Big Data). :6032–6034.

Cloud computing has become ubiquitous in the modern world and has offered a number of promising and transformative technological opportunities. However, organizations that use cloud platforms are also concerned about cloud security and new threats that arise due to cloud adoption. Digital forensic investigations (DFI) are undertaken when a security incident (i.e., successful attack) has been identified. Forensics data collection is an integral part of DFIs. This paper presents results from a survey of existing literature on challenges related to forensics data collection in cloud. A taxonomy of major challenges was developed to help organizations understand and thus better prepare for forensics data collection.

Fan, Teah Yi, Rana, Muhammad Ehsan. 2021. Facilitating Role of Cloud Computing in Driving Big Data Emergence. 2021 Third International Sustainability and Resilience Conference: Climate Change. :524–529.

Big data emerges as an important technology that addresses the storage, processing and analytics aspects of massive data characterized by the 5 V's (volume, velocity, variety, veracity, value), which has grown exponentially beyond the handling capacity of traditional data architectures. The most significant technologies include the parallel storage and processing framework, which requires entirely new IT infrastructures to facilitate big data adoption. Cloud computing emerges as a successful paradigm in computing technology that shifted the business landscape of IT infrastructures towards a service-oriented basis. Cloud service providers build IT infrastructures and technologies and offer them as services which consumers can access through the internet. This paper discusses the facilitating role of cloud computing in the field of big data analytics. Cloud deployment models concerning the architectural aspect and the current trend of adoption are introduced. The fundamental cloud services models concerning the infrastructural and technological provisioning are introduced while the emerging cloud services models related to big data are discussed with examples of technology platforms offered by the big cloud service providers - Amazon, Google, Microsoft and Cloudera. The main advantages of cloud adoption in terms of availability and scalability for big data are reiterated. Lastly, the challenges concerning cloud security, data privacy and data governance of consuming and adopting big data in the cloud are highlighted.

Stauffer, Jake, Zhang, Qingxue. 2021. s2Cloud: A Novel Cloud System for Mobile Health Big Data Management. 2021 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics). :380–383.

The era of big data continues to progress, and many new practices and applications are being advanced. One such application is big data in healthcare. In this application, big data, which includes patient information and measurements, must be transmitted and managed in smart and secure ways. In this study, we propose a novel big data cloud system, s2Cloud, standing for Smart and Secure Cloud. s2Cloud can enable health care systems to improve patient monitoring and help doctors gain crucial insights into their patients' health. This system provides an interactive website that allows doctors to effectively manage patients and patient records. Furthermore, both real-time and historical functions for big data management are supported. These functions provide visualizations of patient measurements and also allow for historic data retrieval so further analysis can be conducted. The security is achieved by protecting access and transmission of data via sign up and log in portals. Overall, the proposed s2Cloud system can effectively manage healthcare big data applications. This study will also help to advance other big data applications such as smart home and smart world big data practices.

Wang, Fengling, Wang, Han, Xue, Liang. 2021. Research on Data Security in Big Data Cloud Computing Environment. 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). 5:1446–1450.

In the big data cloud computing environment, data security issues have become a focus of attention. This paper delivers an overview of concepts, characteristics and advanced technologies for big data cloud computing. Security issues of data quality and privacy control are elaborated pertaining to data access, data isolation, data integrity, data destruction, data transmission and data sharing. Finally, a virtualization architecture and related strategies are proposed to counter threats and enhance data security in the big data cloud environment.

2022-06-10

Ge, Yurun, Bertozzi, Andrea L.. 2021. Active Learning for the Subgraph Matching Problem. 2021 IEEE International Conference on Big Data (Big Data). :2641–2649.

The subgraph matching problem arises in a number of modern machine learning applications including segmented images and meshes of 3D objects for pattern recognition, bio-chemical reactions and security applications. This graph-based problem can have a very large and complex solution space especially when the world graph has many more nodes and edges than the template. In a real use-case scenario, analysts may need to query additional information about template nodes or world nodes to reduce the problem size and the solution space. Currently, this query process is done by hand, based on the personal experience of analysts. By analogy to the well-known active learning problem in machine learning classification problems, we present a machine-based active learning problem for the subgraph match problem in which the machine suggests optimal template target nodes that would be most likely to reduce the solution space when it is otherwise overly large and complex. The humans in the loop can then include additional information about those target nodes. We present some case studies for both synthetic and real world datasets for multichannel subgraph matching.

Poon, Lex, Farshidi, Siamak, Li, Na, Zhao, Zhiming. 2021. Unsupervised Anomaly Detection in Data Quality Control. 2021 IEEE International Conference on Big Data (Big Data). :2327–2336.

Data is one of the most valuable assets of an organization and has a tremendous impact on its long-term success and decision-making processes. Typically, organizational data error and outlier detection processes are performed manually and reactively, making them time-consuming and prone to human errors. Additionally, rich data types, unlabeled data, and increased volume have made such data more complex. Accordingly, an automated anomaly detection approach is required to improve data management and quality control processes. This study introduces an unsupervised anomaly detection approach based on model comparison, consensus learning, and a combination of rules of thumb with iterative hyper-parameter tuning to increase data quality. Furthermore, a domain expert is considered a human in the loop to evaluate and check the data quality and to judge the output of the unsupervised model. An experiment has been conducted to assess the proposed approach in the context of a case study. The experiment results confirm that the proposed approach can improve the quality of organizational data and facilitate anomaly detection processes.

2022-06-09

Shoba, V., Parameswari, R.. 2021. Data Security and Privacy Preserving with Augmented Homomorphic Re-Encryption Decryption (AHRED) Algorithm in Big Data Analytics. 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA). :451–457.

Big data storage has become challenging due to the expansion of extensive data; data providers encrypt their data and upload it to Big data. However, the data exchange mechanism is unable to accommodate encrypted data. Particularly when a large number of users share the scalable data, the scalability becomes extremely limited. To solve this issue and ensure the security of encrypted data, a contemporary privacy protection system is used together with partially homomorphic re-encryption and decryption (PHRED). This scheme has the flexibility to share data while ensuring users' privacy with partially trusted Big Data. It provides access to a strongly unforgeable scheme that gives the transformed ciphertext public and private key verification, combining identity-based Augmented Homomorphic Re-Encryption Decryption (AHRED) on the Paillier cryptosystem with a Laplacian noise filter for the performance of the data provider in privacy-preserving big data.
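The abstract builds on the Paillier cryptosystem, whose additive homomorphism is what makes computation on encrypted data possible. The sketch below is textbook Paillier with tiny, insecure primes, not the AHRED scheme itself: re-encryption, identity-based verification, and the noise filter are not shown, and all function names are assumptions.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key pair from two primes (far too small to be secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1            # common simplifying choice of generator
    mu = pow(lam, -1, n)  # valid for g = n + 1; needs Python 3.8+
    return (n, g), (lam, mu)

def encrypt(pub, m, rng=random):
    n, g = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible modulo n
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen(61, 53)
total = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, total))
```

The party holding only `pub` can combine ciphertexts without ever seeing 20 or 22; only the holder of `priv` can read the sum.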

Chandrakar, Ila, Hulipalled, Vishwanath R. 2021. Privacy Preserving Big Data mining using Pseudonymization and Homomorphic Encryption. 2021 2nd Global Conference for Advancement in Technology (GCAT). :1–4.

Today’s data is so huge that it is referred to as “Big data.” Such data now exceeds petabytes, and hence businesses have begun to store it in the cloud. Because the cloud is a third party, data must be secured before being uploaded to the cloud in such a way that cloud mining may be performed on protected data, as desired by the organization. Homomorphic encryption permits mining and analysis of encrypted data, hence it is used in the proposed work to encrypt original data on the data owner’s site. Since homomorphic encryption is a complicated form of encryption, it takes a long time to encrypt, causing performance to suffer. So, in this paper, we used Hadoop to implement homomorphic encryption, which splits data across nodes in a Hadoop cluster to execute a parallel algorithm and provides greater privacy and performance than previous approaches. It also enables data mining in encrypted form, ensuring that the cloud never sees the original data during mining.