Biblio

List
Filter

Found 478 results

Filters: Keyword is Big Data [Clear All Filters]

2022-04-13

Chen, Hao, Chen, Lin, Kuang, Xiaoyun, Xu, Aidong, Yang, Yiwei. 2021. Support Forward Secure Smart Grid Data Deduplication and Deletion Mechanism. 2021 2nd Asia Symposium on Signal Processing (ASSP). :67–76.

With the vigorous development of the Internet and the widespread popularity of smart devices, the amount of data it generates has also increased exponentially, which has also promoted the generation and development of cloud computing and big data. Given cloud computing and big data technology, cloud storage has become a good solution for people to store and manage data at this stage. However, when cloud storage manages and regulates massive amounts of data, its security issues have become increasingly prominent. Aiming at a series of security problems caused by a malicious user's illegal operation of cloud storage and the loss of all data, this paper proposes a threshold signature scheme that is signed by a private key composed of multiple users. When this method performs key operations of cloud storage, multiple people are required to sign, which effectively prevents a small number of malicious users from violating data operations. At the same time, the threshold signature method in this paper uses a double update factor algorithm. Even if the attacker obtains the key information at this stage, he can not calculate the complete key information before and after the time period, thus having the two-way security and greatly improving the security of the data in the cloud storage.

Solanke, Abiodun A., Chen, Xihui, Ramírez-Cruz, Yunior. 2021. Pattern Recognition and Reconstruction: Detecting Malicious Deletions in Textual Communications. 2021 IEEE International Conference on Big Data (Big Data). :2574–2582.

Digital forensic artifacts aim to provide evidence from digital sources for attributing blame to suspects, assessing their intents, corroborating their statements or alibis, etc. Textual data is a significant source of artifacts, which can take various forms, for instance in the form of communications. E-mails, memos, tweets, and text messages are all examples of textual communications. Complex statistical, linguistic and other scientific procedures can be manually applied to this data to uncover significant clues that point the way to factual information. While expert investigators can undertake this task, there is a possibility that critical information is missed or overlooked. The primary objective of this work is to aid investigators by partially automating the detection of suspicious e-mail deletions. Our approach consists in building a dynamic graph to represent the temporal evolution of communications, and then using a Variational Graph Autoencoder to detect possible e-mail deletions in this graph. Our model uses multiple types of features for representing node and edge attributes, some of which are based on metadata of the messages and the rest are extracted from the contents using natural language processing and text mining techniques. We use the autoencoder to detect missing edges, which we interpret as potential deletions; and to reconstruct their features, from which we emit hypotheses about the topics of deleted messages. We conducted an empirical evaluation of our model on the Enron e-mail dataset, which shows that our model is able to accurately detect a significant proportion of missing communications and to reconstruct the corresponding topic vectors.

2022-04-12

Guo, Yifan, Wang, Qianlong, Ji, Tianxi, Wang, Xufei, Li, Pan. 2021. Resisting Distributed Backdoor Attacks in Federated Learning: A Dynamic Norm Clipping Approach. 2021 IEEE International Conference on Big Data (Big Data). :1172—1182.

With the advance in artificial intelligence and high-dimensional data analysis, federated learning (FL) has emerged to allow distributed data providers to collaboratively learn without direct access to local sensitive data. However, limiting access to individual provider’s data inevitably incurs security issues. For instance, backdoor attacks, one of the most popular data poisoning attacks in FL, severely threaten the integrity and utility of the FL system. In particular, backdoor attacks launched by multiple collusive attackers, i.e., distributed backdoor attacks, can achieve high attack success rates and are hard to detect. Existing defensive approaches, like model inspection or model sanitization, often require to access a portion of local training data, which renders them inapplicable to the FL scenarios. Recently, the norm clipping approach is developed to effectively defend against distributed backdoor attacks in FL, which does not rely on local training data. However, we discover that adversaries can still bypass this defense scheme through robust training due to its unchanged norm clipping threshold. In this paper, we propose a novel defense scheme to resist distributed backdoor attacks in FL. Particularly, we first identify that the main reason for the failure of the norm clipping scheme is its fixed threshold in the training process, which cannot capture the dynamic nature of benign local updates during the global model’s convergence. Motivated by it, we devise a novel defense mechanism to dynamically adjust the norm clipping threshold of local updates. Moreover, we provide the convergence analysis of our defense scheme. By evaluating it on four non-IID public datasets, we observe that our defense scheme effectively can resist distributed backdoor attacks and ensure the global model’s convergence. Noticeably, our scheme reduces the attack success rates by 84.23% on average compared with existing defense schemes.

Shams, Montasir, Pavia, Sophie, Khan, Rituparna, Pyayt, Anna, Gubanov, Michael. 2021. Towards Unveiling Dark Web Structured Data. 2021 IEEE International Conference on Big Data (Big Data). :5275—5282.

Anecdotal evidence suggests that Web-search engines, together with the Knowledge Graphs and Bases, such as YAGO [46], DBPedia [13], Freebase [16], Google Knowledge Graph [52] provide rapid access to most structured information on the Web. However, taking a closer look reveals a so called "knowledge gap" [18] that is largely in the dark. For example, a person searching for a relevant job opening has to spend at least 3 hours per week for several months [2] just searching job postings on numerous online job-search engines and the employer websites. The reason why this seemingly simple task cannot be completed by typing in a few keyword queries into a search-engine and getting all relevant results in seconds instead of hours is because access to structured data on the Web is still rudimentary. While searching for a job we have many parameters in mind, not just the job title, but also, usually location, salary range, remote work option, given a recent shift to hybrid work places, and many others. Ideally, we would like to write a SQL-style query, selecting all job postings satisfying our requirements, but it is currently impossible, because job postings (and all other) Web tables are structured in many different ways and scattered all over the Web. There is neither a Web-scale generalizable algorithm nor a system to locate and normalize all relevant tables in a category of interest from millions of sources.Here we describe and evaluate on a corpus having hundreds of millions of Web tables [39], a new scalable iterative training data generation algorithm, producing high quality training data required to train Deep- and Machine-learning models, capable of generalizing to Web scale. The models, trained on such en-riched training data efficiently deal with Web scale heterogeneity compared to poor generalization performance of models, trained without enrichment [20], [25], [38]. Such models are instrumental in bridging the knowledge gap for structured data on the Web.

Nair, Viswajit Vinod, van Staalduinen, Mark, Oosterman, Dion T.. 2021. Template Clustering for the Foundational Analysis of the Dark Web. 2021 IEEE International Conference on Big Data (Big Data). :2542—2549.

The rapid rise of the Dark Web and supportive technologies has served as the backbone facilitating online illegal activity worldwide. These illegal activities supported by anonymisation technologies such as Tor has made it increasingly elusive to law enforcement agencies. Despite several successful law enforcement operations, illegal activity on the Dark Web is still growing. There are approaches to monitor, mine, and research the Dark Web, all with varying degrees of success. Given the complexity and dynamics of the services offered, we recognize the need for in depth analysis of the Dark Web with regard to its infrastructures, actors, types of abuse and their relationships. This involves the challenging task of information extraction from the very heterogeneous collection of web pages that make up the Dark Web. Most providers develop their services on top of standard frameworks such as WordPress, Simple Machine Forum, phpBB and several other frameworks to deploy their services. As a result, these service providers publish significant number of pages based on similar structural and stylistic templates. We propose an efficient, scalable, repeatable and accurate approach to cluster Dark Web pages based on those structural and stylistic features. Extracting relevant information from those clusters should make it feasible to conduct in depth Dark Web analysis. This paper presents our clustering algorithm to accelerate information extraction, and as a result improve attribution of digital traces to infrastructures or individuals in the fight against cyber crime.

2022-04-01

Dinh, Phuc Trinh, Park, Minho. 2021. BDF-SDN: A Big Data Framework for DDoS Attack Detection in Large-Scale SDN-Based Cloud. 2021 IEEE Conference on Dependable and Secure Computing (DSC). :1–8.

Software-defined networking (SDN) nowadays is extensively being used in a variety of practical settings, provides a new way to manage networks by separating the data plane from its control plane. However, SDN is particularly vulnerable to Distributed Denial of Service (DDoS) attacks because of its centralized control logic. Many studies have been proposed to tackle DDoS attacks in an SDN design using machine-learning-based schemes; however, these feature-based detection schemes are highly resource-intensive and they are unable to perform reliably in such a large-scale SDN network where a massive amount of traffic data is generated from both control and data planes. This can deplete computing resources, degrade network performance, or even shut down the network systems owing to being exhausting resources. To address the above challenges, this paper proposes a big data framework to overcome traditional data processing limitations and to exploit distributed resources effectively for the most compute-intensive tasks such as DDoS attack detection using machine learning techniques, etc. We demonstrate the robustness, scalability, and effectiveness of our framework through practical experiments.

2022-03-25

Li, Xin, Yi, Peng, Jiang, Yiming, Lu, Xiangyu. 2021. Traffic Anomaly Detection Algorithm Based on Improved Salp Swarm Optimal Density Peak Clustering. 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD). :187—191.

Aiming at the problems of low accuracy and poor effect caused by the lack of data labels in most real network traffic, an optimized density peak clustering based on the improved salp swarm algorithm is proposed for traffic anomaly detection. Through the optimization of cosine decline and chaos strategy, the salp swarm algorithm not only accelerates the convergence speed, but also enhances the search ability. Moreover, we use the improved salp swarm algorithm to adaptively search the best truncation distance of density peak clustering, which avoids the subjectivity and uncertainty of manually selecting the parameters. The experimental results based on NSL-KDD dataset show that the improved salp swarm algorithm achieves faster convergence speed and higher precision, increases the average anomaly detection accuracy of 4.74% and detection rate of 6.14%, and reduces the average false positive rate of 7.38%.

2022-03-10

Pölöskei, István. 2021. Continuous natural language processing pipeline strategy. 2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI). :000221—000224.

Natural language processing (NLP) is a division of artificial intelligence. The constructed model's quality is entirely reliant on the training dataset's quality. A data streaming pipeline is an adhesive application, completing a managed connection from data sources to machine learning methods. The recommended NLP pipeline composition has well-defined procedures. The implemented message broker design is a usual apparatus for delivering events. It makes it achievable to construct a robust training dataset for machine learning use-case and serve the model's input. The reconstructed dataset is a valid input for the machine learning processes. Based on the data pipeline's product, the model recreation and redeployment can be scheduled automatically.

2022-03-08

Yang, Cuicui, Liu, Pinjie. 2021. Big Data Nearest Neighbor Similar Data Retrieval Algorithm based on Improved Random Forest. 2021 International Conference on Big Data Analysis and Computer Science (BDACS). :175—178.

In the process of big data nearest neighbor similar data retrieval, affected by the way of data feature extraction, the retrieval accuracy is low. Therefore, this paper proposes the design of big data nearest neighbor similar data retrieval algorithm based on improved random forest. Through the improvement of random forest model and the construction of random decision tree, the characteristics of current nearest neighbor big data are clarified. Based on the improved random forest, the hash code is generated. Finally, combined with the Hamming distance calculation method, the nearest neighbor similar data retrieval of big data is realized. The experimental results show that: in the multi label environment, the retrieval accuracy is improved by 9% and 10%. In the single label environment, the similar data retrieval accuracy of the algorithm is improved by 12% and 28% respectively.

Ma, Xiaoyu, Yang, Tao, Chen, Jiangchuan, Liu, Ziyu. 2021. k-Nearest Neighbor algorithm based on feature subspace. 2021 International Conference on Big Data Analysis and Computer Science (BDACS). :225—228.

The traditional KNN algorithm takes insufficient consideration of the spatial distribution of training samples, which leads to low accuracy in processing high-dimensional data sets. Moreover, the generation of k nearest neighbors requires all known samples to participate in the distance calculation, resulting in high time overhead. To solve these problems, a feature subspace based KNN algorithm (Feature Subspace KNN, FSS-KNN) is proposed in this paper. First, the FSS-KNN algorithm solves all the feature subspaces according to the distribution of the training samples in the feature space, so as to ensure that the samples in the same subspace have higher similarity. Second, the corresponding feature subspace is matched for the test set samples. On this basis, the search of k nearest neighbors is carried out in the corresponding subspace first, thus improving the accuracy and efficiency of the algorithm. Experimental results show that compared with the traditional KNN algorithm, FSS-KNN algorithm improves the accuracy and efficiency on Kaggle data set and UCI data set. Compared with the other four classical machine learning algorithms, FSS-KNN algorithm can significantly improve the accuracy.

Ramadhan, Hani, Kwon, Joonho. 2021. Enhancing Learned Index for A Higher Recall Trajectory K-Nearest Neighbor Search. 2021 IEEE International Conference on Big Data (Big Data). :6006—6007.

Learned indices can significantly shorten the query response time of k-Nearest Neighbor search of points data. However, extending the learned index for k-Nearest Neighbor search of trajectory data may return incorrect results (low recall) and require longer pruning time. Thus, we introduce an enhancement for trajectory learned index which is a pruning step for a learned index to retrieve the k-Nearest Neighbors correctly by learning the query workload. The pruning utilizes a predicted range query that covers the correct neighbors. We show that that our approach has the potential to work effectively in a large real-world trajectory dataset.

Jia, Yunsong. 2021. Design of nearest neighbor search for dynamic interaction points. 2021 2nd International Conference on Big Data and Informatization Education (ICBDIE). :389—393.

This article describes the definition, theoretical derivation, design ideas, and specific implementation of the nearest query algorithm for the acceleration of probabilistic optimization at first, and secondly gives an optimization conclusion that is generally applicable to high-dimensional Minkowski spaces with even-numbered feature parameters. Thirdly the operating efficiency and space sensitivity of this algorithm and the commonly used algorithms are compared from both theoretical and experimental aspects. Finally, the optimization direction is analyzed based on the results.

2022-03-01

Zhou, Jingwei. 2021. Construction of Computer Network Security Defense System Based On Big Data. 2021 International Conference on Big Data Analysis and Computer Science (BDACS). :5–8.

The development and popularization of big data technology bring more convenience to users, it also bring a series of computer network security problems. Therefore, this paper will briefly analyze the network security threats faced by users under the background of big data, and then combine the application function of computer network security defense system based on big data to propose an architecture design of computer network security defense system based on big data.

Li, Xiaojian, Chen, Jing, Jiang, Yiyi, Hu, Hangping, Yang, Haopeng. 2021. An Accountability-Oriented Generation approach to Time-Varying Structure of Cloud Service. 2021 IEEE International Conference on Services Computing (SCC). :413–418.

In the current cloud service development, during the widely used of cloud service, it can self organize and respond on demand when the cloud service in phenomenon of failure or violation, but it may still cause violation. The first step in forecasting or accountability for this situation, is to generate a dynamic structure of cloud services in a timely manner. In this research, it has presented a method to generate the time-varying structure of cloud service. Firstly, dependencies between tasks and even instances within a job of cloud service are visualized to explore the time-varying characteristics contained in the cloud service structure. And then, those dependencies are discovered quantitatively using CNN (Convolutional Neural Networks). Finally, it structured into an event network of cloud service for tracing violation and other usages. A validation to this approach has been examined by an experiment based on Alibaba’s dataset. A function integrity of this approach may up to 0.80, which is higher than Bai Y and others which is no more than 0.60.

Leevy, Joffrey L., Hancock, John, Khoshgoftaar, Taghi M., Seliya, Naeem. 2021. IoT Reconnaissance Attack Classification with Random Undersampling and Ensemble Feature Selection. 2021 IEEE 7th International Conference on Collaboration and Internet Computing (CIC). :41–49.

The exponential increase in the use of Internet of Things (IoT) devices has been accompanied by a spike in cyberattacks on IoT networks. In this research, we investigate the Bot-IoT dataset with a focus on classifying IoT reconnaissance attacks. Reconnaissance attacks are a foundational step in the cyberattack lifecycle. Our contribution is centered on the building of predictive models with the aid of Random Undersampling (RUS) and ensemble Feature Selection Techniques (FSTs). As far as we are aware, this type of experimentation has never been performed for the Reconnaissance attack category of Bot-IoT. Our work uses the Area Under the Receiver Operating Characteristic Curve (AUC) metric to quantify the performance of a diverse range of classifiers: Light GBM, CatBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Decision Tree (DT), and a Multilayer Perceptron (MLP). For this study, we determined that the best learners are DT and DT-based ensemble classifiers, the best RUS ratio is 1:1 or 1:3, and the best ensemble FST is our ``6 Agree'' technique.

2022-02-25

Baofu, Han, Hui, Li, Chuansi, Wei. 2021. Blockchain-Based Distributed Data Integrity Auditing Scheme. 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA). :143–149.

Cloud storage technology enables users to outsource local data to cloud service provider (CSP). In spite of its copious advantages, how to ensure the integrity of data has always been a significant issue. A variety of provable data possession (PDP) scheme have been proposed for cloud storage scenarios. However, the participation of centralized trusted third-party auditor (TPA) in most of the previous work has brought new security risks, because the TPA is prone to the single point of failure. Furthermore, the existing schemes do not consider the fair arbitration and lack an effective method to punish the malicious behavior. To address the above challenges, we propose a novel blockchain-based decentralized data integrity auditing scheme without the need for a centralized TPA. By using smart contract technique, our scheme supports automatic compensation mechanism. DO and CSP must first pay a certain amount of ether for the smart contract as deposit. The CSP gets the corresponding storage fee if the integrity auditing is passed. Otherwise, the CSP not only gets no fee but has to compensate DO whose data integrity is destroyed. Security analysis shows that the proposed scheme can resist a variety of attacks. Also, we implement our scheme on the platform of Ethereum to demonstrate the efficiency and effectiveness of our scheme.

2022-02-07

Keyes, David Sean, Li, Beiqi, Kaur, Gurdip, Lashkari, Arash Habibi, Gagnon, Francois, Massicotte, Frédéric. 2021. EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics. 2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS). :1–12.

The unmatched threat of Android malware has tremendously increased the need for analyzing prominent malware samples. There are remarkable efforts in static and dynamic malware analysis using static features and API calls respectively. Nonetheless, there is a void to classify Android malware by analyzing its behavior using multiple dynamic characteristics. This paper proposes EntropLyzer, an entropy-based behavioral analysis technique for classifying the behavior of 12 eminent Android malware categories and 147 malware families taken from CCCS-CIC-AndMal2020 dataset. This work uses six classes of dynamic characteristics including memory, API, network, logcat, battery, and process to classify and characterize Android malware. Results reveal that the entropy-based analysis successfully determines the behavior of all malware categories and most of the malware families before and after rebooting the emulator.

Liu, Jin-zhou. 2021. Research on Network Big Data Security Integration Algorithm Based on Machine Learning. 2021 International Conference of Social Computing and Digital Economy (ICSCDE). :264–267.

In order to improve the big data management ability of IOT access control based on converged network structure, a security integration model of IOT access control based on machine learning and converged network structure is proposed. Combined with the feature analysis method, the storage structure allocation model is established, the feature extraction and fuzzy clustering analysis of big data are realized by using the spatial node rotation control, the fuzzy information fusion parameter analysis model is constructed, the frequency coupling parameter analysis is realized, the virtual inertia parameter analysis model is established, and the integrated processing of big data is realized according to the machine learning analysis results. The test results show that the method has good clustering effect, reduces the storage overhead, and improves the reliability management ability of big data.

2022-01-31

Dai, Wei, Berleant, Daniel. 2021. Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation. 2021 IEEE International Conference on Big Data (Big Data). :5085–5094.

Deep learning (DL) classifiers are often unstable in that they may change significantly when retested on perturbed images or low quality images. This paper adds to the fundamental body of work on the robustness of DL classifiers. We introduce a new two-dimensional benchmarking matrix to evaluate robustness of DL classifiers, and we also innovate a four-quadrant statistical visualization tool, including minimum accuracy, maximum accuracy, mean accuracy, and coefficient of variation, for benchmarking robustness of DL classifiers. To measure robust DL classifiers, we create comprehensive 69 benchmarking image sets, including a clean set, sets with single factor perturbations, and sets with two-factor perturbation conditions. After collecting experimental results, we first report that using two-factor perturbed images improves both robustness and accuracy of DL classifiers. The two-factor perturbation includes (1) two digital perturbations (salt & pepper noise and Gaussian noise) applied in both sequences, and (2) one digital perturbation (salt & pepper noise) and a geometric perturbation (rotation) applied in both sequences. All source codes, related image sets, and results are shared on the GitHub website at https://github.com/caperock/robustai to support future academic research and industry projects.

Troyer, Dane, Henry, Justin, Maleki, Hoda, Dorai, Gokila, Sumner, Bethany, Agrawal, Gagan, Ingram, Jon. 2021. Privacy-Preserving Framework to Facilitate Shared Data Access for Wearable Devices. 2021 IEEE International Conference on Big Data (Big Data). :2583—2592.

Wearable devices are emerging as effective modalities for the collection of individuals’ data. While this data can be leveraged for use in several areas ranging from health-care to crime investigation, storing and securely accessing such information while preserving privacy and detecting any tampering attempts are significant challenges. This paper describes a decentralized system that ensures an individual’s privacy, maintains an immutable log of any data access, and provides decentralized access control management. Our proposed framework uses a custom permissioned blockchain protocol to securely log data transactions from wearable devices in the blockchain ledger. We have implemented a proof-of-concept for our framework, and our preliminary evaluation is summarized to demonstrate our proposed framework’s capabilities. We have also discussed various application scenarios of our privacy-preserving model using blockchain and proof-of-authority. Our research aims to detect data tampering attempts in data sharing scenarios using a thorough transaction log model.

2022-01-10

Ren, Sothearin, Kim, Jae-Sung, Cho, Wan-Sup, Soeng, Saravit, Kong, Sovanreach, Lee, Kyung-Hee. 2021. Big Data Platform for Intelligence Industrial IoT Sensor Monitoring System Based on Edge Computing and AI. 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). :480–482.

The cutting edge of Industry 4.0 has driven everything to be converted to disruptive innovation and digitalized. This digital revolution is imprinted by modern and advanced technology that takes advantage of Big Data and Artificial Intelligence (AI) to nurture from automatic learning systems, smart city, smart energy, smart factory to the edge computing technology, and so on. To harness an appealing, noteworthy, and leading development in smart manufacturing industry, the modern industrial sciences and technologies such as Big Data, Artificial Intelligence, Internet of things, and Edge Computing have to be integrated cooperatively. Accordingly, a suggestion on the integration is presented in this paper. This proposed paper describes the design and implementation of big data platform for intelligence industrial internet of things sensor monitoring system and conveys a prediction of any upcoming errors beforehand. The architecture design is based on edge computing and artificial intelligence. To extend more precisely, industrial internet of things sensor here is about the condition monitoring sensor data - vibration, temperature, related humidity, and barometric pressure inside facility manufacturing factory.

Xu, Ling. 2021. Application of Artificial Intelligence and Big Data in the Security of Regulatory Places. 2021 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA). :210–213.

This paper analyzes the necessity of artificial intelligence and big data in the security application of regulatory places. The author studies the specific application of artificial intelligence and big data in ideological dynamics management, access control system, video surveillance system, emergency alarm system, perimeter control system, police inspection system, daily behavior management, and system implementation management. The author puts forward how to do technical integration, improve information sharing, strengthen the construction of talents, and increase management fund expenditure. The purpose of this paper is to enhance the security management level of regulatory places and optimize the management environment of regulatory places.

Khashan, Osama A.. 2021. Parallel Proxy Re-Encryption Workload Distribution for Efficient Big Data Sharing in Cloud Computing. 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). :0554–0559.

Cloud computing enables users and organizations to conveniently store and share data in large volumes and to enjoy on-demand services. Security and the protection of big data sharing from various attacks is the most challenging issue. Proxy re-encryption (PRE) is an effective method to improve the security of data sharing in the cloud environment. However, in PRE schemes, offloading big data for re-encryption will impose a heavy computational burden on the cloud proxy server, resulting in an increased computation delay and response time for the users. In this paper, we propose a novel parallel PRE workload distribution scheme to dynamically route the big data re-encryption process into the fog of the network. Moreover, this paper proposes a dynamic load balancing technique to avoid an excessive workload for the fog nodes. It also uses lightweight asymmetric cryptography to provide end-to-end security for the big data sharing between users. Within the proposed scheme, the offloading overhead on the centralized cloud server is effectively mitigated. Meanwhile, the processing delay incurred by the big data re-encryption process is efficiently improved.

2021-12-21

Maliszewski, Michal, Boryczka, Urszula. 2021. Using MajorClust Algorithm for Sandbox-Based ATM Security. 2021 IEEE Congress on Evolutionary Computation (CEC). :1054–1061.

Automated teller machines are affected by two kinds of attacks: physical and logical. It is common for most banks to look for zero-day protection for their devices. The most secure solutions available are based on complex security policies that are extremely hard to configure. The goal of this article is to present a concept of using the modified MajorClust algorithm for generating a sandbox-based security policy based on ATM usage data. The results obtained from the research prove the effectiveness of the used techniques and confirm that it is possible to create a division into sandboxes in an automated way.

2021-12-20

D'Agostino, Jack, Kul, Gokhan. 2021. Toward Pinpointing Data Leakage from Advanced Persistent Threats. 2021 7th IEEE Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :157–162.

Advanced Persistent Threats (APT) consist of most skillful hackers who employ sophisticated techniques to stealthily gain unauthorized access to private networks and exfiltrate sensitive data. When their existence is discovered, organizations - if they can sustain business continuity - mostly have to perform forensics activities to assess the damage of the attack and discover the extent of sensitive data leakage. In this paper, we construct a novel framework to pinpoint sensitive data that may have been leaked in such an attack. Our framework consists of creating baseline fingerprints for each workstation for setting normal activity, and we consider the change in the behavior of the network overall. We compare the accused fingerprint with sensitive database information by utilizing both Levenstein distance and TF-IDF/cosine similarity resulting in a similarity percentage. This allows us to pinpoint what part of data was exfiltrated by the perpetrators, where in the network the data originated, and if that data is sensitive to the private company's network. We then perform feasibility experiments to show that even these simple methods are feasible to run on a network representative of a mid-size business.