Biblio
In recent trends, privacy preservation is the most predominant factor, on big data analytics and cloud computing. Every organization collects personal data from the users actively or passively. Publishing this data for research and other analytics without removing Personally Identifiable Information (PII) will lead to the privacy breach. Existing anonymization techniques are failing to maintain the balance between data privacy and data utility. In order to provide a trade-off between the privacy of the users and data utility, a Mondrian based k-anonymity approach is proposed. To protect the privacy of high-dimensional data Deep Neural Network (DNN) based framework is proposed. The experimental result shows that the proposed approach mitigates the information loss of the data without compromising privacy.
Accurate network traffic identification is an important basis for network traffic monitoring and data analysis, and is the key to improve the quality of user service. In this paper, through the analysis of two network traffic identification methods based on machine learning and deep packet inspection, a network traffic identification method based on machine learning and deep packet inspection is proposed. This method uses deep packet inspection technology to identify most network traffic, reduces the workload that needs to be identified by machine learning method, and deep packet inspection can identify specific application traffic, and improves the accuracy of identification. Machine learning method is used to assist in identifying network traffic with encryption and unknown features, which makes up for the disadvantage of deep packet inspection that can not identify new applications and encrypted traffic. Experiments show that this method can improve the identification rate of network traffic.
Keeping Internet users safe from attacks and other threats is one of the biggest security challenges nowadays. Distributed Denial of Service (DDoS) [1] is one of the most common attacks. DDoS makes the system stop working by resource overload. Software Define Networking (SDN) [2] has recently emerged as a new networking technology offering an unprecedented programmability that allows network operators to dynamically configure and manage their infrastructures. The flexible processing and centralized management of SDN controller allow flexibly deploying complex security algorithms and mitigation methods. In this paper, we propose a new TCP-SYN flood attack mitigation in SDN networks using machine learning. By using a testbed, we implement the proposed algorithms, evaluate their accuracy and address the trade-off between the accuracy and capacity of the security device. The results show that the algorithms can mitigate TCP-SYN Flood attack over 96.
Increasingly organizations are collecting ever larger amounts of data to build complex data analytics, machine learning and AI models. Furthermore, the data needed for building such models may be unstructured (e.g., text, image, and video). Hence such data may be stored in different data management systems ranging from relational databases to newer NoSQL databases tailored for storing unstructured data. Furthermore, data scientists are increasingly using programming languages such as Python, R etc. to process data using many existing libraries. In some cases, the developed code will be automatically executed by the NoSQL system on the stored data. These developments indicate the need for a data security and privacy solution that can uniformly protect data stored in many different data management systems and enforce security policies even if sensitive data is processed using a data scientist submitted complex program. In this paper, we introduce our vision for building such a solution for protecting big data. Specifically, our proposed system system allows organizations to 1) enforce policies that control access to sensitive data, 2) keep necessary audit logs automatically for data governance and regulatory compliance, 3) sanitize and redact sensitive data on-the-fly based on the data sensitivity and AI model needs, 4) detect potentially unauthorized or anomalous access to sensitive data, 5) automatically create attribute-based access control policies based on data sensitivity and data type.
In recent days, Enterprises are expanding their business efficiently through web applications which has paved the way for building good consumer relationship with its customers. The major threat faced by these enterprises is their inability to provide secure environments as the web applications are prone to severe vulnerabilities. As a result of this, many security standards and tools have been evolving to handle the vulnerabilities. Though there are many vulnerability detection tools available in the present, they do not provide sufficient information on the attack. For the long-term functioning of an organization, data along with efficient analytics on the vulnerabilities is required to enhance its reliability. The proposed model thus aims to make use of Machine Learning with Analytics to solve the problem in hand. Hence, the sequence of the attack is detected through the pattern using PAA and further the detected vulnerabilities are classified using Machine Learning technique such as SVM. Probabilistic results are provided in order to obtain numerical data sets which could be used for obtaining a report on user and application behavior. Dynamic and Reconfigurable PAA with SVM Classifier is a challenging task to analyze the vulnerabilities and impact of these vulnerabilities in heterogeneous web environment. This will enhance the former processing by analysis of the origin and the pattern of the attack in a more effective manner. Hence, the proposed system is designed to perform detection of attacks. The system works on the mitigation and prevention as part of the attack prediction.
Deep neural networks (DNNs) provide good performance for image recognition, speech recognition, and pattern recognition. However, a poisoning attack is a serious threat to DNN's security. The poisoning attack is a method to reduce the accuracy of DNN by adding malicious training data during DNN training process. In some situations such as a military, it may be necessary to drop only a chosen class of accuracy in the model. For example, if an attacker does not allow only nuclear facilities to be selectively recognized, it may be necessary to intentionally prevent UAV from correctly recognizing nuclear-related facilities. In this paper, we propose a selective poisoning attack that reduces the accuracy of only chosen class in the model. The proposed method reduces the accuracy of a chosen class in the model by training malicious training data corresponding to a chosen class, while maintaining the accuracy of the remaining classes. For experiment, we used tensorflow as a machine learning library and MNIST and CIFAR10 as datasets. Experimental results show that the proposed method can reduce the accuracy of the chosen class to 43.2% and 55.3% in MNIST and CIFAR10, while maintaining the accuracy of the remaining classes.
Cyber-Physical Systems (CPS) are growing with added complexity and functionality. Multidisciplinary interactions with physical systems are the major keys to CPS. However, sensors, actuators, controllers, and wireless communications are prone to attacks that compromise the system. Machine learning models have been utilized in controllers of automotive to learn, estimate, and provide the required intelligence in the control process. However, their estimation is also vulnerable to the attacks from physical or cyber domains. They have shown unreliable predictions against unknown biases resulted from the modeling. In this paper, we propose a novel control design using conditional generative adversarial networks that will enable a self-secured controller to capture the normal behavior of the control loop and the physical system, detect the anomaly, and recover from them. We experimented our novel control design on a self-secured BMS by driving a Nissan Leaf S on standard driving cycles while under various attacks. The performance of the design has been compared to the state-of-the-art; the self-secured BMS could detect the attacks with 83% accuracy and the recovery estimation error of 21% on average, which have improved by 28% and 8%, respectively.
Malware detection is an indispensable factor in security of internet oriented machines. The combinations of different features are used for dynamic malware analysis. The different combinations are generated from APIs, Summary Information, DLLs and Registry Keys Changed. Cuckoo sandbox is used for dynamic malware analysis, which is customizable, and provide good accuracy. More than 2300 features are extracted from dynamic analysis of malware and 92 features are extracted statically from binary malware using PEFILE. Static features are extracted from 39000 malicious binaries and 10000 benign files. Dynamically 800 benign files and 2200 malware files are analyzed in Cuckoo Sandbox and 2300 features are extracted. The accuracy of dynamic malware analysis is 94.64% while static analysis accuracy is 99.36%. The dynamic malware analysis is not effective due to tricky and intelligent behaviours of malwares. The dynamic analysis has some limitations due to controlled network behavior and it cannot be analyzed completely due to limited access of network.
Current testing for Deep Neural Networks (DNNs) focuses on quantity of test cases but ignores diversity. To the best of our knowledge, DeepXplore is the first white-box framework for Deep Learning testing by triggering differential behaviors between multiple DNNs and increasing neuron coverage to improve diversity. Since it is based on multiple DNNs facing problems that (1) the framework is not friendly to a single DNN, (2) if incorrect predictions made by all DNNs simultaneously, DeepXplore cannot generate test cases. This paper presents Test4Deep, a white-box testing framework based on a single DNN. Test4Deep avoids mistakes of multiple DNNs by inducing inconsistencies between predicted labels of original inputs and that of generated test inputs. Meanwhile, Test4Deep improves neuron coverage to capture more diversity by attempting to activate more inactivated neurons. The proposed method was evaluated on three popular datasets with nine DNNs. Compared to DeepXplore, Test4Deep produced average 4.59% (maximum 10.49%) more test cases that all found errors and faults of DNNs. These test cases got 19.57% more diversity increment and 25.88% increment of neuron coverage. Test4Deep can further be used to improve the accuracy of DNNs by average up to 5.72% (maximum 7.0%).
Science gateways bring out the possibility of reproducible science as they are integrated into reusable techniques, data and workflow management systems, security mechanisms, and high performance computing (HPC). We introduce BioinfoPortal, a science gateway that integrates a suite of different bioinformatics applications using HPC and data management resources provided by the Brazilian National HPC System (SINAPAD). BioinfoPortal follows the Software as a Service (SaaS) model and the web server is freely available for academic use. The goal of this paper is to describe the science gateway and its usage, addressing challenges of designing a multiuser computational platform for parallel/distributed executions of large-scale bioinformatics applications using the Brazilian HPC resources. We also present a study of performance and scalability of some bioinformatics applications executed in the HPC environments and perform machine learning analyses for predicting features for the HPC allocation/usage that could better perform the bioinformatics applications via BioinfoPortal.
With the rapidly increasing connectivity in cyberspace, Insider Threat is becoming a huge concern. Insider threat detection from system logs poses a tremendous challenge for human analysts. Analyzing log files of an organization is a key component of an insider threat detection and mitigation program. Emerging machine learning approaches show tremendous potential for performing complex and challenging data analysis tasks that would benefit the next generation of insider threat detection systems. However, with huge sets of heterogeneous data to analyze, applying machine learning techniques effectively and efficiently to such a complex problem is not straightforward. In this paper, we extract a concise set of features from the system logs while trying to prevent loss of meaningful information and providing accurate and actionable intelligence. We investigate two unsupervised anomaly detection algorithms for insider threat detection and draw a comparison between different structures of the system logs including daily dataset and periodically aggregated one. We use the generated anomaly score from the previous cycle as the trust score of each user fed to the next period's model and show its importance and impact in detecting insiders. Furthermore, we consider the psychometric score of users in our model and check its effectiveness in predicting insiders. As far as we know, our model is the first one to take the psychometric score of users into consideration for insider threat detection. Finally, we evaluate our proposed approach on CERT insider threat dataset (v4.2) and show how it outperforms previous approaches.
This study has built a simulation of a smart home system by the Alibaba ECS. The architecture of hardware was based on edge computing technology. The whole method would design a clear classifier to find the boundary between regular and mutation codes. It could be applied in the detection of the mutation code of network. The project has used the dataset vector to divide them into positive and negative type, and the final result has shown the RBF-function SVM method perform best in this mission. This research has got a good network security detection in the IoT systems and increased the applications of machine learning.
Software vulnerabilities often remain hidden until an attacker exploits the weak/insecure code. Therefore, testing the software from a vulnerability discovery perspective becomes challenging for developers if they do not inspect their code thoroughly (which is time-consuming). We propose that vulnerability prediction using certain software metrics can support the testing process by identifying vulnerable code-components (e.g., functions, classes, etc.). Once a code-component is predicted as vulnerable, the developers can focus their testing efforts on it, thereby avoiding the time/effort required for testing the entire application. The current paper presents a study that compares how software metrics perform as vulnerability predictors for software projects developed in two different languages (Java vs Python). The goal of this research is to analyze the vulnerability prediction performance of software metrics for different programming languages. We designed and conducted experiments on security vulnerabilities reported for three Java projects (Apache Tomcat 6, Tomcat 7, Apache CXF) and two Python projects (Django and Keystone). In this paper, we focus on a specific type of code component: Functions. We apply Machine Learning models for predicting vulnerable functions. Overall results show that software metrics-based vulnerability prediction is more useful for Java projects than Python projects (i.e., software metrics when used as features were able to predict Java vulnerable functions with a higher recall and precision compared to Python vulnerable functions prediction).
In this paper, we explore the use of machine learning technique for wormhole attack detection in ad hoc network. This work has categorized into three major tasks. One of our tasks is a simulation of wormhole attack in an ad hoc network environment with multiple wormhole tunnels. A next task is the characterization of packet attributes that lead to feature selection. Consequently, we perform data generation and data collection operation that provide large volume dataset. The final task is applied to machine learning technique for wormhole attack detection. Prior to this, a wormhole attack has detected using traditional approaches. In those, a Multirate-DelPHI is shown best results as detection rate is 90%, and the false alarm rate is 20%. We conduct experiments and illustrate that our method performs better resulting in all statistical parameters such as detection rate is 93.12% and false alarm rate is 5.3%. Furthermore, we have also shown results on various statistical parameters such as Precision, F-measure, MCC, and Accuracy.
Images perturbed subtly to be misclassified by neural networks, called adversarial examples, have emerged as a technically deep challenge and an important concern for several application domains. Most research on adversarial examples takes as its only constraint that the perturbed images are similar to the originals. However, real-world application of these ideas often requires the examples to satisfy additional objectives, which are typically enforced through custom modifications of the perturbation process. In this article, we propose adversarial generative nets (AGNs), a general methodology to train a generator neural network to emit adversarial examples satisfying desired objectives. We demonstrate the ability of AGNs to accommodate a wide range of objectives, including imprecise ones difficult to model, in two application domains. In particular, we demonstrate physical adversarial examples—eyeglass frames designed to fool face recognition—with better robustness, inconspicuousness, and scalability than previous approaches, as well as a new attack to fool a handwritten-digit classifier.
A machine learning model of the protein interaction network has been developed by researchers to explore how viruses operate. This research can be applied to different types of attacks and network models across different fields, including network security. The capacity to determine how trolls and bots influence users on social media platforms has also been explored through this research.
With the tremendous growth of IoT botnet DDoS attacks in recent years, IoT security has now become one of the most concerned topics in the field of network security. A lot of security approaches have been proposed in the area, but they still lack in terms of dealing with newer emerging variants of IoT malware, known as Zero-Day Attacks. In this paper, we present a honeypot-based approach which uses machine learning techniques for malware detection. The IoT honeypot generated data is used as a dataset for the effective and dynamic training of a machine learning model. The approach can be taken as a productive outset towards combatting Zero-Day DDoS Attacks which now has emerged as an open challenge in defending IoT against DDoS Attacks.
Recently, malicious insider attacks represent one of the most damaging threats to companies and government agencies. This paper proposes a new framework in constructing a user-centered machine learning based insider threat detection system on multiple data granularity levels. System evaluations and analysis are performed not only on individual data instances but also on normal and malicious insiders, where insider scenario specific results and delay in detection are reported and discussed. Our results show that the machine learning based detection system can learn from limited ground truth and detect new malicious insiders with a high accuracy.
The continuing decrease in feature size of integrated circuits, and the increase of the complexity and cost of design and fabrication has led to outsourcing the design and fabrication of integrated circuits to third parties across the globe, and in turn has introduced several security vulnerabilities. The adversaries in the supply chain can pirate integrated circuits, overproduce these circuits, perform reverse engineering, and/or insert hardware Trojans in these circuits. Developing countermeasures against such security threats is highly crucial. Accordingly, this paper first develops a learning-based trust verification framework to detect hardware Trojans. To tackle Trojan insertion, IP piracy and overproduction, logic locking schemes and in particular stripped functionality logic locking is discussed and its resiliency against the state-of-the-art attacks is investigated.
In parallel with the increasing growth of the Internet and computer networks, the number of malwares has been increasing every day. Today, one of the newest attacks and the biggest threats in cybersecurity is ransomware. The effectiveness of applying machine learning techniques for malware detection has been explored in much scientific research, however, there is few studies focused on machine learning-based ransomware detection. In this paper, the effectiveness of ransomware detection using machine learning methods applied to CICAndMal2017 dataset is examined in two experiments. First, the classifiers are trained on a single dataset containing different types of ransomware. Second, different classifiers are trained on datasets of 10 ransomware families distinctly. Our findings imply that in both experiments random forest outperforms other tested classifiers and the performance of the classifiers are not changed significantly when they are trained on each family distinctly. Therefore, the random forest classification method is very effective in ransomware detection.