Zuech, Richard, Hancock, John, Khoshgoftaar, Taghi M.
2021.
Feature Popularity Between Different Web Attacks with Supervised Feature Selection Rankers. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :30–37.
We introduce the novel concept of feature popularity with three different web attacks and big data from the CSE-CIC-IDS2018 dataset: Brute Force, SQL Injection, and XSS web attacks. Feature popularity is based upon ensemble Feature Selection Techniques (FSTs) and allows us to more easily understand common important features between different cyberattacks, for two main reasons. First, feature popularity lists can be generated to provide an easy comprehension of important features across different attacks. Second, the Jaccard similarity metric can provide a quantitative score for how similar feature subsets are between different attacks. Both of these approaches not only provide more explainable and easier-to-understand models, but they can also reduce the complexity of implementing models in real-world systems. Four supervised learning-based FSTs are used to generate feature subsets for each of our three different web attack datasets, and then our feature popularity frameworks are applied. For these three web attacks, the XSS and SQL Injection feature subsets are the most similar per the Jaccard similarity. The most popular features across all three web attacks are: Flow\_Bytes\_s, Flow\_IAT\_Max, and Flow\_Packets\_s. While this introductory study is only a simple example using only three web attacks, this feature popularity concept can be easily extended, allowing an automated framework to more easily determine the most popular features across a very large number of attacks and features.
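The two ideas in this abstract, a popularity list and a pairwise Jaccard score, can be illustrated with a minimal sketch. This is not the authors' framework: the tallying rule (count how many attack subsets contain each feature) is an assumption, and the feature subsets below are invented for illustration, with `feat_a` through `feat_d` as hypothetical placeholder names. Only the three Flow feature names come from the abstract.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two feature collections."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty subsets are treated as identical (assumption for this sketch).
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(subsets):
    """Jaccard score for every pair of attacks, keyed by (attack_1, attack_2)."""
    return {
        (x, y): jaccard(subsets[x], subsets[y])
        for x, y in combinations(sorted(subsets), 2)
    }


def popularity(subsets):
    """Features ranked by how many attack subsets contain them (ties by name)."""
    counts = Counter(f for feats in subsets.values() for f in set(feats))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented feature subsets, one per attack; NOT the results reported in the paper.
subsets = {
    "brute_force": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "feat_a"},
    "sql_injection": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "feat_b", "feat_c"},
    "xss": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "feat_b", "feat_d"},
}

pairs = pairwise_jaccard(subsets)
ranking = popularity(subsets)
```

With more attacks, the same two functions apply unchanged, since both operate on a dictionary of arbitrary size.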
Kawanishi, Yasuyuki, Nishihara, Hideaki, Yoshida, Hirotaka, Hata, Yoichi.
2021.
A Study of The Risk Quantification Method focusing on Direct-Access Attacks in Cyber-Physical Systems. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :298–305.
Direct-access attacks were initially considered unrealistic threats in cyber security because the attacker can more easily mount other non-computerized attacks, such as cutting a brake line. In recent years, some research into direct-access attacks has been conducted, especially in the automotive field, for example, research on an attack method that makes the ECU stop functioning via the CAN bus. The problem with existing risk quantification methods is that direct-access attacks seem not to be recognized as serious threats. To solve this problem, we propose a new risk quantification method by applying vulnerability evaluation criteria and by setting metrics. We also confirm that direct-access attacks not recognized by conventional methods can be evaluated appropriately, using the case study of an automotive system as an example of a cyber-physical system.
Singh, A K, Goyal, Navneet.
2021.
Detection of Malicious Webpages Using Deep Learning. 2021 IEEE International Conference on Big Data (Big Data). :3370–3379.
Malicious Webpages have been a serious threat on the Internet for the past few years. As per the latest Google Transparency reports, they continue to be top ranked amongst online threats. Various techniques have been used to date to identify malicious sites, including Static Heuristics, Honey Clients, and Machine Learning. Recently, with the rapid rise of Deep Learning, interest has arisen in exploring Deep Learning techniques for detecting Malicious Webpages. In this paper, Deep Learning has been utilized for such classification. The model proposed in this research uses a Deep Neural Network (DNN) with two hidden layers to distinguish between Malicious and Benign Webpages. This DNN model gave a high accuracy of 99.81% with very low False Positives (FP) and False Negatives (FN), and with near real-time response on test samples. The model outperformed earlier machine learning solutions in accuracy, precision, recall and time performance metrics.
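For readers unfamiliar with how accuracy, precision, and recall relate to FP and FN counts, the following sketch computes them from a binary confusion matrix using the standard definitions. The counts are hypothetical and are not taken from the paper; the function name `classification_metrics` is an illustrative choice.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted-malicious pages that really are malicious.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of truly malicious pages that were detected.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for 1000 test pages (not the paper's results).
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=895)
```

On an imbalanced test set such as this one, accuracy can stay high even when recall is modest, which is why the abstract reports FP and FN alongside accuracy.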
Schneider, Madeleine, Aspinall, David, Bastian, Nathaniel D.
2021.
Evaluating Model Robustness to Adversarial Samples in Network Intrusion Detection. 2021 IEEE International Conference on Big Data (Big Data). :3343–3352.
Adversarial machine learning, a technique which seeks to deceive machine learning (ML) models, threatens the utility and reliability of ML systems. This is particularly relevant in critical ML implementations such as those found in Network Intrusion Detection Systems (NIDS). This paper considers the impact of adversarial influence on NIDS and proposes ways to improve ML based systems. Specifically, we consider five feature robustness metrics to determine which features in a model are most vulnerable, and four defense methods. These methods are tested on six ML models with four adversarial sample generation techniques. Our results show that across different models and adversarial generation techniques, there is limited consistency in vulnerable features or in effectiveness of defense method.
Kim, Seongsoo, Chen, Lei, Kim, Jongyeop.
2021.
Intrusion Prediction using Long Short-Term Memory Deep Learning with UNSW-NB15. 2021 IEEE/ACIS 6th International Conference on Big Data, Cloud Computing, and Data Science (BCD). :53–59.
This study shows the effectiveness of anomaly-based IDS using long short-term memory (LSTM) on the newly developed UNSW-NB15 dataset, with root mean square error and mean absolute error as evaluation metrics for accuracy. For each attack, 80% and 90% of the samples were used as LSTM inputs to train the model while increasing the number of epochs. The model then predicted attack points from the test data and produced possible attack points for each attack at the 3rd time frame against the actual attack point. However, in the case of an Exploit attack, where consecutive overlapping attacks occur, there was ambiguity in the interpretation of the numerical values calculated by the LSTM. We presented a methodology for training data with binary values using LSTM and evaluation with RMSE metrics throughout this study.
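The two evaluation metrics named in this abstract have standard definitions, sketched below for a sequence of binary attack labels and real-valued model outputs. The example values are hypothetical and do not come from the paper or from UNSW-NB15.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((a - p)^2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean(|a - p|)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical binary labels (1 = attack) and model outputs per time step.
actual = [0, 0, 1, 1]
predicted = [0.1, 0.2, 0.8, 0.9]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
```

RMSE penalizes large individual errors more heavily than MAE, so the gap between the two gives a rough indication of how unevenly the errors are distributed.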
Hancock, John, Khoshgoftaar, Taghi M., Leevy, Joffrey L.
2021.
Detecting SSH and FTP Brute Force Attacks in Big Data. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :760–765.
We present a simple approach for detecting brute force attacks in the CSE-CIC-IDS2018 Big Data dataset. We show our approach is preferable to more complex approaches since it is simpler, and yields stronger classification performance. Our contribution is to show that it is possible to train and test simple Decision Tree models with two independent variables to classify CSE-CIC-IDS2018 data with better results than reported in previous research, where more complex Deep Learning models are employed. Moreover, we show that Decision Tree models trained on data with two independent variables perform similarly to Decision Tree models trained on a larger number of independent variables. Our experiments reveal that simple models, with AUC and AUPRC scores greater than 0.99, are capable of detecting brute force attacks in CSE-CIC-IDS2018. To the best of our knowledge, these are the strongest performance metrics published for the machine learning task of detecting these types of attacks. Furthermore, the simplicity of our approach, combined with its strong performance, makes it an appealing technique.
Yasa, Ray Novita, Buana, I Komang Setia, Girinoto, Setiawan, Hermawan, Hadiprakoso, Raden Budiarto.
2021.
Modified RNP Privacy Protection Data Mining Method as Big Data Security. 2021 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS). :30–34.
Privacy-Preserving Data Mining (PPDM) has become an exciting topic to discuss in recent decades due to the growing interest in big data and data mining. It is a technique for securing data while still preserving the privacy of the information in it. This paper provides an alternative perturbation-based PPDM technique, which is carried out by modifying the RNP algorithm. The novelty of this paper lies in modifications to some steps of the method, each with a specific purpose. The first modification narrows the selection of the disturbance value, so that the number of attributes replaced in each record line is only as many as the attributes in the original data, no more and with no need to repeat. The second derives the perturbation function from the cumulative distribution function and uses it to find the probability distribution function, so that the selection of replacement data has a clear basis. The experiment results on twenty-five perturbed data show that the modified RNP algorithm balances data utility and security level by selecting the appropriate disturbance value and perturbation value. The level of security is measured using privacy metrics in the form of value difference, average transformation of data, and percentage of retains. The method presented in this paper is of interest for application to actual data that requires privacy preservation.
Qureshi, Hifza, Sagar, Anil Kumar, Astya, Rani, Shrivastava, Gulshan.
2021.
Big Data Analytics for Smart Education. 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA). :650–658.
The existing education system, which incorporates school assessments, has some flaws. Conventional teaching methods give students no immediate feedback, make teachers spend hours grading repetitive assignments, are not very constructive in showing students how to improve academically, and fail to take advantage of digital opportunities that can improve learning outcomes. In addition, since a single teacher has to manage a class of students, it is difficult to focus on each and every student in the class. Furthermore, with the help of a management system for better learning, educational organizations can now implement administrative analytics and execute new business intelligence using big data. This data visualization aids in the evaluation of teaching, management, and study success metrics. This paper discusses how Data Mining and Data Analytics can help make the experience of both learning and teaching easier and more accountable. It also discusses the numerous challenges that educational organizations have faced in terms of effective and efficient teaching and student performance. In addition, development and inadequate data storage, processing, and analysis are discussed. The research applies the Python programming language to big education data. In addition, the research adopted an exploratory research design to identify the complexities and requirements of big data in the education field.