Visible to the public Biblio

Filters: Keyword is apache spark  [Clear All Filters]
2023-03-31
Mudgal, Akshay, Bhatia, Shaveta.  2022.  A Step Towards Improvement in Classical Honeypot Security System. 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON). 1:720–725.
Data security is a vast term that doesn’t have any limits, but there are a certain amount of tools and techniques that could help in gaining security. Honeypot is among one of the tools that are designated and designed to protect the security of a network but in a very dissimilar manner. It is a system that is designed and developed to be compromised and exploited. Honeypots are meant to lure the invaders, but due to advancements in computing systems parallelly, the intruding technologies are also attaining their gigantic influence. In this research work, an approach involving apache-spark (a Big Data Technique) would be introduced in order to use it with the Honeypot System. This work includes an extensive study based on several research papers, through which elaborated experiment-based result has been expressed on the best known open-source honeypot systems. The preeminent possible method of using The Honeypot with apache spark in the sequential channel would also be proposed with the help of a framework diagram.
2022-04-13
Kousar, Heena, Mulla, Mohammed Moin, Shettar, Pooja, D. G., Narayan.  2021.  DDoS Attack Detection System using Apache Spark. 2021 International Conference on Computer Communication and Informatics (ICCCI). :1—5.
Distributed Denial of Service Attacks (DDoS) are most widely used cyber-attacks. Thus, design of DDoS detection mechanisms has attracted attention of researchers. Design of these mechanisms involves building statistical and machine learning models. Most of the work in design of mechanisms is focussed on improving the accuracy of the model. However, due to large volume of network traffic, scalability and performance of these techniques is an important research issue. In this work, we use Apache Spark framework for detection of DDoS attacks. We use NSL-KDD Cup as a benchmark dataset for experimental analysis. The results reveal that random forest performs better than decision trees and distributed processing improves the performance in terms of pre-processing and training time.
2020-08-28
Hasanin, Tawfiq, Khoshgoftaar, Taghi M., Leevy, Joffrey L..  2019.  A Comparison of Performance Metrics with Severely Imbalanced Network Security Big Data. 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). :83—88.

Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.

2019-05-20
Velthuis, Paul J. E., Schäfer, Marcel, Steinebach, Martin.  2018.  New Authentication Concept Using Certificates for Big Data Analytic Tools. Proceedings of the 13th International Conference on Availability, Reliability and Security. :40:1–40:7.

Companies analyse large amounts of data on clusters of machines, using big data analytic tools such as Apache Spark and Apache Flink to analyse the data. Big data analytic tools are mainly tested regarding speed and reliability. Efforts about Security and thus authentication are spent only at second glance. In such big data analytic tools, authentication is achieved with the help of the Kerberos protocol that is basically built as authentication on top of big data analytic tools. However, Kerberos is vulnerable to attacks, and it lacks providing high availability when users are all over the world. To improve the authentication, this work presents first an analysis of the authentication in Hadoop and the data analytic tools. Second, we propose a concept to deploy Transport Layer Security (TLS) not only for the security of data transportation but as well for authentication within the big data tools. This is done by establishing the connections using certificates with a short lifetime. The proof of concept is realized in Apache Spark, where Kerberos is replaced by the method proposed. We deploy new short living certificates for authentication that are less vulnerable to abuse. With our approach the requirements of the industry regarding multi-factor authentication and scalability are met.

2019-02-08
Sisiaridis, D., Markowitch, O..  2018.  Reducing Data Complexity in Feature Extraction and Feature Selection for Big Data Security Analytics. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :43-48.

Feature extraction and feature selection are the first tasks in pre-processing of input logs in order to detect cybersecurity threats and attacks by utilizing data mining techniques in the field of Artificial Intelligence. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are found to be time-consuming and difficult to be managed efficiently. In this paper, we present an approach for handling feature extraction and feature selection utilizing machine learning algorithms for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its python API, named pyspark.

2018-05-24
Huyn, Joojay.  2017.  A Scalable Real-Time Framework for DDoS Traffic Monitoring and Characterization. Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies. :265–266.

Volumetric DDoS attacks continue to inflict serious damage. Many proposed defenses for mitigating such attacks assume that a monitoring system has already detected the attack. However, many proposed DDoS monitoring systems do not focus on efficiently analyzing high volume network traffic to provide important characterizations of the attack in real-time to downstream traffic filtering systems. We propose a scalable real-time framework for an effective volumetric DDoS monitoring system that leverages modern big data technologies for streaming analytics of high volume network traffic to accurately detect and characterize attacks.

2018-02-06
Vimalkumar, K., Radhika, N..  2017.  A Big Data Framework for Intrusion Detection in Smart Grids Using Apache Spark. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :198–204.

Technological advancement enables the need of internet everywhere. The power industry is not an exception in the technological advancement which makes everything smarter. Smart grid is the advanced version of the traditional grid, which makes the system more efficient and self-healing. Synchrophasor is a device used in smart grids to measure the values of electric waves, voltages and current. The phasor measurement unit produces immense volume of current and voltage data that is used to monitor and control the performance of the grid. These data are huge in size and vulnerable to attacks. Intrusion Detection is a common technique for finding the intrusions in the system. In this paper, a big data framework is designed using various machine learning techniques, and intrusions are detected based on the classifications applied on the synchrophasor dataset. In this approach various machine learning techniques like deep neural networks, support vector machines, random forest, decision trees and naive bayes classifications are done for the synchrophasor dataset and the results are compared using metrics of accuracy, recall, false rate, specificity, and prediction time. Feature selection and dimensionality reduction algorithms are used to reduce the prediction time taken by the proposed approach. This paper uses apache spark as a platform which is suitable for the implementation of Intrusion Detection system in smart grids using big data analytics.