Biblio
The accessibility of the internet and mobile platforms has risen dramatically due to digital technology innovations. Web applications have opened up a variety of market possibilities by supplying consumers with a wide variety of digital technologies that benefit from high accessibility and functionality. Around the same time, web application protection continues to be an important challenge on the internet, and security must be taken seriously in order to secure confidential data. The threat is caused by inadequate validation of user input information, software developed without strict adherence to safety standards, vulnerability of reusable software libraries, software weakness, and so on. Through abusing a website's vulnerability, introduers are manipulating the user's information in order to exploit it for their own benefit. Then introduers inject their own malicious code, stealing passwords, manipulating user activities, and infringing on customers' privacy. As a result, information is leaked, applications malfunction, confidential data is accessed, etc. To mitigate the aforementioned issues, stacking ensemble based classifier model for Cross-site scripting (XSS) attack detection is proposed. Furthermore, the stacking ensembles technique is used in combination with different machine learning classification algorithms like k-Means, Random Forest and Decision Tree as base-learners to reliably detect XSS attack. Logistic Regression is used as meta-learner to predict the attack with greater accuracy. The classification algorithms in stacking model explore the problem in their own way and its results are given as input to the meta-learner to make final prediction, thus improving the overall detection accuracy of XSS attack in stacking than the individual models. The simulation findings demonstrate that the proposed model detects XSS attack successfully.
Network covert channels are used in various cyberattacks, including disclosure of sensitive information and enabling stealth tunnels for botnet commands. With time and technology, covert channels are becoming more prevalent, complex, and difficult to detect. The current methods for detection are protocol and pattern specific. This requires the investment of significant time and resources into application of various techniques to catch the different types of covert channels. This paper reviews several patterns of network storage covert channels, describes generation of network traffic dataset with covert channels, and proposes a generic, protocol-independent approach for the detection of network storage covert channels using a supervised machine learning technique. The implementation of the proposed generic detection model can lead to a reduction of necessary techniques to prevent covert channel communication in network traffic. The datasets we have generated for experimentation represent storage covert channels in the IP, TCP, and DNS protocols and are available upon request for future research in this area.
Load balancing and IP anycast are traffic routing algorithms used to speed up delivery of the Domain Name System. In case of a DDoS attack or an overload condition, the value of these protocols is critical, as they can provide intrinsic DDoS mitigation with the failover alternatives. In this paper, we present a methodology for predicting the next DNS response in the light of a potential redirection to less busy servers, in order to mitigate the size of the attack. Our experiments were conducted using data from the Nov. 2015 attack of the Root DNS servers and Logistic Regression, k-Nearest Neighbors, Support Vector Machines and Random Forest as our primary classifiers. The models were able to successfully predict up to 83% of responses for Root Letters that operated on a small number of sites and consequently suffered the most during the attacks. On the other hand, regarding DNS requests coming from more distributed Root servers, the models demonstrated lower accuracy. Our analysis showed a correlation between the True Positive Rate metric and the number of sites, as well as a clear need for intelligent management of traffic in load balancing practices.
The enormous growth of Internet-based traffic exposes corporate networks with a wide variety of vulnerabilities. Intrusive traffics are affecting the normal functionality of network's operation by consuming corporate resources and time. Efficient ways of identifying, protecting, and mitigating from intrusive incidents enhance productivity. As Intrusion Detection System (IDS) is hosted in the network and at the user machine level to oversee the malicious traffic in the network and at the individual computer, it is one of the critical components of a network and host security. Unsupervised anomaly traffic detection techniques are improving over time. This research aims to find an efficient classifier that detects anomaly traffic from NSL-KDD dataset with high accuracy level and minimal error rate by experimenting with five machine learning techniques. Five binary classifiers: Stochastic Gradient Decent, Random Forests, Logistic Regression, Support Vector Machine, and Sequential Model are tested and validated to produce the result. The outcome demonstrates that Random Forest Classifier outperforms the other four classifiers with and without applying the normalization process to the dataset.
Since the term “Fog Computing” has been coined by Cisco Systems in 2012, security and privacy issues of this promising paradigm are still open challenges. Among various security challenges, Access Control is a crucial concern for all cloud computing-like systems (e.g. Fog computing, Mobile edge computing) in the IoT era. Therefore, assigning the precise level of access in such an inherently scalable, heterogeneous and dynamic environment is not easy to perform. This work defines the uncertainty challenge for authentication phase of the access control in fog computing because on one hand fog has a number of characteristics that amplify uncertainty in authentication and on the other hand applying traditional access control models does not result in a flexible and resilient solution. Therefore, we have proposed a novel prediction model based on the extension of Attribute Based Access Control (ABAC) model. Our data-driven model is able to handle uncertainty in authentication. It is also able to consider the mobility of mobile edge devices in order to handle authentication. In doing so, we have built our model using and comparing four supervised classification algorithms namely as Decision Tree, Naïve Bayes, Logistic Regression and Support Vector Machine. Our model can achieve authentication performance with 88.14% accuracy using Logistic Regression.
The main security problems, typical for the Internet of Things (IoT), as well as the purpose of gaining unauthorized access to the IoT, are considered in this paper. Common characteristics of the most widespread botnets are provided. A method to detect compromised IoT devices included into a botnet is proposed. The method is based on a model of logistic regression. The article describes a developed model of logistic regression which allows to estimate the probability that a device initiating a connection is running a bot. A list of network protocols, used to gain unauthorized access to a device and to receive instructions from common and control (C&C) server, is provided too.
An important source of cyber-attacks is malware, which proliferates in different forms such as botnets. The botnet malware typically looks for vulnerable devices across the Internet, rather than targeting specific individuals, companies or industries. It attempts to infect as many connected devices as possible, using their resources for automated tasks that may cause significant economic and social harm while being hidden to the user and device. Thus, it becomes very difficult to detect such activity. A considerable amount of research has been conducted to detect and prevent botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system using a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features to detect botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework which provides aggregate statistics about the packets exchanged between a source and destination over a certain time interval. These statistics serve as features to a logistic regression model responsible for classifying malicious and benign traffic. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families separately and as a mixed dataset. Finally, we measured the performance of our model on multiple parameters using F1 score, accuracy and Area Under Curve (AUC).
At a time when all it takes to open a Twitter account is a mobile phone, the act of authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources have been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers propose different authorship attribution techniques to approach this kind of problem. However, because tweets are made up of only 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing machine learning methods like logistic regression and naive Bayes. The processes of this application are fetching of tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates the text classification for authorship process using machine learning techniques. In total, there were 46,895 tweets used as both training and testing data, and unique features specific to Twitter were extracted. Several steps were done in the pre-processing phase, including removal of short texts, removal of stop-words and punctuations, tokenizing and stemming of texts as well. This approach transforms the pre-processed data into a set of feature vector in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors for the training and testing of the classifier. The logistic regression based classifier gave the highest accuracy of 91.1% compared to the naive Bayes classifier with 89.8%.
VANET network is a new technology on which future intelligent transport systems are based; its purpose is to develop the vehicular environment and make it more comfortable. In addition, it provides more safety for drivers and cars on the road. Therefore, we have to make this technology as secured as possible against many threats. As VANET is a subclass of MANET, it has inherited many security problems but with a different architecture and DOS attacks are one of them. In this paper, we have focused on DOS attacks that prevent users to receive the right information at the right moment. We have analyzed DOS attacks behavior and effects on the network using different mathematical models in order to find an efficient solution.
There are several security requirements identification methods proposed by researchers in up-front requirements engineering (RE). However, in open source software (OSS) projects, developers use lightweight representation and refine requirements frequently by writing comments. They also tend to discuss security aspect in comments by providing code snippets, attachments, and external resource links. Since most security requirements identification methods in up-front RE are based on textual information retrieval techniques, these methods are not suitable for OSS projects or just-in-time RE. In our study, we propose a new model based on logistic regression to identify security requirements in OSS projects. We used five metrics to build security requirements identification models and tested the performance of these metrics by applying those models to three OSS projects. Our results show that four out of five metrics achieved high performance in intra-project testing.
In this paper we present results of a research on automatic extremist text detection. For this purpose an experimental dataset in the Russian language was created. According to the Russian legislation we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic and psycholinguistic) to classification quality. The results of experiments show that psycholinguistic and semantic features are promising for extremist text detection.
This paper introduces an ensemble model that solves the binary classification problem by incorporating the basic Logistic Regression with the two recent advanced paradigms: extreme gradient boosted decision trees (xgboost) and deep learning. To obtain the best result when integrating sub-models, we introduce a solution to split and select sets of features for the sub-model training. In addition to the ensemble model, we propose a flexible robust and highly scalable new scheme for building a composite classifier that tries to simultaneously implement multiple layers of model decomposition and outputs aggregation to maximally reduce both bias and variance (spread) components of classification errors. We demonstrate the power of our ensemble model to solve the problem of predicting the outcome of Hearthstone, a turn-based computer game, based on game state information. Excellent predictive performance of our model has been acknowledged by the second place scored in the final ranking among 188 competing teams.
Logistic regression is a powerful machine learning tool to classify data. When dealing with sensitive data such as private or medical information, cares are necessary. In this paper, we propose a secure system for protecting the training data in logistic regression via homomorphic encryption. Perhaps surprisingly, despite the non-polynomial tasks of training in logistic regression, we show that only additively homomorphic encryption is needed to build our system. Our system is secure and scalable with the dataset size.
Novice programmers exhibit a repertoire of affective states over time when they are learning computer programming. The modeling of frustration is important as it informs on the need for pedagogical intervention of the student who may otherwise lose confidence and interest in the learning. In this paper, contextual and keystroke features of the students within a Java tutoring system are used to detect frustration of student within a programming exercise session. As compared to psychological sensors used in other studies, the use of contextual and keystroke logs are less obtrusive and the equipment used (keyboard) is ubiquitous in most learning environment. The technique of logistic regression with lasso regularization is utilized for the modeling to prevent over-fitting. The results showed that a model that uses only contextual and keystroke features achieved a prediction accuracy level of 0.67 and a recall measure of 0.833. Thus, we conclude that it is possible to detect frustration of a student from distilling both the contextual and keystroke logs within the tutoring system with an adequate level of accuracy.