Biblio

Filters: Keyword is logistic regression
2023-02-02
Aggarwal, Naman, Aggarwal, Pradyuman, Gupta, Rahul.  2022.  Static Malware Analysis using PE Header files API. 2022 6th International Conference on Computing Methodologies and Communication (ICCMC). :159–162.
In today’s fast-paced world, cybercrime has repeatedly proved to be one of the biggest hindrances to national development. According to recent trends, a victim’s data is most often breached through a phishing attack. The security and privacy of users’ data have become a matter of tremendous concern. To address this problem and protect naive users’ data, a tool is proposed that helps identify whether a Windows executable is malicious by performing static analysis on it. A comparative study has also been performed by implementing different classification models, such as Logistic Regression, Neural Network, and SVM. The static analysis approach takes as its parameters properties of the executable obtained from the PE section headers, i.e., API calls. Comparing the different models identifies the best model to use for static malware analysis.
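The abstract does not say how the API calls become model inputs. A common encoding, assumed here purely for illustration, is a binary bag-of-API-calls vector over a fixed vocabulary. The vocabulary and the example import list are hypothetical, and extracting the import names from the PE headers is left out of the sketch.

```python
from typing import Iterable, List

# Hypothetical vocabulary of API calls to track; not the paper's feature set.
API_VOCABULARY: List[str] = [
    "CreateRemoteThread",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "GetProcAddress",
    "LoadLibraryA",
    "RegSetValueExA",
]


def api_call_vector(imported_calls: Iterable[str],
                    vocabulary: List[str] = API_VOCABULARY) -> List[int]:
    """Return a 0/1 vector: position i is 1 if vocabulary[i] is among the imported calls."""
    present = set(imported_calls)
    return [1 if name in present else 0 for name in vocabulary]


if __name__ == "__main__":
    imports = ["LoadLibraryA", "GetProcAddress", "WriteProcessMemory"]
    print(api_call_vector(imports))  # [0, 0, 1, 1, 1, 0]
```

Each such vector could then be fed to any of the compared classifiers (logistic regression, neural network, SVM); calls outside the vocabulary are simply ignored.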
2022-10-12
Kumar, Yogendra, Subba, Basant.  2021.  A lightweight machine learning based security framework for detecting phishing attacks. 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS). :184–188.
A successful phishing attack is a prelude to various other severe attacks, such as theft of login credentials, unauthorized access to users’ confidential data, and malware or ransomware infestation of the victim’s machine. This paper proposes a real-time, lightweight, machine learning based security framework for detecting phishing attacks by analyzing Uniform Resource Locators (URLs). The proposed framework first extracts a set of highly discriminating and uncorrelated features from the URL string corpus. These features are then used to transform the URL strings into numeric feature vectors, which in turn are used to train various machine learning based classifier models for identifying malicious phishing URLs. Performance analysis of the proposed framework on two well-known datasets, the Kaggle dataset [1] and the UNB dataset [2], shows that it detects malicious phishing URLs with high precision while maintaining a very low false positive rate. The proposed framework is also shown to outperform other similar security frameworks proposed in the literature.
[1] https://www.kaggle.com/antonyj453/urldataset
[2] https://www.unb.ca/cic/datasets/url-2016.html
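The abstract does not list the extracted features. As an illustration only, the sketch below turns a URL string into a few lexical features often used in phishing detection (length, dot count, '@', an IP-address host, a hyphen in the host, digit count, path depth). This feature set is an assumption, not the paper's.

```python
import ipaddress
from typing import Dict
from urllib.parse import urlparse


def url_features(url: str) -> Dict[str, int]:
    """Map a URL string to a small dictionary of numeric lexical features."""
    # urlparse only fills in the network location when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "length": len(url),
        "num_dots": url.count("."),
        "has_at": int("@" in url),
        "host_is_ip": host_is_ip,
        "hyphen_in_host": int("-" in host),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_path_segments": len([s for s in parsed.path.split("/") if s]),
    }


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/secure-login/update"))
```

Taking the dictionary's values in a fixed key order yields the numeric feature vector that a classifier would be trained on.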
2022-07-01
Hashim, Aya, Medani, Razan, Attia, Tahani Abdalla.  2021.  Defences Against web Application Attacks and Detecting Phishing Links Using Machine Learning. 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE). :1–6.
In recent years, an estimated 30,000 web applications have been hacked every day, and in most cases web developers or website owners do not even know what is happening on their sites. Web hackers can use many attacks to gain entry to or compromise legitimate web applications, and they can also deceive people with phishing sites that collect their sensitive and private information. This raises the need to take proper measures to understand the risks and be aware of the vulnerabilities that may affect a website and hence the normal business flow. In the scope of this study, mitigations against the most common web application attacks are put in place, and the web administrator is provided with ways to detect phishing links, a form of social engineering attack. The study also demonstrates the generation of web application logs that simplify analyzing the actions of abnormal users to show when behavior is out of bounds, out of scope, or against the rules. The mitigations are accomplished through secure coding techniques, and phishing link detection is performed with various machine learning algorithms and deep learning techniques. The developed application has been tested and evaluated against various attack scenarios. The test results showed that the website successfully mitigated these dangerous web application attacks; for phishing link detection, different algorithms were compared to find the best one, and the best model achieved 98% accuracy.
2022-04-19
Perumal, Seethalakshmi, Sujatha P, Kola.  2021.  Stacking Ensemble-based XSS Attack Detection Strategy Using Classification Algorithms. 2021 6th International Conference on Communication and Electronics Systems (ICCES). :897–901.

The accessibility of the internet and mobile platforms has risen dramatically due to innovations in digital technology. Web applications have opened up a variety of market possibilities by supplying consumers with a wide variety of digital technologies that offer high accessibility and functionality. At the same time, web application protection remains an important challenge on the internet, and security must be taken seriously to protect confidential data. The threat is caused by inadequate validation of user input, software developed without strict adherence to safety standards, vulnerable reusable software libraries, software weaknesses, and so on. By exploiting a website's vulnerabilities, intruders manipulate users' information for their own benefit. Intruders then inject their own malicious code, stealing passwords, manipulating user activities, and infringing on customers' privacy. As a result, information is leaked, applications malfunction, confidential data is accessed, and so on. To mitigate these issues, a stacking ensemble-based classifier model for Cross-site scripting (XSS) attack detection is proposed. The stacking ensemble technique combines different machine learning algorithms, such as k-Means, Random Forest, and Decision Tree, as base learners to reliably detect XSS attacks, and Logistic Regression is used as the meta-learner to predict the attack with greater accuracy. Each algorithm in the stacking model explores the problem in its own way, and their outputs are given as input to the meta-learner, which makes the final prediction; this makes the stacking model's overall XSS detection accuracy higher than that of the individual models. The simulation results demonstrate that the proposed model detects XSS attacks successfully.
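To make the stacking step concrete, here is a minimal sketch in which the base learners' predicted labels become the input features of a logistic-regression meta-learner trained by batch gradient descent. The base-learner outputs are hard-coded hypothetical values; in practice they would be held-out (for example, cross-validated) predictions, so the meta-learner never sees predictions a base model made on its own training data.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_learner(base_outputs: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       lr: float = 1.0,
                       epochs: int = 5000) -> List[float]:
    """Fit logistic-regression weights [bias, w1, ..., wk] on base-learner outputs."""
    k = len(base_outputs[0])
    n = len(labels)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(base_outputs, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y  # derivative of the log loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def meta_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Final stacked prediction: 1 = XSS, 0 = benign."""
    return int(sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) >= 0.5)


# Hypothetical held-out predictions of three base learners (1 = XSS, 0 = benign).
BASE_OUTPUTS = [
    [1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
    [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

weights = train_meta_learner(BASE_OUTPUTS, LABELS)
```

On this toy data the meta-learner learns positive weights for every base learner and a negative bias, i.e., a weighted vote; with real base learners of unequal quality the learned weights would differ per learner.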

2021-02-23
Ashraf, S., Ahmed, T.  2020.  Sagacious Intrusion Detection Strategy in Sensor Network. 2020 International Conference on UK-China Emerging Technologies (UCET). :1–4.
Almost all smart appliances are operated through wireless sensor networks (WSNs). Over time, and because of its many applications, the WSN has become prone to various external attacks. To prevent such attacks, an intrusion detection strategy (IDS) is crucial for securing the network against malicious attackers. The proposed IDS methodology discovers patterns in a large data corpus and applies different types of algorithms to detect four types of Denial of Service (DoS) attacks: Grayhole, Blackhole, Flooding, and TDMA. State-of-the-art detection algorithms, such as KNN, Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), and ANN, are applied to the data corpus, and their performance in detecting the attacks is analyzed. The analysis shows that these algorithms are applicable to the detection and prediction of unavoidable attacks and can be recommended to network experts and analysts.
2020-07-16
Ayub, Md. Ahsan, Smith, Steven, Siraj, Ambareen.  2019.  A Protocol Independent Approach in Network Covert Channel Detection. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :165–170.

Network covert channels are used in various cyberattacks, including disclosure of sensitive information and enabling stealth tunnels for botnet commands. As technology advances, covert channels are becoming more prevalent, complex, and difficult to detect. Current detection methods are protocol- and pattern-specific, which requires investing significant time and resources in applying various techniques to catch the different types of covert channels. This paper reviews several patterns of network storage covert channels, describes the generation of a network traffic dataset with covert channels, and proposes a generic, protocol-independent approach for detecting network storage covert channels using a supervised machine learning technique. Implementing the proposed generic detection model can reduce the number of techniques needed to prevent covert channel communication in network traffic. The datasets we generated for experimentation represent storage covert channels in the IP, TCP, and DNS protocols and are available upon request for future research in this area.

2020-04-03
Saridou, Betty, Shiaeles, Stavros, Papadopoulos, Basil.  2019.  DDoS Attack Mitigation through Root-DNS Server: A Case Study. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:60–65.

Load balancing and IP anycast are traffic routing algorithms used to speed up delivery in the Domain Name System (DNS). In the case of a DDoS attack or an overload condition, these protocols are critical, as they can provide intrinsic DDoS mitigation through their failover alternatives. In this paper, we present a methodology for predicting the next DNS response in light of a potential redirection to less busy servers, in order to mitigate the size of the attack. Our experiments were conducted using data from the November 2015 attack on the Root DNS servers, with Logistic Regression, k-Nearest Neighbors, Support Vector Machines, and Random Forest as our primary classifiers. The models successfully predicted up to 83% of responses for Root Letters that operated on a small number of sites and consequently suffered the most during the attacks. For DNS requests served by more widely distributed Root servers, however, the models demonstrated lower accuracy. Our analysis showed a correlation between the True Positive Rate and the number of sites, as well as a clear need for intelligent traffic management in load balancing practices.

2020-01-27
Rocamora, Josyl Mariela, Ho, Ivan Wang-Hei, Mak, Man-Wai.  2019.  Fingerprint Quality Classification for CSI-based Indoor Positioning Systems. Proceedings of the ACM MobiHoc Workshop on Pervasive Systems in the IoT Era. :31–36.
Recent indoor positioning systems that use channel state information (CSI) assume ideal scenarios to achieve high accuracy in fingerprint matching. However, one essential component in achieving high accuracy is the collection of high-quality fingerprints. Fingerprint quality may vary due to uncontrollable factors such as environmental noise, interference, and hardware instability. In this paper, we propose a method for collecting high-quality fingerprints for indoor positioning. First, we developed a logistic regression classifier based on gradient descent to evaluate the quality of the collected channel frequency response (CFR) samples. We use the classifier to sift out poor CFR samples and retain only good ones as input to the positioning system. We find that our classifier achieves high classification accuracy on thousands of CFR samples. We then evaluate positioning accuracy with two techniques: Time-Reversal Resonating Strength (TRRS) and Support Vector Machines (SVM). The sifted fingerprints always result in better positioning performance; for example, the average improvement is 114% for TRRS and 22% for SVM compared with unsifted fingerprints of the same 40-MHz effective bandwidth.
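A logistic regression classifier trained by gradient descent, used as a quality filter, can be sketched as follows. The two per-sample features (a hypothetical normalized noise level and amplitude instability) and the toy data are assumptions for illustration; the abstract does not state which CFR features the authors used.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Vector, b: float, x: Vector) -> float:
    """Estimated probability that sample x is a good-quality fingerprint."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic_gd(X: Sequence[Vector], y: Sequence[int],
                    lr: float = 0.5, epochs: int = 3000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def sift(samples: Sequence[Vector], w: Vector, b: float,
         threshold: float = 0.5) -> List[Vector]:
    """Keep only the samples the classifier labels as good (class 1)."""
    return [s for s in samples if predict_proba(w, b, s) >= threshold]


# Hypothetical features: [normalized noise level, normalized amplitude instability].
X_train = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.7], [0.85, 0.8]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = good fingerprint, 0 = poor
w, b = fit_logistic_gd(X_train, y_train)
```

Only the samples returned by `sift` would be passed on to the positioning stage (TRRS or SVM matching).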
2020-01-20
Yihunie, Fekadu, Abdelfattah, Eman, Regmi, Amish.  2019.  Applying Machine Learning to Anomaly-Based Intrusion Detection Systems. 2019 IEEE Long Island Systems, Applications and Technology Conference (LISAT). :1–5.

The enormous growth of Internet-based traffic exposes corporate networks to a wide variety of vulnerabilities. Intrusive traffic disrupts the normal operation of the network by consuming corporate resources and time. Efficient ways of identifying, protecting against, and mitigating intrusive incidents enhance productivity. Because an Intrusion Detection System (IDS) is hosted both in the network and at the user-machine level to oversee malicious traffic in the network and on individual computers, it is one of the critical components of network and host security. Unsupervised anomaly traffic detection techniques are improving over time. This research aims to find an efficient classifier that detects anomalous traffic in the NSL-KDD dataset with high accuracy and a minimal error rate by experimenting with five machine learning techniques. Five binary classifiers, namely Stochastic Gradient Descent, Random Forests, Logistic Regression, Support Vector Machine, and Sequential Model, are tested and validated. The outcome demonstrates that the Random Forest classifier outperforms the other four classifiers both with and without normalization of the dataset.
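The abstract compares results with and without a normalization step but does not name the method. Assuming min-max scaling, a common choice for NSL-KDD's numeric columns, a minimal sketch that fits per-column bounds on the training rows only and reuses them on any other split looks like this; the example rows are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum, computed from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def transform_min_max(rows: Sequence[Sequence[float]],
                      mins: Sequence[float], maxs: Sequence[float]) -> Matrix:
    """Scale columns with the fitted bounds.

    Training rows land in [0, 1]; unseen values outside the training range fall
    outside [0, 1]. Constant columns (zero span) map to 0.
    """
    out: Matrix = []
    for row in rows:
        scaled = []
        for v, lo, hi in zip(row, mins, maxs):
            span = hi - lo
            scaled.append((v - lo) / span if span else 0.0)
        out.append(scaled)
    return out


# Hypothetical numeric columns, e.g. duration, bytes sent, and a constant flag.
TRAIN = [[0, 10, 5], [50, 20, 5], [100, 30, 5]]
mins, maxs = fit_min_max(TRAIN)
```

Fitting the bounds on the training split alone keeps information from the test split out of preprocessing.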

2019-12-30
Heydari, Mohammad, Mylonas, Alexios, Katos, Vasilios, Balaguer-Ballester, Emili, Tafreshi, Vahid Heydari Fami, Benkhelifa, Elhadj.  2019.  Uncertainty-Aware Authentication Model for Fog Computing in IoT. 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC). :52–59.

Since Cisco Systems coined the term “Fog Computing” in 2012, the security and privacy issues of this promising paradigm have remained open challenges. Among the various security challenges, access control is a crucial concern for all cloud-computing-like systems (e.g., fog computing, mobile edge computing) in the IoT era. Assigning the precise level of access in such an inherently scalable, heterogeneous, and dynamic environment is not easy. This work defines the uncertainty challenge for the authentication phase of access control in fog computing: on the one hand, fog has a number of characteristics that amplify uncertainty in authentication, and on the other hand, applying traditional access control models does not yield a flexible and resilient solution. We therefore propose a novel prediction model based on an extension of the Attribute-Based Access Control (ABAC) model. Our data-driven model can handle uncertainty in authentication and can take the mobility of mobile edge devices into account during authentication. We built the model using, and compared, four supervised classification algorithms: Decision Tree, Naïve Bayes, Logistic Regression, and Support Vector Machine. Our model achieves 88.14% authentication accuracy using Logistic Regression.

2019-05-20
Prokofiev, A. O., Smirnova, Y. S., Surov, V. A.  2018.  A method to detect Internet of Things botnets. 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :105–108.

This paper considers the main security problems typical of the Internet of Things (IoT), as well as the goals of gaining unauthorized access to IoT devices. Common characteristics of the most widespread botnets are provided. A method to detect compromised IoT devices that have been recruited into a botnet is proposed. The method is based on a logistic regression model. The article describes the developed logistic regression model, which estimates the probability that a device initiating a connection is running a bot. A list of network protocols used to gain unauthorized access to a device and to receive instructions from a command and control (C&C) server is also provided.
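In standard logistic regression, the probability that a connection-initiating device is running a bot is the logistic function of a linear combination of its features, and equivalently the log-odds are linear in the features. The feature vector x (for example, indicators for which of the listed protocols the device uses) is not specified in the abstract and is left generic here.

```latex
P(\text{bot} \mid \mathbf{x})
  = \sigma\!\Big(\beta_0 + \sum_{j=1}^{m} \beta_j x_j\Big)
  = \frac{1}{1 + \exp\!\big(-\beta_0 - \sum_{j=1}^{m} \beta_j x_j\big)},
\qquad
\ln \frac{P(\text{bot} \mid \mathbf{x})}{1 - P(\text{bot} \mid \mathbf{x})}
  = \beta_0 + \sum_{j=1}^{m} \beta_j x_j
```

The coefficients are normally fit by maximizing the log-likelihood on labeled connections, and a device is flagged when its estimated probability exceeds a chosen threshold; the abstract does not state which threshold the authors used.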

2019-04-05
Bapat, R., Mandya, A., Liu, X., Abraham, B., Brown, D. E., Kang, H., Veeraraghavan, M.  2018.  Identifying Malicious Botnet Traffic Using Logistic Regression. 2018 Systems and Information Engineering Design Symposium (SIEDS). :266–271.

An important source of cyberattacks is malware, which proliferates in different forms such as botnets. Botnet malware typically looks for vulnerable devices across the Internet rather than targeting specific individuals, companies, or industries. It attempts to infect as many connected devices as possible, using their resources for automated tasks that may cause significant economic and social harm while remaining hidden from the user and the device. This makes such activity very difficult to detect. A considerable amount of research has been conducted to detect and prevent botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system that uses a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features for detecting botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework that provides aggregate statistics about the packets exchanged between a source and a destination over a certain time interval. These statistics serve as features for a logistic regression model that classifies traffic as malicious or benign. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families, both separately and as a mixed dataset. Finally, we measured the performance of our model using F1 score, accuracy, and Area Under the Curve (AUC).
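The three evaluation metrics named above can be computed as in the following plain-Python sketch. The scores are hypothetical logistic-regression probabilities, label 1 means malicious, and the 0.5 decision threshold is an assumption. AUC is computed with the rank (Mann-Whitney) formulation rather than by integrating an explicit ROC curve.

```python
from typing import Sequence, Tuple


def accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1 score for the positive (malicious = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random malicious flow outscores a random benign one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for six flows.
Y_TRUE = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
Y_PRED = [int(s >= 0.5) for s in SCORES]  # assumed 0.5 decision threshold
```

On these values, accuracy is 4/6, F1 is 2/3, and AUC is 8/9. Unlike the first two, AUC does not depend on the threshold.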

2019-03-04
Aborisade, O., Anwar, M.  2018.  Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :269–276.

At a time when all it takes to open a Twitter account is a mobile phone, authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, we need valid measures to identify the authors of misinformation to avert these consequences. Researchers have proposed different authorship attribution techniques for this kind of problem. However, because tweets contain at most 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify the authors of tweets by comparing machine learning methods such as logistic regression and naive Bayes. The application's processes are fetching tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates authorship classification of text using machine learning techniques. In total, 46,895 tweets were used as training and testing data, and unique Twitter-specific features were extracted. The pre-processing phase included several steps: removal of short texts, removal of stop words and punctuation, and tokenizing and stemming of texts. This approach transforms the pre-processed data into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to the feature vectors to train and test the classifier. The logistic regression based classifier gave the highest accuracy, 91.1%, compared with 89.8% for the naive Bayes classifier.
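The pre-processing steps listed above and the naive Bayes baseline can be sketched with a small hand-rolled multinomial naive Bayes over token counts. The stop-word list, the short-text cutoff, and the toy tweets are hypothetical; stemming and the paper's Twitter-specific features are omitted, and stripping all punctuation also drops '#' and '@', which a real system might keep as features.

```python
import math
import string
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for"}
MIN_TOKENS = 3  # hypothetical cutoff below which a tweet counts as a "short text"
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweet: str) -> Optional[List[str]]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words.

    Returns None for short texts so the caller can discard them.
    """
    tokens = tweet.lower().translate(_PUNCT_TABLE).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens if len(tokens) >= MIN_TOKENS else None


class MultinomialNB:
    """Multinomial naive Bayes over token counts with add-one (Laplace) smoothing."""

    def fit(self, docs: Sequence[List[str]], authors: Sequence[str]) -> "MultinomialNB":
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts = Counter(authors)
        for tokens, author in zip(docs, authors):
            self.word_counts[author].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(authors)
        return self

    def predict(self, tokens: List[str]) -> str:
        v = len(self.vocab)
        best_author, best_score = "", -math.inf
        for author, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[author] / self.total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / (total + v))
            if score > best_score:
                best_author, best_score = author, score
        return best_author


TRAIN = [
    ("Loving the new coffee place downtown, great espresso!", "alice"),
    ("Espresso and croissants for breakfast again, coffee forever", "alice"),
    ("Match highlights tonight: what a goal in the final minute!", "bob"),
    ("Great goal, great match, football is back tonight", "bob"),
    ("ok", "bob"),  # dropped as a short text
]
kept = [(toks, a) for toks, a in ((preprocess(t), a) for t, a in TRAIN) if toks is not None]
model = MultinomialNB().fit([toks for toks, _ in kept], [a for _, a in kept])
```

A logistic regression classifier could be trained on the same token counts for the comparison the paper reports; only the naive Bayes side is shown here.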

2019-02-18
Wu, KuanTing, Chou, ShingHua, Chen, ShyhWei, Tsai, ChingTsorng, Yuan, ShyanMing.  2018.  Application of Machine Learning to Identify Counterfeit Website. Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services. :321–324.
In recent years, fraudulent websites have become more prevalent than before. Fraudulent e-commerce websites that sell counterfeit goods not only cause financial damage to consumers but also have a great impact on the Internet industry. Currently, there is no effective way to confront these websites. In this paper, we aim to achieve three goals: find the characteristics of counterfeit websites, train models for classifying e-commerce websites, and provide a service that helps consumers distinguish counterfeit websites from legitimate ones.
2018-06-07
Lahrouni, Youssef, Pereira, Caroly, Bensaber, Boucif Amar, Biskri, Ismaïl.  2017.  Using Mathematical Methods Against Denial of Service (DoS) Attacks in VANET. Proceedings of the 15th ACM International Symposium on Mobility Management and Wireless Access. :17–22.

VANET is a new technology on which future intelligent transport systems are based; its purpose is to develop the vehicular environment and make it more comfortable. It also provides more safety for drivers and cars on the road. We therefore have to make this technology as secure as possible against many threats. As VANET is a subclass of MANET, it has inherited many security problems, albeit with a different architecture, and DoS attacks are one of them. In this paper, we focus on DoS attacks, which prevent users from receiving the right information at the right moment. We analyze the behavior of DoS attacks and their effects on the network using different mathematical models in order to find an efficient solution.

2018-03-05
Wang, W., Hussein, N., Gupta, A., Wang, Y.  2017.  A Regression Model Based Approach for Identifying Security Requirements in Open Source Software Development. 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW). :443–446.

Researchers have proposed several methods for identifying security requirements in up-front requirements engineering (RE). However, in open source software (OSS) projects, developers use lightweight representations and refine requirements frequently by writing comments. They also tend to discuss security aspects in comments, providing code snippets, attachments, and links to external resources. Since most security requirements identification methods in up-front RE are based on textual information retrieval techniques, these methods are not suitable for OSS projects or just-in-time RE. In our study, we propose a new model based on logistic regression to identify security requirements in OSS projects. We used five metrics to build security requirements identification models and tested the performance of these metrics by applying the models to three OSS projects. Our results show that four of the five metrics achieved high performance in intra-project testing.

2018-01-10
Devyatkin, D., Smirnov, I., Ananyeva, M., Kobozeva, M., Chepovskiy, A., Solovyev, F.  2017.  Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts). 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :188–190.

In this paper, we present the results of research on automatic detection of extremist texts. For this purpose, an experimental dataset in Russian was created; under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic, and psycholinguistic) to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection.

2017-12-28
Vu, Q. H., Ruta, D., Cen, L.  2017.  An ensemble model with hierarchical decomposition and aggregation for highly scalable and robust classification. 2017 Federated Conference on Computer Science and Information Systems (FedCSIS). :149–152.

This paper introduces an ensemble model that solves the binary classification problem by combining basic Logistic Regression with two recent advanced paradigms: extreme gradient boosted decision trees (xgboost) and deep learning. To obtain the best result when integrating sub-models, we introduce a way to split and select feature sets for sub-model training. In addition to the ensemble model, we propose a new flexible, robust, and highly scalable scheme for building a composite classifier that tries to simultaneously implement multiple layers of model decomposition and output aggregation to maximally reduce both the bias and variance (spread) components of classification error. We demonstrate the power of our ensemble model on the problem of predicting the outcome of Hearthstone, a turn-based computer game, from game state information. The model's excellent predictive performance was recognized by a second-place finish in the final ranking among 188 competing teams.

2017-03-29
Aono, Yoshinori, Hayashi, Takuya, Trieu Phong, Le, Wang, Lihua.  2016.  Scalable and Secure Logistic Regression via Homomorphic Encryption. Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. :142–144.

Logistic regression is a powerful machine learning tool for classifying data. When dealing with sensitive data such as private or medical information, care is necessary. In this paper, we propose a secure system for protecting the training data in logistic regression via homomorphic encryption. Perhaps surprisingly, despite the non-polynomial operations involved in training logistic regression, we show that only additively homomorphic encryption is needed to build our system. Our system is secure and scales with the dataset size.
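One way to see why additive homomorphism can be enough, assumed here for illustration rather than taken from the paper, is to replace the logistic loss log(1 + exp(-z)) by its second-order Taylor expansion around 0, log 2 - z/2 + z²/8. With labels in {-1, +1}, the approximate loss then depends on the data only through the sums of x xᵀ and of y x, which each data holder could encrypt and a server could add without decrypting. The sketch runs in plaintext: no encryption library is used, and the two data holders and their rows are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Stats = Tuple[Matrix, Vector, int]


def local_statistics(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stats:
    """One data holder's aggregates: A = sum x x^T, b = sum y x (y in {-1, +1}), and n.

    In an encrypted deployment each entry would be encrypted under an additively
    homomorphic scheme before upload; that layer is omitted in this sketch.
    """
    d = len(X[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for xi, yi in zip(X, y):
        for j in range(d):
            b[j] += yi * xi[j]
            for k in range(d):
                A[j][k] += xi[j] * xi[k]
    return A, b, len(X)


def aggregate(stats: Sequence[Stats]) -> Stats:
    """Entry-wise sums: the only operation the server needs to perform."""
    d = len(stats[0][1])
    A = [[sum(s[0][j][k] for s in stats) for k in range(d)] for j in range(d)]
    b = [sum(s[1][j] for s in stats) for j in range(d)]
    return A, b, sum(s[2] for s in stats)


def approx_loss(w: Vector, A: Matrix, b: Vector, n: int) -> float:
    """Degree-2 Taylor approximation of sum_i log(1 + exp(-y_i w.x_i)) around 0."""
    d = len(w)
    quad = sum(w[j] * A[j][k] * w[k] for j in range(d) for k in range(d))
    return n * math.log(2) - sum(wj * bj for wj, bj in zip(w, b)) / 2 + quad / 8


def fit_taylor_logistic(A: Matrix, b: Vector, n: int,
                        lr: float = 0.5, epochs: int = 5000) -> Vector:
    """Gradient descent on approx_loss / n, whose gradient is (A w / 4 - b / 2) / n."""
    d = len(b)
    w = [0.0] * d
    for _ in range(epochs):
        Aw = [sum(A[j][k] * w[k] for k in range(d)) for j in range(d)]
        w = [w[j] - lr * (Aw[j] / 4 - b[j] / 2) / n for j in range(d)]
    return w


# Two hypothetical data holders; the first feature is a constant 1 (intercept).
X1, y1 = [[1, 2, 1], [1, 1, 2]], [1, 1]
X2, y2 = [[1, -1, -2], [1, -2, -1]], [-1, -1]
A, b, n = aggregate([local_statistics(X1, y1), local_statistics(X2, y2)])
w = fit_taylor_logistic(A, b, n)
```

The design point is that the server's work reduces to additions of per-holder aggregates; the non-polynomial sigmoid never has to be evaluated on encrypted data. A higher-degree expansion would need more aggregates but follows the same pattern.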

2017-03-08
Leong, F. H.  2015.  Automatic detection of frustration of novice programmers from contextual and keystroke logs. 2015 10th International Conference on Computer Science Education (ICCSE). :373–377.

Novice programmers exhibit a repertoire of affective states over time while learning computer programming. Modeling frustration is important because it indicates when a student needs pedagogical intervention; without it, the student may lose confidence and interest in learning. In this paper, contextual and keystroke features of students within a Java tutoring system are used to detect student frustration within a programming exercise session. Compared with the psychological sensors used in other studies, contextual and keystroke logs are less obtrusive, and the equipment used (a keyboard) is ubiquitous in most learning environments. Logistic regression with lasso regularization is used for modeling to prevent overfitting. The results showed that a model using only contextual and keystroke features achieved a prediction accuracy of 0.67 and a recall of 0.833. We therefore conclude that it is possible to detect student frustration from the contextual and keystroke logs within the tutoring system with an adequate level of accuracy.
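Lasso-regularized logistic regression adds a penalty λ‖w‖₁ to the mean log loss, which can drive the weights of uninformative features to exactly zero. The sketch below fits it by proximal gradient descent (ISTA): a gradient step on the log loss followed by soft-thresholding of each weight. The solver, λ = 0.05, and the two session features are assumptions for illustration; the abstract does not say how the authors fit their model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v toward zero by t, snapping to 0 inside [-t, t]."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                       lam: float = 0.05, lr: float = 0.5,
                       epochs: int = 5000) -> Tuple[List[float], float]:
    """ISTA for mean log loss + lam * ||w||_1; the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(d):
                gw[j] += err * xi[j] / n
        b -= lr * gb  # plain gradient step for the intercept
        w = [soft_threshold(w[j] - lr * gw[j], lr * lam) for j in range(d)]
    return w, b


# Hypothetical session features: [share of failed compilations, centered keystroke noise].
X = [[0.9, 1.0], [0.9, -1.0], [0.8, 1.0], [0.8, -1.0],
     [0.2, 1.0], [0.2, -1.0], [0.1, 1.0], [0.1, -1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = frustrated, 0 = not frustrated
w, b = fit_lasso_logistic(X, y)
```

Here the second feature carries no information about the label, so its weight stays exactly 0 while the informative feature keeps a positive weight. The same L1 penalty also keeps the weights finite on perfectly separable data like this toy set.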