Biblio

Filters: Keyword is random forests
2020-04-10
Newaz, AKM Iqtidar, Sikder, Amit Kumar, Rahman, Mohammad Ashiqur, Uluagac, A. Selcuk.  2019.  HealthGuard: A Machine Learning-Based Security Framework for Smart Healthcare Systems. 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). :389–396.
The integration of Internet-of-Things and pervasive computing in medical devices has made the modern healthcare system “smart.” Today, the function of the healthcare system is not limited to treating patients. With the help of implantable medical devices and wearables, a Smart Healthcare System (SHS) can continuously monitor different vital signs of a patient and automatically detect and prevent critical medical conditions. However, these increasing functionalities of SHS raise several security concerns, and attackers can exploit the SHS in numerous ways: they can impede normal function of the SHS, inject false data to change vital signs, and tamper with a medical device to change the outcome of a medical emergency. In this paper, we propose HealthGuard, a novel machine learning-based security framework to detect malicious activities in an SHS. HealthGuard observes the vital signs of different connected devices of an SHS and correlates the vitals to understand the changes in body functions of the patient to distinguish benign and malicious activities. HealthGuard utilizes four different machine learning-based detection techniques (Artificial Neural Network, Decision Tree, Random Forest, k-Nearest Neighbor) to detect malicious activities in an SHS. We trained HealthGuard with data collected for eight different smart medical devices for twelve benign events including seven normal user activities and five disease-affected events. Furthermore, we evaluated the performance of HealthGuard against three different malicious threats. Our extensive evaluation shows that HealthGuard is an effective security framework for SHS with an accuracy of 91% and an F1 score of 90%.
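
For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how accuracy and F1 score are computed for a binary benign/malicious classifier. The label encoding (1 = malicious, 0 = benign), the function names, and the toy data are assumptions for illustration only; nothing here is taken from the HealthGuard paper or reproduces its results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, hypothetical: 1 = malicious event, 0 = benign event.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives, so the two can diverge when malicious events are rare.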
2020-02-26
Rahman, Obaid, Quraishi, Mohammad Ali Gauhar, Lung, Chung-Horng.  2019.  DDoS Attacks Detection and Mitigation in SDN Using Machine Learning. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:184–189.

Software Defined Networking (SDN) is very popular due to the benefits it provides such as scalability, flexibility, monitoring, and ease of innovation. However, it needs to be properly protected from security threats. One major attack that plagues the SDN network is the distributed denial-of-service (DDoS) attack. There are several approaches to prevent the DDoS attack in an SDN network. We have evaluated a few machine learning techniques, i.e., J48, Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (K-NN), to detect and block the DDoS attack in an SDN network. The evaluation process involved training and selecting the best model for the proposed network and applying it in a mitigation and prevention script to detect and mitigate attacks. The results showed that J48 performs better than the other evaluated algorithms, especially in terms of training and testing time.

2020-01-20
Yihunie, Fekadu, Abdelfattah, Eman, Regmi, Amish.  2019.  Applying Machine Learning to Anomaly-Based Intrusion Detection Systems. 2019 IEEE Long Island Systems, Applications and Technology Conference (LISAT). :1–5.

The enormous growth of Internet-based traffic exposes corporate networks to a wide variety of vulnerabilities. Intrusive traffic affects the normal functionality of network operation by consuming corporate resources and time. Efficient ways of identifying, protecting against, and mitigating intrusive incidents enhance productivity. Because an Intrusion Detection System (IDS) is hosted in the network and at the user machine level to oversee the malicious traffic in the network and at the individual computer, it is one of the critical components of network and host security. Unsupervised anomaly traffic detection techniques are improving over time. This research aims to find an efficient classifier that detects anomalous traffic in the NSL-KDD dataset with a high accuracy level and minimal error rate by experimenting with five machine learning techniques. Five binary classifiers (Stochastic Gradient Descent, Random Forests, Logistic Regression, Support Vector Machine, and Sequential Model) are tested and validated to produce the result. The outcome demonstrates that the Random Forest classifier outperforms the other four classifiers both with and without applying the normalization process to the dataset.
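
The abstract does not say which normalization process was applied. As one common possibility, the sketch below shows per-feature min-max scaling to the range [0, 1]; the choice of min-max scaling, the function name, and the sample rows are assumptions, not details from the paper.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1] (assumed normalization scheme)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical feature rows, e.g. duration, bytes sent, a constant flag.
rows = [[0, 10, 5], [5, 20, 5], [10, 40, 5]]
scaled = min_max_normalize(rows)
print(scaled)
```

Tree-based models such as random forests split on feature thresholds and are largely insensitive to this kind of rescaling, whereas gradient-based and margin-based classifiers usually are sensitive to it.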

2019-11-26
Zabihimayvan, Mahdieh, Doran, Derek.  2019.  Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.

Phishing, one of the most well-known cybercrime activities, is a deception of online users to steal their personal or confidential information by impersonating a legitimate website. Several machine learning-based strategies have been proposed to detect phishing websites. These techniques are dependent on the features extracted from the website samples. However, few studies have actually considered efficient feature selection for detecting phishing attacks. In this work, we investigate an agreement on the definitive features which should be used in phishing detection. We apply Fuzzy Rough Set (FRS) theory as a tool to select the most effective features from three benchmarked data sets. The selected features are fed into three commonly used classifiers for phishing detection. To evaluate the FRS feature selection in developing a generalizable phishing detection, the classifiers are trained by a separate out-of-sample data set of 14,000 website samples. The maximum F-measure gained by FRS feature selection is 95% using Random Forest classification. Also, there are 9 universal features selected by FRS over all three data sets. The F-measure value using this universal feature set is approximately 93%, which is comparable to the FRS performance. Since the universal feature set contains no features from third-party services, this finding implies that with no inquiry from external sources, we can gain a faster phishing detection which is also robust toward zero-day attacks.

2019-02-18
Fukushima, Keishiro, Nakamura, Toru, Ikeda, Daisuke, Kiyomoto, Shinsaku.  2018.  Challenges in Classifying Privacy Policies by Machine Learning with Word-based Features. Proceedings of the 2nd International Conference on Cryptography, Security and Privacy. :62–66.

In this paper, we discuss challenges when we try to automatically classify privacy policies using machine learning with words as the features. Since it is difficult for the general public to understand privacy policies, it is necessary to help them do so. To this end, the authors believe that machine learning is one of the promising ways because users can grasp the meaning of policies through outputs by a machine learning algorithm. Our final goal is to develop a system which automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify sentences in privacy policies with category labels, using popular machine learning algorithms, such as a naive Bayes classifier. We choose these algorithms because we could use trained classifiers to evaluate keywords appropriate for privacy labels. Therefore, we adopt words as the features of those algorithms. Experimental results show about 85% accuracy. We think that much higher accuracy is necessary to achieve our final goal. By changing learning settings, we identified one reason for the low accuracy: privacy policies include many sentences that do not directly describe information about the categories. Such sentences seem redundant, but they may be essential in legal documents in order to prevent misinterpretation. Thus, it is important for machine learning algorithms to handle these redundant sentences appropriately.
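
To make the idea of classifying sentences with words as features concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing. The whitespace tokenizer, the category names "collection" and "sharing", and the training sentences are all hypothetical; the paper's actual categories, preprocessing, and data are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return sentence.lower().split()


def train_naive_bayes(labeled_sentences):
    """Collect per-class sentence counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for sentence, label in labeled_sentences:
        class_counts[label] += 1
        for word in tokenize(sentence):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(sentence, model):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(sentence):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training sentences and category labels.
training = [
    ("we collect your email address", "collection"),
    ("we collect location data", "collection"),
    ("we share data with partners", "sharing"),
    ("data is shared with advertisers", "sharing"),
]
model = train_naive_bayes(training)
print(classify("we collect your location", model))
```

Because the per-class word counts are kept explicitly, a trained model of this kind can also be inspected to see which words most strongly indicate each category.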

2017-08-22
Buczak, Anna L., Hanke, Paul A., Cancro, George J., Toma, Michael K., Watkins, Lanier A., Chavis, Jeffrey S..  2016.  Detection of Tunnels in PCAP Data by Random Forests. Proceedings of the 11th Annual Cyber and Information Security Research Conference. :16:1–16:4.

This paper describes an approach for detecting the presence of domain name system (DNS) tunnels in network traffic. DNS tunneling is a common technique hackers use to establish command and control nodes and to exfiltrate data from networks. To generate the training data sufficient to build models to detect DNS tunneling activity, a penetration testing effort was employed. We extracted features from this data and trained random forest classifiers to distinguish normal DNS activity from tunneling activity. The classifiers successfully detected the presence of tunnels we trained on, and four other types of tunnels that were not a part of the training set.
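
The abstract does not list the features that were extracted. The sketch below illustrates the kind of per-query features often considered for DNS tunnel detection (name length, label count, longest label, and character entropy), which could then be fed to a classifier such as a random forest. The feature set, function names, and example name are assumptions for illustration and are not the authors' feature set.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def dns_query_features(qname):
    """Build a small, hypothetical feature dictionary from one query name."""
    name = qname.rstrip(".").lower()
    labels = name.split(".") if name else []
    return {
        "length": len(name),
        "label_count": len(labels),
        "longest_label": max((len(label) for label in labels), default=0),
        # Encoded payloads tend to look more random than ordinary hostnames.
        "entropy": shannon_entropy(name.replace(".", "")),
    }


features = dns_query_features("www.example.com.")
print(features)
```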

2015-04-30
El Masri, A., Wechsler, H., Likarish, P., Kang, B.B..  2014.  Identifying users with application-specific command streams. Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on. :232–238.

This paper proposes and describes an active authentication model based on user profiles built from user-issued commands when interacting with a GUI-based application. Previous behavioral models derived from user-issued commands were limited to analyzing the user's interaction with the *Nix (Linux or Unix) command shell program. Human-computer interaction (HCI) research has explored the idea of building user profiles based on behavioral patterns when interacting with such graphical interfaces. It did so by analyzing the user's keystroke and/or mouse dynamics. However, none had explored the idea of creating profiles by capturing users' usage characteristics when interacting with a specific application beyond how a user strikes the keyboard or moves the mouse across the screen. We obtain and utilize a dataset of user command streams collected from working with Microsoft (MS) Word to serve as a test bed. User profiles are first built using MS Word commands and identification takes place using machine learning algorithms. The best performance in terms of both accuracy and Area under the Curve (AUC) for the Receiver Operating Characteristic (ROC) curve is reported using Random Forests (RF) and AdaBoost with random forests.