Biblio

List
Filter

Found 32 results

Filters: Keyword is random forests [Clear All Filters]

2022-10-13

Basit, Abdul, Zafar, Maham, Javed, Abdul Rehman, Jalil, Zunera. 2020. A Novel Ensemble Machine Learning Method to Detect Phishing Attack. 2020 IEEE 23rd International Multitopic Conference (INMIC). :1—5.

Currently and particularly with remote working scenarios during COVID-19, phishing attack has become one of the most significant threats faced by internet users, organizations, and service providers. In a phishing attack, the attacker tries to steal client sensitive data (such as login, passwords, and credit card details) using spoofed emails and fake websites. Cybercriminals, hacktivists, and nation-state spy agencies have now got a fertilized ground to deploy their latest innovative phishing attacks. Timely detection of phishing attacks has become most crucial than ever. Machine learning algorithms can be used to accurately detect phishing attacks before a user is harmed. This paper presents a novel ensemble model to detect phishing attacks on the website. We select three machine learning classifiers: Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5) to use in an ensemble method with Random Forest Classifier (RFC). This ensemble method effectively detects website phishing attacks with better accuracy than existing studies. Experimental results demonstrate that the ensemble of KNN and RFC detects phishing attacks with 97.33% accuracy.

2022-08-03

de Biase, Maria Stella, Marulli, Fiammetta, Verde, Laura, Marrone, Stefano. 2021. Improving Classification Trustworthiness in Random Forests. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :563—568.

Machine learning algorithms are becoming more and more widespread in industrial as well as in societal settings. This diffusion is starting to become a critical aspect of new software-intensive applications due to the need of fast reactions to changes, even if temporary, in data. This paper investigates on the improvement of reliability in the Machine Learning based classification by extending Random Forests with Bayesian Network models. Such models, combined with a mechanism able to adjust the reputation level of single learners, may improve the overall classification trustworthiness. A small example taken from the healthcare domain is presented to demonstrate the proposed approach.

2022-04-18

Aivatoglou, Georgios, Anastasiadis, Mike, Spanos, Georgios, Voulgaridis, Antonis, Votis, Konstantinos, Tzovaras, Dimitrios. 2021. A Tree-Based Machine Learning Methodology to Automatically Classify Software Vulnerabilities. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :312–317.

Software vulnerabilities have become a major problem for the security analysts, since the number of new vulnerabilities is constantly growing. Thus, there was a need for a categorization system, in order to group and handle these vulnerabilities in a more efficient way. Hence, the MITRE corporation introduced the Common Weakness Enumeration that is a list of the most common software and hardware vulnerabilities. However, the manual task of understanding and analyzing new vulnerabilities by security experts, is a very slow and exhausting process. For this reason, a new automated classification methodology is introduced in this paper, based on the vulnerability textual descriptions from National Vulnerability Database. The proposed methodology, combines textual analysis and tree-based machine learning techniques in order to classify vulnerabilities automatically. The results of the experiments showed that the proposed methodology performed pretty well achieving an overall accuracy close to 80%.

2022-02-07

Khalifa, Marwa Mohammed, Ucan, Osman Nuri, Ali Alheeti, Khattab M.. 2021. New Intrusion Detection System to Protect MANET Networks Employing Machine Learning Techniques. 2021 International Conference of Modern Trends in Information and Communication Technology Industry (MTICTI). :1–6.

The Intrusion Detection System (IDS) is one of the technologies available to protect mobile ad hoc networks. The system monitors the network and detects intrusion from malicious nodes, aiming at passive (eavesdropping) or positive attack to disrupt the network. This paper proposes a new Intrusion detection system using three Machine Learning (ML) techniques. The ML techniques were Random Forest (RF), support vector machines (SVM), and Naïve Bayes(NB) were used to classify nodes in MANET. The data set was generated by the simulator network simulator-2 (NS-2). The routing protocol was used is Dynamic Source Routing (DSR). The type of IDS used is a Network Intrusion Detection System (NIDS). The dataset was pre-processed, then split into two subsets, 67% for training and 33% for testing employing Python Version 3.8.8. Obtaining good results for RF, SVM and NB when applied randomly selected features in the trial and error method from the dataset to improve the performance of the IDS and reduce time spent for training and testing. The system showed promising results, especially with RF, where the accuracy rate reached 100%.

2022-01-10

Al-Ameer, Ali, AL-Sunni, Fouad. 2021. A Methodology for Securities and Cryptocurrency Trading Using Exploratory Data Analysis and Artificial Intelligence. 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA). :54–61.

This paper discusses securities and cryptocurrency trading using artificial intelligence (AI) in the sense that it focuses on performing Exploratory Data Analysis (EDA) on selected technical indicators before proceeding to modelling, and then to develop more practical models by introducing new reward loss function that maximizes the returns during training phase. The results of EDA reveal that the complex patterns within the data can be better captured by discriminative classification models and this was endorsed by performing back-testing on two securities using Artificial Neural Network (ANN) and Random Forests (RF) as discriminative models against their counterpart Na\"ıve Bayes as a generative model. To enhance the learning process, the new reward loss function is utilized to retrain the ANN with testing on AAPL, IBM, BRENT CRUDE and BTC using auto-trading strategy that serves as the intelligent unit, and the results indicate this loss superiorly outperforms the conventional cross-entropy used in predictive models. The overall results of this work suggest that there should be larger focus on EDA and more practical losses in the research of machine learning modelling for stock market prediction applications.

2021-11-08

Afroz, Sabrina, Ariful Islam, S.M, Nawer Rafa, Samin, Islam, Maheen. 2020. A Two Layer Machine Learning System for Intrusion Detection Based on Random Forest and Support Vector Machine. 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE). :300–303.

Unauthorized access or intrusion is a massive threatening issue in the modern era. This study focuses on designing a model for an ideal intrusion detection system capable of defending a network by alerting the admins upon detecting any sorts of malicious activities. The study proposes a two layered anomaly-based detection model that uses filter co-relation method for dimensionality reduction along with Random forest and Support Vector Machine as its classifiers. It achieved a very good detection rate against all sorts of attacks including a low rate of false alarms as well. The contribution of this study is that it could be of a major help to the computer scientists designing good intrusion detection systems to keep an industry or organization safe from the cyber threats as it has achieved the desired qualities of a functional IDS model.

Ma, Zhongrui, Yuanyuan, Huang, Lu, Jiazhong. 2020. Trojan Traffic Detection Based on Machine Learning. 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). :157–160.

At present, most Trojan detection methods are based on the features of host and code. Such methods have certain limitations and lag. This paper analyzes the network behavior features and network traffic of several typical Trojans such as Zeus and Weasel, and proposes a Trojan traffic detection algorithm based on machine learning. First, model different machine learning algorithms and use Random Forest algorithm to extract features for Trojan behavior and communication features. Then identify and detect Trojans' traffic. The accuracy is as high as 95.1%. Comparing the detection of different machine learning algorithms, experiments show that our algorithm has higher accuracy, which is helpful and useful for identifying Trojan.

2021-09-30

Yao, Jiaqi, Zhang, Ying, Mao, Zhiming, Li, Sen, Ge, Minghui, Chen, Xin. 2020. On-line Detection and Localization of DoS Attacks in NoC. 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). 9:173–178.

Nowadays, the Network on Chip (NoC) is widely adopted by multi-core System on Chip (SoC) to meet its communication needs. With the gradual popularization of the Internet of Things (IoT), the application of NoC is increasing. Due to its distribution characteristics on the chip, NoC has gradually become the focus of potential security attacks. Denial of service (DoS) is a typical attack and it is caused by malicious intellectual property (IP) core with unnecessary data packets causing communication congestion and performance degradation. In this article, we propose a novel approach to detect DoS attacks on-line based on random forest algorithm, and detect the router where the attack enters the sensitive communication path. This method targets malicious third-party vendors to implant a DoS Hardware Trojan into the NoC. The data set is generated based on the behavior of multi-core routers triggered by normal and Hardware Trojans. The detection accuracy of the proposed scheme is in the range of 93% to 94%.

2021-09-21

Barr, Joseph R., Shaw, Peter, Abu-Khzam, Faisal N., Yu, Sheng, Yin, Heng, Thatcher, Tyler. 2020. Combinatorial Code Classification Amp; Vulnerability Rating. 2020 Second International Conference on Transdisciplinary AI (TransAI). :80–83.

Empirical analysis of source code of Android Fluoride Bluetooth stack demonstrates a novel approach of classification of source code and rating for vulnerability. A workflow that combines deep learning and combinatorial techniques with a straightforward random forest regression is presented. Two kinds of embedding are used: code2vec and LSTM, resulting in a distance matrix that is interpreted as a (combinatorial) graph whose vertices represent code components, functions and methods. Cluster Editing is then applied to partition the vertex set of the graph into subsets representing nearly complete subgraphs. Finally, the vectors representing the components are used as features to model the components for vulnerability risk.

2021-09-07

Kuchlous, Sahil, Kadaba, Madhura. 2020. Short Text Intent Classification for Conversational Agents. 2020 IEEE 17th India Council International Conference (INDICON). :1–4.

Intent classification is an important and relevant area of research in artificial intelligence and machine learning, with applications ranging from marketing and product design to intelligent communication. This paper explores the performance of various models and techniques for short text intent classification in the context of chatbots. The problem was explored for use within the mental wellness and therapy chatbot application, Wysa, to give improved responses to free-text user input. The authors looked at classifying text samples in-to 4 categories - assertions, refutations, clarifiers and transitions. For this, the suitability of the following techniques was evaluated: count vectors, TF-IDF, sentence embeddings and n-grams, as well as modifications of the same. Each technique was used to train a number of state-of-the-art classifiers, and the results have been compiled and presented. This is the first documented implementation of Arora's modification to sentence embeddings for real world use. It also introduces a technique to generate custom stop words that gave a significant gain in performance (10 percentage points). The best pipeline, using these techniques together, gave an accuracy of 95 percent.

2021-05-13

Tong, Zhongkai, Zhu, Ziyuan, Wang, Zhanpeng, Wang, Limin, Zhang, Yusha, Liu, Yuxin. 2020. Cache side-channel attacks detection based on machine learning. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :919—926.

Security has always been one of the main concerns in the field of computer architecture and cloud computing. Cache-based side-channel attacks pose a threat to almost all existing architectures and cloud computing. Especially in the public cloud, the cache is shared among multiple tenants, and cache attacks can make good use of this to extract information. Cache side-channel attacks are a problem to be solved for security, in which how to accurately detect cache side-channel attacks has been a research hotspot. Because the cache side-channel attack does not require the attacker to physically contact the target device and does not need additional devices to obtain the side channel information, the cache-side channel attack is efficient and hidden, which poses a great threat to the security of cryptographic algorithms. Based on the AES algorithm, this paper uses hardware performance counters to obtain the features of different cache events under Flush + Reload, Prime + Probe, and Flush + Flush attacks. Firstly, the random forest algorithm is used to filter the cache features, and then the support vector machine algorithm is used to model the system. Finally, high detection accuracy is achieved under different system loads. The detection accuracy of the system is 99.92% when there is no load, the detection accuracy is 99.85% under the average load, and the detection accuracy under full load is 96.57%.

2021-03-30

Tai, J., Alsmadi, I., Zhang, Y., Qiao, F.. 2020. Machine Learning Methods for Anomaly Detection in Industrial Control Systems. 2020 IEEE International Conference on Big Data (Big Data). :2333—2339.

This paper examines multiple machine learning models to find the model that best indicates anomalous activity in an industrial control system that is under a software-based attack. The researched machine learning models are Random Forest, Gradient Boosting Machine, Artificial Neural Network, and Recurrent Neural Network classifiers built-in Python and tested against the HIL-based Augmented ICS dataset. Although the results showed that Random Forest, Gradient Boosting Machine, Artificial Neural Network, and Long Short-Term Memory classification models have great potential for anomaly detection in industrial control systems, we found that Random Forest with tuned hyperparameters slightly outperformed the other models.

Lin, T.-H., Jiang, J.-R.. 2020. Anomaly Detection with Autoencoder and Random Forest. 2020 International Computer Symposium (ICS). :96—99.

This paper proposes AERFAD, an anomaly detection method based on the autoencoder and the random forest, for solving the credit card fraud detection problem. The proposed AERFAD first utilizes the autoencoder to reduce the dimensionality of data and then uses the random forest to classify data as anomalous or normal. Large numbers of credit card transaction data of European cardholders are applied to AEFRAD to detect possible frauds for the sake of performance evaluation. When compared with related methods, AERFAD has relatively excellent performance in terms of the accuracy, true positive rate, true negative rate, and Matthews correlation coefficient.

2021-03-04

Wang, Y., Wang, Z., Xie, Z., Zhao, N., Chen, J., Zhang, W., Sui, K., Pei, D.. 2020. Practical and White-Box Anomaly Detection through Unsupervised and Active Learning. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1—9.

To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failure in real time. However, due to a large number of various KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised and white-box anomaly detector based on Robust Random Cut Forest (RRCF), and an active learning component. Specifically, we novelly propose an improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying original RRCF in KPI anomaly detection. Besides, we also incorporate the idea of active learning to make our model benefit from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental resulta demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods and supervised learning methods. Besides, each component in iRRCF-Active has also been demonstrated to be effective and indispensable.

2021-02-16

Hongbin, Z., Wei, W., Wengdong, S.. 2020. Safety and Damage Assessment Method of Transmission Line Tower in Goaf Based on Artificial Intelligence. 2020 IEEE/IAS Industrial and Commercial Power System Asia (I CPS Asia). :1474—1479.

The transmission line tower is affected by the surface subsidence in the mined out area of coal mine, which will appear the phenomenon of subsidence, inclination and even tower collapse, threatening the operation safety of the transmission line tower in the mined out area. Therefore, a Safety and Damage Assessment Method of Transmission Line Tower in Goaf Based on Artificial Intelligence is proposed. Firstly, the geometric model of the coal seam in the goaf and the structural reliability model of the transmission line tower are constructed to evaluate the safety. Then, the random forest algorithm in artificial intelligence is used to evaluate the damage of the tower, so as to take protective measures in time. Finally, a finite element simulation model of tower foundation interaction is built, and its safety (force) and damage identification are experimentally analyzed. The results show that the proposed method can ensure high accuracy of damage assessment and reliable judgment of transmission line tower safety within the allowable error.

Başkaya, D., Samet, R.. 2020. DDoS Attacks Detection by Using Machine Learning Methods on Online Systems. 2020 5th International Conference on Computer Science and Engineering (UBMK). :52—57.

DDoS attacks impose serious threats to many large or small organizations; therefore DDoS attacks have to be detected as soon as possible. In this study, a methodology to detect DDoS attacks is proposed and implemented on online systems. In the scope of the proposed methodology, Multi Layer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), C-Support Vector Machine (SVC) machine learning methods are used with scaling and feature reduction preprocessing methods and then effects of preprocesses on detection accuracy rates of HTTP (Hypertext Transfer Protocol) flood, TCP SYN (Transport Control Protocol Synchronize) flood, UDP (User Datagram Protocol) flood and ICMP (Internet Control Message Protocol) flood DDoS attacks are analyzed. Obtained results showed that DDoS attacks can be detected with high accuracy of 99.2%.

Nandi, S., Phadikar, S., Majumder, K.. 2020. Detection of DDoS Attack and Classification Using a Hybrid Approach. 2020 Third ISEA Conference on Security and Privacy (ISEA-ISAP). :41—47.

In the area of cloud security, detection of DDoS attack is a challenging task such that legitimate users use the cloud resources properly. So in this paper, detection and classification of the attacking packets and normal packets are done by using various machine learning classifiers. We have selected the most relevant features from NSL KDD dataset using five (Information gain, gain ratio, chi-squared, ReliefF, and symmetrical uncertainty) commonly used feature selection methods. Now from the entire selected feature set, the most important features are selected by applying our hybrid feature selection method. Since all the anomalous instances of the dataset do not belong to DDoS category so we have separated only the DDoS packets from the dataset using the selected features. Finally, the dataset has been prepared and named as KDD DDoS dataset by considering the selected DDoS packets and normal packets. This KDD DDoS dataset has been discretized using discretize tool in weka for getting better performance. Finally, this discretize dataset has been applied on some commonly used (Naive Bayes, Bayes Net, Decision Table, J48 and Random Forest) classifiers for determining the detection rate of the classifiers. 10 fold cross validation has been used here for measuring the robustness of the system. To measure the efficiency of our hybrid feature selection method, we have also applied the same set of classifiers on the NSL KDD dataset, where it gives the best anomaly detection rate of 99.72% and average detection rate 98.47% similarly, we have applied the same set of classifiers on NSL DDoS dataset and obtain the average DDoS detection of 99.01% and the best DDoS detection rate of 99.86%. In order to compare the performance of our proposed hybrid method, we have also applied the existing feature selection methods and measured the detection rate using the same set of classifiers. Finally, we have seen that our hybrid approach for detecting the DDoS attack gives the best detection rate compared to some existing methods.

2021-02-10

Banerjee, R., Baksi, A., Singh, N., Bishnu, S. K.. 2020. Detection of XSS in web applications using Machine Learning Classifiers. 2020 4th International Conference on Electronics, Materials Engineering Nano-Technology (IEMENTech). :1—5.

Considering the amount of time we spend on the internet, web pages have evolved over a period of time with rapid progression and momentum. With such advancement, we find ourselves fronting a few hostile ideologies, breaching the security levels of webpages as such. The most hazardous of them all is XSS, known as Cross-Site Scripting, is one of the attacks which frequently occur in website-based applications. Cross-Site Scripting (XSS) attacks happen when malicious data enters a web application through an untrusted source. The spam attacks happen in the form of Wall posts, News feed, Message spam and mostly when a user is open to download content of webpages. This paper investigates the use of machine learning to build classifiers to allow the detection of XSS. Establishing our approach, we target the detection modus operandi of XSS attack via two features: URLs and JavaScript. To predict the level of XSS threat, we will be using four machine learning algorithms (SVM, KNN, Random forest and Logistic Regression). Proposing these classified algorithms, webpages will be branded as malicious or benign. After assessing and calculating the dataset features, we concluded that the Random Forest Classifier performed most accurately with the lowest False Positive Rate of 0.34. This precision will ensure a method much efficient to evaluate threatening XSS for the smooth functioning of the system.

2021-01-22

Mani, G., Pasumarti, V., Bhargava, B., Vora, F. T., MacDonald, J., King, J., Kobes, J.. 2020. DeCrypto Pro: Deep Learning Based Cryptomining Malware Detection Using Performance Counters. 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). :109—118.

Autonomy in cybersystems depends on their ability to be self-aware by understanding the intent of services and applications that are running on those systems. In case of mission-critical cybersystems that are deployed in dynamic and unpredictable environments, the newly integrated unknown applications or services can either be benign and essential for the mission or they can be cyberattacks. In some cases, these cyberattacks are evasive Advanced Persistent Threats (APTs) where the attackers remain undetected for reconnaissance in order to ascertain system features for an attack e.g. Trojan Laziok. In other cases, the attackers can use the system only for computing e.g. cryptomining malware. APTs such as cryptomining malware neither disrupt normal system functionalities nor trigger any warning signs because they simply perform bitwise and cryptographic operations as any other benign compression or encoding application. Thus, it is difficult for defense mechanisms such as antivirus applications to detect these attacks. In this paper, we propose an Operating Context profiling system based on deep neural networks-Long Short-Term Memory (LSTM) networks-using Windows Performance Counters data for detecting these evasive cryptomining applications. In addition, we propose Deep Cryptomining Profiler (DeCrypto Pro), a detection system with a novel model selection framework containing a utility function that can select a classification model for behavior profiling from both the light-weight machine learning models (Random Forest and k-Nearest Neighbors) and a deep learning model (LSTM), depending on available computing resources. Given data from performance counters, we show that individual models perform with high accuracy and can be trained with limited training data. We also show that the DeCrypto Profiler framework reduces the use of computational resources and accurately detects cryptomining applications by selecting an appropriate model, given the constraints such as data sample size and system configuration.

2020-12-14

Arjoune, Y., Salahdine, F., Islam, M. S., Ghribi, E., Kaabouch, N.. 2020. A Novel Jamming Attacks Detection Approach Based on Machine Learning for Wireless Communication. 2020 International Conference on Information Networking (ICOIN). :459–464.

Jamming attacks target a wireless network creating an unwanted denial of service. 5G is vulnerable to these attacks despite its resilience prompted by the use of millimeter wave bands. Over the last decade, several types of jamming detection techniques have been proposed, including fuzzy logic, game theory, channel surfing, and time series. Most of these techniques are inefficient in detecting smart jammers. Thus, there is a great need for efficient and fast jamming detection techniques with high accuracy. In this paper, we compare the efficiency of several machine learning models in detecting jamming signals. We investigated the types of signal features that identify jamming signals, and generated a large dataset using these parameters. Using this dataset, the machine learning algorithms were trained, evaluated, and tested. These algorithms are random forest, support vector machine, and neural network. The performance of these algorithms was evaluated and compared using the probability of detection, probability of false alarm, probability of miss detection, and accuracy. The simulation results show that jamming detection based random forest algorithm can detect jammers with a high accuracy, high detection probability and low probability of false alarm.

2020-12-11

Slawinski, M., Wortman, A.. 2019. Applications of Graph Integration to Function Comparison and Malware Classification. 2019 4th International Conference on System Reliability and Safety (ICSRS). :16—24.

We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.

2020-12-01

Harris, L., Grzes, M.. 2019. Comparing Explanations between Random Forests and Artificial Neural Networks. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :2978—2985.

The decisions made by machines are increasingly comparable in predictive performance to those made by humans, but these decision making processes are often concealed as black boxes. Additional techniques are required to extract understanding, and one such category are explanation methods. This research compares the explanations of two popular forms of artificial intelligence; neural networks and random forests. Researchers in either field often have divided opinions on transparency, and comparing explanations may discover similar ground truths between models. Similarity can help to encourage trust in predictive accuracy alongside transparent structure and unite the respective research fields. This research explores a variety of simulated and real-world datasets that ensure fair applicability to both learning algorithms. A new heuristic explanation method that extends an existing technique is introduced, and our results show that this is somewhat similar to the other methods examined whilst also offering an alternative perspective towards least-important features.

2020-08-28

Huang, Angus F.M., Chi-Wei, Yang, Tai, Hsiao-Chi, Chuan, Yang, Huang, Jay J.C., Liao, Yu-Han. 2019. Suspicious Network Event Recognition Using Modified Stacking Ensemble Machine Learning. 2019 IEEE International Conference on Big Data (Big Data). :5873—5880.

This study aims to detect genuine suspicious events and false alarms within a dataset of network traffic alerts. The rapid development of cloud computing and artificial intelligence-oriented automatic services have enabled a large amount of data and information to be transmitted among network nodes. However, the amount of cyber-threats, cyberattacks, and network intrusions have increased in various domains of network environments. Based on the fields of data science and machine learning, this paper proposes a series of solutions involving data preprocessing, exploratory data analysis, new features creation, features selection, ensemble learning, models construction, and verification to identify suspicious network events. This paper proposes a modified form of stacking ensemble machine learning which includes AdaBoost, Neural Networks, Random Forest, LightGBM, and Extremely Randomised Trees (Extra Trees) to realise a high-performance classification. A suspicious network event recognition dataset for a security operations centre, which uses real network log observations from the 2019 IEEE BigData Cup Challenge, is used as an experimental dataset. This paper investigates the possibility of integrating big-data analytics, machine learning, and data science to improve intelligent cybersecurity.

2020-07-09

Nisha, D, Sivaraman, E, Honnavalli, Prasad B. 2019. Predicting and Preventing Malware in Machine Learning Model. 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1—7.

Machine learning is a major area in artificial intelligence, which enables computer to learn itself explicitly without programming. As machine learning is widely used in making decision automatically, attackers have strong intention to manipulate the prediction generated my machine learning model. In this paper we study about the different types of attacks and its countermeasures on machine learning model. By research we found that there are many security threats in various algorithms such as K-nearest-neighbors (KNN) classifier, random forest, AdaBoost, support vector machine (SVM), decision tree, we revisit existing security threads and check what are the possible countermeasures during the training and prediction phase of machine learning model. In machine learning model there are 2 types of attacks that is causative attack which occurs during the training phase and exploratory attack which occurs during the prediction phase, we will also discuss about the countermeasures on machine learning model, the countermeasures are data sanitization, algorithm robustness enhancement, and privacy preserving techniques.

2020-05-22

Yan, Donghui, Wang, Yingjie, Wang, Jin, Wang, Honggang, Li, Zhenpeng. 2018. K-nearest Neighbor Search by Random Projection Forests. 2018 IEEE International Conference on Big Data (Big Data). :4775—4781.

K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests, rpForests, for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees with each constructed recursively through a series of carefully chosen random projections. rpForests achieves a remarkable accuracy in terms of fast decay in the missing rate of kNNs and that of discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easily run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable.