Biblio

Found 1057 results

Filters: Keyword is machine learning

2022-12-01

Fujita, Koji, Shibahara, Toshiki, Chiba, Daiki, Akiyama, Mitsuaki, Uchida, Masato. 2022. Objection!: Identifying Misclassified Malicious Activities with XAI. ICC 2022 - IEEE International Conference on Communications. :2065–2070.

Many studies have been conducted to detect various malicious activities in cyberspace using classifiers built by machine learning. However, it is natural for any classifier to make mistakes, and hence, human verification is necessary. One method to address this issue is eXplainable AI (XAI), which provides a reason for the classification result. However, when the number of classification results to be verified is large, it is not realistic to check the output of the XAI for all cases. In addition, it is sometimes difficult to interpret the output of XAI. In this study, we propose a machine learning model called classification verifier that verifies the classification results by using the output of XAI as a feature and raises objections when there is doubt about the reliability of the classification results. The results of experiments on malicious website detection and malware detection show that the proposed classification verifier can efficiently identify misclassified malicious activities.

Kamhoua, Georges, Bandara, Eranga, Foytik, Peter, Aggarwal, Priyanka, Shetty, Sachin. 2021. Resilient and Verifiable Federated Learning against Byzantine Colluding Attacks. 2021 Third IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). :31–40.

Federated Learning (FL) is a multiparty learning computing approach that can aid privacy-preserving machine learning. However, FL has several potential security and privacy threats. First, existing FL requires a central coordinator for the learning process, which introduces a single point of failure and trust issues for the shared trained model. Second, during the learning process, intentionally unreliable model updates performed by Byzantine colluding parties can lower the quality and convergence of the shared ML models. Therefore, discovering verifiable local model updates (i.e., integrity or correctness) and trusted parties in FL becomes crucial. In this paper, we propose a resilient and verifiable FL algorithm based on a reputation scheme to cope with unreliable parties. We develop a selection algorithm for the task publisher and a blockchain-based multiparty learning architecture in which local model updates are securely exchanged and verified without a central party. We also propose a novel auditing scheme to ensure that our approach is resilient to up to 50% Byzantine colluding attacks in a malicious scenario.

Thapaliya, Bipana, Mursi, Khalid T., Zhuang, Yu. 2021. Machine Learning-based Vulnerability Study of Interpose PUFs as Security Primitives for IoT Networks. 2021 IEEE International Conference on Networking, Architecture and Storage (NAS). :1–7.

Security is of importance for communication networks, and many network nodes, like sensors and IoT devices, are resource-constrained. Physical Unclonable Functions (PUFs) leverage physical variations of the integrated circuits to produce responses unique to individual circuits and have the potential for delivering security for low-cost networks. But before a PUF can be adopted for security applications, all security vulnerabilities must be discovered. Recently, a new PUF known as Interpose PUF (IPUF) was proposed, which was tested to be secure against reliability-based modeling attacks and machine learning attacks when the attacked IPUF is of small size. A recent study showed IPUFs succumbed to a divide-and-conquer attack, and the attack method requires the position of the interpose bit known to the attacker, a condition that can be easily obfuscated by using a random interpose position. Thus, large IPUFs may still remain secure against all known modeling attacks if the interpose position is unknown to attackers. In this paper, we present a new modeling attack method of IPUFs using multilayer neural networks, and the attack method requires no knowledge of the interpose position. Our attack was tested on simulated IPUFs and silicon IPUFs implemented on FPGAs, and the results showed that many IPUFs which were resilient against existing attacks cannot withstand our new attack method, revealing a new vulnerability of IPUFs by re-defining the boundary between secure and insecure regions in the IPUF parameter space.

Williams, Phillip, Idriss, Haytham, Bayoumi, Magdy. 2021. Mc-PUF: Memory-based and Machine Learning Resilient Strong PUF for Device Authentication in Internet of Things. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :61–65.

Physically Unclonable Functions (PUFs) are hardware-based security primitives that utilize manufacturing process variations to realize binary keys (Weak PUFs) or binary functions (Strong PUFs). This primitive is desirable for key generation and authentication in constrained devices, due to its low power and low area overhead. However, in recent years many research papers have focused on the vulnerability of PUFs to modeling attacks. This attack is possible because a PUF's challenge and response exchanges are usually transmitted over a communication channel without encryption. Thus, an attacker can collect challenge-response pairs and use them as input to a learning algorithm, to create a model that can predict responses given new challenges. In this paper we introduce novel serial and parallel 64-bit memory-based controlled PUF (Mc-PUF) architectures for device authentication that have high uniqueness and randomness, and are reliable and resilient against modeling attacks. These architectures generate a response by utilizing bits extracted from the fingerprint of a static random-access memory (SRAM) PUF with a control logic. The synthesis of the serial architecture yielded an area of 1.136K GE, while the parallel architecture was 3.013K GE. The best prediction accuracy obtained from the modeling attack was 50%, which prevents an attacker from accurately predicting responses to future challenges. We also showcase the scalability of the design through XOR-ing several Mc-PUFs, further improving upon its security and performance. The remainder of the paper presents the proposed architectures, along with their hardware implementations, area and power consumption, and security resilience against modeling attacks. The 3-XOR Mc-PUF had the greatest overhead, but it produced the best randomness, uniqueness, and resilience against modeling attacks.

Bardia, Vivek, Kumar, C.R.S. 2017. Process trees & service chains can serve us to mitigate zero day attacks better. 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI). :280–284.

With technology at our fingertips waiting to be exploited, the past decade saw a revolution in Human Computer Interactions. The ease with which a user could interact was the Unique Selling Proposition (USP) of a sales team. Human Computer Interactions have many underlying parameters to deal with, Data Visualization and Presentation among them. With the race on for better and faster presentations, many frameworks evolved and came to be widely used by software developers. As the need grew for user friendly applications, more and more software professionals were lured into the front-end sophistication domain. Application frameworks have evolved to such an extent that with just a few clicks and feeding values as per requirements we are able to produce a commercially usable application in a few minutes. These frameworks generate quantum lines of code in minutes, which leaves a contrail of bugs to be discovered in the future. We have also succumbed to the benchmarking in Software Quality Metrics and have made ourselves comfortable with buggy software to be rectified in future. The exponential evolution in the cyber domain has also attracted attackers equally. Average human awareness and knowledge has also improved in the cyber domain due to the prolonged exposure to technology for over three decades. As the attack sophistication grows and zero day attacks become more popular than ever, the suffering end users only receive remedial measures in spite of the latest Antivirus, Intrusion Detection and Protection Systems installed. We designed software to display the complete services and applications running in the user's Operating System in the most easily perceivable manner, aided by Computer Graphics and Data Visualization techniques. We further designed a study by empowering the fence sitter users with tools to actively participate in protecting themselves from threats. The designed threats had impressions from the complete threat canvas in some form or other, restricted to systems functioning. Network threats and any sort of packet transfer to and from the system in the form of a threat were kept out of the scope of this experiment. We discovered that end users had a good idea of their working environment, which can be used to exponentially enhance machine learning for zero day threats and to segment the vast unmarked threat landscape faster for a more reliable output.

Bardia, Vivek, Kumar, CRS. 2017. End Users Can Mitigate Zero Day Attacks Faster. 2017 IEEE 7th International Advance Computing Conference (IACC). :935–938.

The past decade has shown us the power of cyber space and our growing dependence on it. The exponential evolution in the domain has attracted attackers and defenders of technology equally. This inevitable domain has led to an increase in average human awareness and knowledge too. As we see the attack sophistication grow, the protectors have always been a step ahead mitigating the attacks. A study of the various Threat Detection, Protection and Mitigation Systems revealed a common pattern: either users have been totally ignored, or the systems rely heavily on user inputs for their correct functioning. Building on this, we designed a study wherein user inputs were taken in addition to independent Detection and Prevention systems to identify and mitigate the risks. This approach led us to the conclusion that involvement of users exponentially enhances machine learning and segments the data sets faster for a more reliable output.

2022-11-18

Khoshavi, Navid, Sargolzaei, Saman, Bi, Yu, Roohi, Arman. 2021. Entropy-Based Modeling for Estimating Adversarial Bit-flip Attack Impact on Binarized Neural Network. 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC). :493–498.

Over the past years, the high demand to efficiently process deep learning (DL) models has driven the market of the chip design companies. However, the new Deep Chip architectures, a common term for DL hardware accelerators, have paid little attention to the security requirements of quantized neural networks (QNNs), while black/white-box adversarial attacks can jeopardize the integrity of the inference accelerator. Therefore, this paper presents a comprehensive study of the resiliency of QNN topologies to black-box attacks. Different attack scenarios are performed on an FPGA-processor co-design, and the collected results are extensively analyzed to estimate the degree of impact of different types of attacks on the QNN topology. To be specific, we evaluated the sensitivity of the QNN accelerator to a range of bit-flip attacks (BFAs) that might occur in the operational lifetime of the device. The BFAs are injected at uniformly distributed times either across the entire QNN or per individual layer during the image classification. The acquired results are utilized to build an entropy-based model that can be leveraged to construct QNN architectures resilient to bit-flip attacks.

Paudel, Bijay Raj, Itani, Aashish, Tragoudas, Spyros. 2021. Resiliency of SNN on Black-Box Adversarial Attacks. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :799–806.

Existing works indicate that Spiking Neural Networks (SNNs) are resilient to adversarial attacks by testing against few attack models. This paper studies adversarial attacks on SNNs using additional attack models and shows that SNNs are not inherently robust against many few-pixel L0 black-box attacks. Additionally, a method to defend against such attacks in SNNs is presented. The SNNs and the effects of adversarial attacks are tested on both software simulators as well as on SpiNNaker neuromorphic hardware.

2022-11-08

Mode, Gautam Raj, Calyam, Prasad, Hoque, Khaza Anuarul. 2020. Impact of False Data Injection Attacks on Deep Learning Enabled Predictive Analytics. NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium. :1–7.

Industry 4.0 is the latest industrial revolution, primarily merging automation with advanced manufacturing to reduce direct human effort and resources. Predictive maintenance (PdM) is an Industry 4.0 solution, which facilitates predicting faults in a component or a system powered by state-of-the-art machine learning (ML) algorithms (especially deep learning algorithms) and Internet-of-Things (IoT) sensors. However, both IoT sensors and deep learning (DL) algorithms are known for their vulnerabilities to cyber-attacks. In the context of PdM systems, such attacks can have catastrophic consequences as they are hard to detect due to the nature of the attack. To date, the majority of the published literature focuses on the accuracy of DL enabled PdM systems and often ignores the effect of such attacks. In this paper, we demonstrate the effect of IoT sensor attacks (in the form of false data injection attacks) on a PdM system. At first, we use three state-of-the-art DL algorithms, specifically, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN), for predicting the Remaining Useful Life (RUL) of a turbofan engine using NASA's C-MAPSS dataset. The obtained results show that the GRU-based PdM model outperforms some of the recent literature on RUL prediction using the C-MAPSS dataset. Afterward, we model and apply two different types of false data injection attacks (FDIA), specifically, continuous and interim FDIAs, on turbofan engine sensor data and evaluate their impact on CNN, LSTM, and GRU-based PdM systems. The obtained results demonstrate that FDI attacks on even a few IoT sensors can strongly affect the RUL prediction in all cases. However, the GRU-based PdM model performs better in terms of accuracy and resiliency to FDIA. Lastly, we perform a study on the GRU-based PdM model using four different GRU networks with different sequence lengths. Our experiments reveal an interesting relationship between the accuracy, resiliency and sequence length for the GRU-based PdM models.
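The abstract above names continuous and interim false data injection but does not define them. As a minimal sketch, assuming a continuous attack biases every reading from a start index onward and an interim attack biases only a bounded window, the two could be simulated as follows. The function name `inject_bias`, the bias value, and the sample readings are all hypothetical and are not taken from the paper.

```python
def inject_bias(readings, bias, start, end=None):
    """Return a copy of `readings` with `bias` added on indices [start, end).

    end=None means the bias persists to the end of the series
    (our assumed reading of a "continuous" attack).
    """
    stop = len(readings) if end is None else end
    return [r + bias if start <= i < stop else r for i, r in enumerate(readings)]


# Hypothetical clean sensor trace (constant for readability).
clean = [10.0] * 6

# Assumed "continuous" FDIA: bias from index 2 until the end.
continuous = inject_bias(clean, bias=2.0, start=2)

# Assumed "interim" FDIA: bias only on indices 2 and 3.
interim = inject_bias(clean, bias=2.0, start=2, end=4)
```

Feeding both traces to the same RUL predictor would then let one compare how far each attack shifts the prediction from the clean baseline.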

Wshah, Safwan, Shadid, Reem, Wu, Yuhao, Matar, Mustafa, Xu, Beilei, Wu, Wencheng, Lin, Lei, Elmoudi, Ramadan. 2020. Deep Learning for Model Parameter Calibration in Power Systems. 2020 IEEE International Conference on Power Systems Technology (POWERCON). :1–6.

In power systems, having accurate device models is crucial for grid reliability, availability, and resiliency. Existing model calibration methods based on mathematical approaches often lead to multiple solutions due to the ill-posed nature of the problem, which would require further interventions from the field engineers in order to select the optimal solution. In this paper, we present a novel deep-learning-based approach for model parameter calibration in power systems. Our study focused on the generator model as an example. We studied several deep-learning-based approaches including 1-D Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU), which were trained to estimate model parameters using simulated Phasor Measurement Unit (PMU) data. Quantitative evaluations showed that our proposed methods can achieve high accuracy in estimating the model parameters, i.e., achieved a 0.0079 MSE on the testing dataset. We consider these promising results to be the basis for further exploration and development of advanced tools for model validation and calibration.

2022-11-02

Liu, I-Hsien, Hsieh, Cheng-En, Lin, Wei-Min, Li, Chu-Fen, Li, Jung-Shian. 2021. Malicious Flows Generator Based on Data Balanced Algorithm. 2021 International Conference on Fuzzy Theory and Its Applications (iFUZZY). :1–4.

As Internet technology gradually matures, the network structure becomes more complex. Therefore, the attack methods of malicious attackers are more diverse and change faster. Fortunately, due to the substantial increase in computing power, machine learning is valued and widely used in various fields. It has also been applied to intrusion detection systems. This study found that, due to the skewed class ratio of an unbalanced flow dataset, the model overfits and the misjudgment rate increases. In response to this problem, this research proposes to use the Cuckoo system to induce malicious samples to generate malicious traffic, to correct the class imbalance of the unbalanced traffic dataset.

Myakotin, Dmitriy, Varkentin, Vitalii. 2021. Classification of Network Traffic Using Generative Adversarial Networks. 2021 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS). :519–525.

Currently, the increasing complexity of DDoS attacks makes it difficult for modern security systems to track them. Machine learning techniques are increasingly being used in such systems as they are well established. However, a new problem arose: the creation of informative datasets. Generative adversarial networks can help create large, high-quality datasets for machine learning training. The article discusses the issue of using generative adversarial networks to generate new patterns of network attacks for the purpose of their further use in training.

2022-10-20

Thorpe, Adam J., Oishi, Meeko M. K. 2021. Stochastic Optimal Control via Hilbert Space Embeddings of Distributions. 2021 60th IEEE Conference on Decision and Control (CDC). :904–911.

Kernel embeddings of distributions have recently gained significant attention in the machine learning community as a data-driven technique for representing probability distributions. Broadly, these techniques enable efficient computation of expectations by representing integral operators as elements in a reproducing kernel Hilbert space. We apply these techniques to the area of stochastic optimal control theory and present a method to compute approximately optimal policies for stochastic systems with arbitrary disturbances. Our approach reduces the optimization problem to a linear program, which can easily be solved via the Lagrangian dual, without resorting to gradient-based optimization algorithms. We focus on discrete-time dynamic programming, and demonstrate our proposed approach on a linear regulation problem, and on a nonlinear target tracking problem. This approach is broadly applicable to a wide variety of optimal control problems, and provides a means of working with stochastic systems in a data-driven setting.

Nassar, Reem, Elhajj, Imad, Kayssi, Ayman, Salam, Samer. 2021. Identifying NAT Devices to Detect Shadow IT: A Machine Learning Approach. 2021 IEEE/ACS 18th International Conference on Computer Systems and Applications (AICCSA). :1–7.

Network Address Translation (NAT) is an address remapping technique placed at the borders of stub domains. It is present in almost all routers and CPEs. Most NAT devices implement Port Address Translation (PAT), which allows the mapping of multiple private IP addresses to one public IP address. Based on port number information, PAT matches the incoming traffic to the corresponding "hidden" client. In an enterprise context, and with the proliferation of unauthorized wired and wireless NAT routers, NAT can be used for re-distributing an Intranet or Internet connection or for deploying hidden devices that are not visible to the enterprise IT or under its oversight, thus causing a problem known as shadow IT. Thus, it is important to detect NAT devices in an intranet to prevent this particular problem. Previous methods in identifying NAT behavior were based on features extracted from traffic traces per flow. In this paper, we propose a method to identify NAT devices using a machine learning approach from aggregated flow features. The approach uses multiple statistical features in addition to source and destination IPs and port numbers, extracted from passively collected traffic data. We also use aggregated features extracted within multiple window sizes and feed them to a machine learning classifier to study the effect of timing on NAT detection. Our approach works completely passively and achieves an accuracy of 96.9% when all features are utilized.
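To illustrate the port-based mapping the abstract above describes, here is a minimal sketch of a Port Address Translation table. It is a toy model, not the paper's detection method; the class name `PatTable`, the starting port 40000, and the addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatTable:
    """Toy Port Address Translation table (illustrative only)."""
    public_ip: str
    next_port: int = 40000  # hypothetical start of the translated port range
    outbound: dict = field(default_factory=dict)  # (private_ip, private_port) -> public_port
    inbound: dict = field(default_factory=dict)   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Map a private endpoint to (public_ip, public_port), reusing an existing entry."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        """Return the hidden client for incoming traffic, or None if no mapping exists."""
        return self.inbound.get(public_port)


nat = PatTable(public_ip="203.0.113.7")
a = nat.translate_out("192.168.1.10", 51000)
b = nat.translate_out("192.168.1.11", 51000)
```

Both private hosts appear externally under the single address 203.0.113.7 and are told apart only by the translated port, which is why traffic from behind a NAT device can look like one very busy host.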

Castanhel, Gabriel R., Heinrich, Tiago, Ceschin, Fabrício, Maziero, Carlos. 2021. Taking a Peek: An Evaluation of Anomaly Detection Using System calls for Containers. 2021 IEEE Symposium on Computers and Communications (ISCC). :1–6.

The growth in the use of virtualization in the last ten years has contributed to the improvement of this technology. The practice of implementing and managing this type of isolated environment raises doubts about the security of such systems. Considering the host's proximity to a container, approaches that use anomaly detection systems attempt to monitor and detect unexpected behavior. Our work aims to use system calls to identify threats within a container environment, using machine learning based strategies to distinguish between expected and unexpected behaviors (possible threats).

2022-10-13

Barlow, Luke, Bendiab, Gueltoum, Shiaeles, Stavros, Savage, Nick. 2020. A Novel Approach to Detect Phishing Attacks using Binary Visualisation and Machine Learning. 2020 IEEE World Congress on Services (SERVICES). :177–182.

Protecting sensitive data and preventing it from being used inappropriately has become a challenging task. Even a small mistake in securing data can be exploited by phishing attacks to release private information such as passwords or financial information to a malicious actor. Phishing has now proven so successful that it is the number one attack vector. Many approaches have been proposed to protect against this type of cyber-attack, from additional staff training and enriched spam filters to large collaborative databases of known threats such as PhishTank and OpenPhish. However, they mostly rely upon a user falling victim to an attack and manually adding this new threat to the shared pool, which presents a constant disadvantage in the fight back against phishing. In this paper, we propose a novel approach to protect against phishing attacks using binary visualisation and machine learning. Unlike previous work in this field, our approach uses an automated detection process and requires no further user interaction, which allows a faster and more accurate detection process. The experiment results show that our approach has a high detection rate.

Yerima, Suleiman Y., Alzaylaee, Mohammed K. 2020. High Accuracy Phishing Detection Based on Convolutional Neural Networks. 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS). :1–6.

The persistent growth in phishing and the rising volume of phishing websites have led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching a 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state of the art in deep learning based phishing website detection.

Li, Xue, Zhang, Dongmei, Wu, Bin. 2020. Detection method of phishing email based on persuasion principle. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 1:571–574.

“Phishing emails” are emails with illegal links that direct users to spoofed pages of real websites, or to pages where dangerous HTML code has been inserted into real HTML, so as to trick users into disclosing private information such as bank or credit card account numbers, email account numbers, and passwords. People are the most vulnerable part of security, and phishing emails exploit human weaknesses to attack. This article describes the application of the principle of persuasion in phishing emails and, based on existing methods, proposes a phishing email detection method based on the persuasion principle. The method checks whether the words corresponding to each persuasion feature appear in the mail. Features are selected using an information gain algorithm, and 25 features are finally selected for detection. In experimental verification, the accuracy rate reached 99.6%.
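The feature-selection step mentioned in the abstract above can be sketched as follows: a binary feature records whether any keyword tied to a persuasion cue appears in an email, and information gain measures how much that feature reduces label entropy. The keyword lists and the four sample emails are hypothetical and are not the 25 features used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain


def keyword_present(email_text, keywords):
    """1 if any keyword occurs as a whitespace-separated word in the email, else 0."""
    words = set(email_text.lower().split())
    return int(any(k in words for k in keywords))


# Hypothetical toy corpus: (text, label) with 1 = phishing, 0 = legitimate.
emails = [
    ("urgent verify your account now", 1),
    ("your account expires today act now", 1),
    ("meeting notes attached", 0),
    ("lunch on friday", 0),
]
labels = [label for _, label in emails]

urgency = [keyword_present(text, ["urgent", "now", "today"]) for text, _ in emails]
weekday = [keyword_present(text, ["friday"]) for text, _ in emails]

ig_urgency = information_gain(urgency, labels)
ig_weekday = information_gain(weekday, labels)
```

Ranking candidate features by this score and keeping the top k is one common way to arrive at a fixed-size feature set.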

Cernica, Ionuţ, Popescu, Nirvana. 2020. Computer Vision Based Framework For Detecting Phishing Webpages. 2020 19th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–4.

One of the most dangerous threats on the internet nowadays is phishing attacks. This type of attack can lead to data breaches, and with them to reputational and financial loss for a company. The most common technique for carrying out this type of attack is sending emails to the target users to trick them into sending their credentials to the attacker's servers. If the user clicks on the link from the email, then good detection is needed to protect the user credentials. Many papers have presented Computer Vision as a good detection technique, but we will explain why this solution can generate many false positives in some important environments. This paper focuses on the challenges of the Computer Vision detection technique and proposes combining multiple techniques with Computer Vision in order to solve the challenges we have shown. We also present a methodology to detect phishing attacks that works with the proposed combination of techniques.

Basit, Abdul, Zafar, Maham, Javed, Abdul Rehman, Jalil, Zunera. 2020. A Novel Ensemble Machine Learning Method to Detect Phishing Attack. 2020 IEEE 23rd International Multitopic Conference (INMIC). :1–5.

Currently, and particularly with remote working scenarios during COVID-19, phishing attacks have become one of the most significant threats faced by internet users, organizations, and service providers. In a phishing attack, the attacker tries to steal sensitive client data (such as logins, passwords, and credit card details) using spoofed emails and fake websites. Cybercriminals, hacktivists, and nation-state spy agencies now have fertile ground to deploy their latest innovative phishing attacks. Timely detection of phishing attacks has become more crucial than ever. Machine learning algorithms can be used to accurately detect phishing attacks before a user is harmed. This paper presents a novel ensemble model to detect phishing attacks on websites. We select three machine learning classifiers: Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5) to use in an ensemble method with Random Forest Classifier (RFC). This ensemble method effectively detects website phishing attacks with better accuracy than existing studies. Experimental results demonstrate that the ensemble of KNN and RFC detects phishing attacks with 97.33% accuracy.

2022-10-12

Ding, Xiong, Liu, Baoxu, Jiang, Zhengwei, Wang, Qiuyun, Xin, Liling. 2021. Spear Phishing Emails Detection Based on Machine Learning. 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD). :354–359.

Spear phishing emails target a specific individual or organization; they are more elaborate, targeted, and harmful than ordinary phishing emails. The attackers usually harvest information about the recipient in any way available, then create a carefully camouflaged email and lure the recipient into performing dangerous actions. In this paper we present a new effective approach to detect spear phishing emails based on machine learning. First, we extracted 21 stylometric features from emails, 3 forwarding features from an Email Forwarding Relationship Graph Database (EFRGD), and 3 reputation features from two third-party threat intelligence platforms, VirusTotal (VT) and PhishTank (PT). Then we developed an improved version of the Synthetic Minority Oversampling Technique (SMOTE) algorithm, named KM-SMOTE, to reduce the impact of unbalanced data. Finally, we applied 4 machine learning algorithms to distinguish spear phishing emails from non-spear phishing emails. Our dataset consists of 417 spear phishing emails and 13916 non-spear phishing emails. We were able to achieve a maximum recall of 95.56%, precision of 98.85%, and F1-score of 97.16% with the help of forwarding features, reputation features, and the KM-SMOTE algorithm.

BOUIJIJ, Habiba, BERQIA, Amine. 2021. Machine Learning Algorithms Evaluation for Phishing URLs Classification. 2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT). :01–05.

Phishing URL is a type of cyberattack based on falsified URLs. The number of phishing URL attacks continues to increase despite cybersecurity efforts. According to the Anti-Phishing Working Group (APWG), the number of phishing websites observed in 2020 is 1 520 832, doubling over the course of a year. Various algorithms, techniques and methods can be used to build models for phishing URL detection and classification. From our reading, we observed that Machine Learning (ML) is one of the recent approaches used to detect and classify phishing URLs in an efficient and proactive way. In this paper, we evaluate eleven of the most adopted ML algorithms: Decision Tree (DT), Nearest Neighbours (KNN), Gradient Boosting (GB), Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), Support Vector Machines (SVM), Neural Network (NN), Extra_Tree (ET), Ada_Boost (AB) and Bagging (B). To do that, we compute the detection accuracy metric for each algorithm and we use lexical analysis to extract the URL features.

Faris, Humam, Yazid, Setiadi. 2021. Phishing Web Page Detection Methods: URL and HTML Features Detection. 2020 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS). :167–171.

Phishing is a type of fraud on the Internet in the form of fake web pages that mimic the original web pages to trick users into sending sensitive information to the phisher. The statistics presented by APWG and PhishTank show that the number of phishing websites tended to increase continuously from 2015 to 2020. To overcome this problem, several studies have been carried out, including detecting phishing web pages using various features of web pages with various methods. Unfortunately, several of these methods are not really effective, because their design and evaluation focus too narrowly on detection accuracy in research, and the evaluation does not represent application in the real world. A security detection device, however, should be effective, perform well, and be deployable. In this study the authors evaluated several methods and proposed rule-based applications that can detect phishing more efficiently.

Singh Sengar, Alok, Bhola, Abhishek, Shukla, Ratnesh Kumar, Gupta, Anurag. 2021. A Review on Phishing Websites Revealing through Machine Learning. 2021 10th International Conference on System Modeling & Advancement in Research Trends (SMART). :330–335.

Phishing is a frequent assault in which unsuspecting people’s unique, private, and sensitive information is stolen through fake websites. The primary objective of phishing websites’ consistent resource allocators is to steal unique, private, and sensitive information such as user login passwords and online financial transactions. Phishers construct phony websites that look and sound just like the genuine thing. With the advent of technology, phishing methods have increased significantly. This necessitates the development of anti-phishing technology to identify phishing and protect users. Machine learning is a useful technique for combating phishing attempts. These articles were utilized to examine machine learning detection strategies and characteristics.

Kumar, Yogendra, Subba, Basant. 2021. A lightweight machine learning based security framework for detecting phishing attacks. 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS). :184–188.

A successful phishing attack is a prelude to various other severe attacks such as theft of login credentials, unauthorized access to a user’s confidential data, and malware and ransomware infestation of the victim’s machine. This paper proposes a real time, lightweight, machine learning based security framework for detection of phishing attacks through analysis of Uniform Resource Locators (URLs). The proposed framework initially extracts a set of highly discriminating and uncorrelated features from the URL string corpus. These extracted features are then used to transform the URL strings into their corresponding numeric feature vectors, which are eventually used to train various machine learning based classifier models for identification of malicious phishing URLs. Performance analysis of the proposed security framework on two well known datasets, the Kaggle dataset (https://www.kaggle.com/antonyj453/ur1dataset) and the UNB dataset (https://www.unb.ca/cic/datasets/ur1-2016.htm1), shows that it is capable of detecting malicious phishing URLs with high precision, while at the same time maintaining a very low false positive rate. The proposed framework is also shown to outperform other similar security frameworks proposed in the literature.
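As a rough illustration of turning a URL string into a numeric feature vector, the sketch below computes a handful of lexical features with Python's standard `urllib.parse`. The seven features chosen here are assumptions for demonstration only; the abstract does not list the framework's actual feature set.

```python
from urllib.parse import urlparse


def url_features(url):
    """Map a URL string to a small numeric vector of lexical features (illustrative)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        len(url),                         # total URL length
        len(host),                        # hostname length
        host.count("."),                  # number of dots in the hostname
        url.count("-"),                   # number of hyphens
        sum(ch.isdigit() for ch in url),  # number of digit characters
        int("@" in url),                  # presence of '@'
        int(parsed.scheme == "https"),    # whether the scheme is HTTPS
    ]


vec = url_features("https://example.com/login")
```

Vectors of this kind can be stacked into a matrix and passed to any standard classifier; which features are actually discriminating would have to be established on labelled data.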