Biblio

Found 871 results

Filters: Keyword is feature extraction
2023-01-05
Khodaskar, Manish, Medhane, Darshan, Ingle, Rajesh, Buchade, Amar, Khodaskar, Anuja.  2022.  Feature-based Intrusion Detection System with Support Vector Machine. 2022 IEEE International Conference on Blockchain and Distributed Systems Security (ICBDS). :1—7.
Today billions of people are accessing the internet around the world. There is a need for new technology to provide security against malicious activities that can take preventive/defensive actions against constantly evolving attacks. A new generation of technology that keeps an eye on such activities and responds intelligently to them is the intrusion detection system employing machine learning. It is difficult for traditional techniques to analyze network-generated data due to the nature, amount, and speed with which the data is generated. The evolution of advanced cyber threats makes it difficult for existing IDS to perform up to the mark. In addition, managing large volumes of data is beyond the capabilities of computer hardware and software. This data is not only vast in scope, but it is also moving quickly. The system architecture suggested in this study uses SVM to train the model and feature selection based on the information gain ratio measure ranking approach to boost the overall system's efficiency and increase the attack detection rate. This work also addresses the issue of false alarms and tries to reduce them. In the proposed framework, the UNSW-NB15 dataset is used. For analysis, the UNSW-NB15 and NSL-KDD datasets are used. Along with SVM, we have also trained various models using Naive Bayes, ANN, RF, etc. We have compared the results of the various models. Also, we can extend these trained models to create an ensemble approach to improve the performance of IDS.
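The gain-ratio ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes discrete feature values and uses the textbook definition (information gain divided by the feature's own entropy); the function names and the toy records are hypothetical and are not taken from the paper or from UNSW-NB15.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's entropy."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected label entropy after splitting on the feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0

def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest gain ratio."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy, made-up records used only for illustration.
columns = {
    "proto": ["tcp", "tcp", "udp", "udp"],
    "service": ["http", "dns", "http", "dns"],
}
labels = ["attack", "attack", "normal", "normal"]
ranking = rank_features(columns, labels)
```

In this toy data `proto` separates the two classes perfectly while `service` carries no information about the label, so `proto` ranks first. A real pipeline would keep the top-ranked features and pass them to the classifier.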
Sravani, T., Suguna, M.Raja.  2022.  Comparative Analysis Of Crime Hotspot Detection And Prediction Using Convolutional Neural Network Over Support Vector Machine with Engineered Spatial Features Towards Increase in Classifier Accuracy. 2022 International Conference on Business Analytics for Technology and Security (ICBATS). :1—5.
The major aim of the study is to predict the type of crime that is going to happen based on the crime hotspot detected for the given crime data with engineered spatial features. The crime dataset is filtered to have the following 2 crime categories: crime against society and crime against person. Crime hotspots are detected by using the novel Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm with the number of clusters optimized using the silhouette score. The sample data consists of 501 crime incidents. Future types of crime for the given location are predicted by using the Support Vector Machine (SVM) and Convolutional Neural Network (CNN) algorithms (N=5). The accuracy of crime prediction using the Support Vector Machine classification algorithm is 94.01% and that of the Convolutional Neural Network algorithm is 79.98%, with a significance p-value of 0.033. The Support Vector Machine algorithm is significantly better in accuracy for prediction of the type of crime than the Convolutional Neural Network (CNN).
2022-12-23
Duby, Adam, Taylor, Teryl, Bloom, Gedare, Zhuang, Yanyan.  2022.  Detecting and Classifying Self-Deleting Windows Malware Using Prefetch Files. 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC). :0745–0751.
Malware detection and analysis can be a burdensome task for incident responders. As such, research has turned to machine learning to automate malware detection and malware family classification. Existing work extracts and engineers static and dynamic features from the malware sample to train classifiers. Despite promising results, such techniques assume that the analyst has access to the malware executable file. Self-deleting malware invalidates this assumption and requires analysts to find forensic evidence of malware execution for further analysis. In this paper, we present and evaluate an approach to detecting malware that executed on a Windows target and further classify the malware into its associated family to provide semantic insight. Specifically, we engineer features from the Windows prefetch file, a file system forensic artifact that archives process information. Results show that it is possible to detect the malicious artifact with 99% accuracy; furthermore, classifying the malware into a fine-grained family has comparable performance to techniques that require access to the original executable. We also provide a thorough security discussion of the proposed approach against adversarial diversity.
Huo, Da, Li, Xiaoyong, Li, Linghui, Gao, Yali, Li, Ximing, Yuan, Jie.  2022.  The Application of 1D-CNN in Microsoft Malware Detection. 2022 7th International Conference on Big Data Analytics (ICBDA). :181–187.
In the computer field, cybersecurity has always been the focus of attention. How to detect malware effectively is one of the focuses and difficulties in network security research. Existing malware detection schemes can be mainly divided into two categories: database matching and machine learning methods. With the rise of deep learning, more and more deep learning methods are applied in the field of malware detection. Deeper semantic features can be extracted via deep neural networks. The main tasks of this paper are as follows: (1) using machine learning methods and one-dimensional convolutional neural networks to detect malware; (2) proposing a method that combines machine learning and deep learning for detection. Machine learning with LGBM obtains an accuracy rate of 67.16%, and the one-dimensional CNN obtains an accuracy rate of 72.47%. In (2), LGBM is used to screen features by importance before a one-dimensional convolutional neural network is applied, which further improves the detection result to an accuracy rate of 78.64%.
2022-12-20
Liu, Xiaolei, Li, Xiaoyu, Zheng, Desheng, Bai, Jiayu, Peng, Yu, Zhang, Shibin.  2022.  Automatic Selection Attacks Framework for Hard Label Black-Box Models. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :1–7.

The current adversarial attacks against machine learning models can be divided into white-box attacks and black-box attacks. Further, black-box attacks can be subdivided into soft label and hard label black-box attacks, but the latter have the deficiency of only returning the class with the highest prediction probability, which leads to difficulty in gradient estimation. However, due to its wide application, it is of great research significance and application value to explore hard label black-box attacks. This paper proposes an Automatic Selection Attacks Framework (ASAF) for hard label black-box models, which can be explained in two aspects based on the existing attack methods. Firstly, ASAF applies model equivalence to select substitute models automatically so as to generate adversarial examples and then completes black-box attacks based on their transferability. Secondly, specified feature selection and a parallel attack method are proposed to shorten the attack time and improve the attack success rate. The experimental results show that ASAF can achieve more than 90% success rate of nontargeted attack on the common models of traditional dataset ResNet-101 (CIFAR10) and InceptionV4 (ImageNet). Meanwhile, compared with FGSM and other attack algorithms, the attack time is reduced by at least 89.7% and 87.8% respectively in two traditional datasets. Besides, it can achieve 90% success rate of attack on the online model, BaiduAI digital recognition. In conclusion, ASAF is the first automatic selection attacks framework for hard label black-box models, in which specified feature selection and parallel attack methods speed up automatic attacks.

2022-12-01
Fujita, Koji, Shibahara, Toshiki, Chiba, Daiki, Akiyama, Mitsuaki, Uchida, Masato.  2022.  Objection!: Identifying Misclassified Malicious Activities with XAI. ICC 2022 - IEEE International Conference on Communications. :2065—2070.
Many studies have been conducted to detect various malicious activities in cyberspace using classifiers built by machine learning. However, it is natural for any classifier to make mistakes, and hence, human verification is necessary. One method to address this issue is eXplainable AI (XAI), which provides a reason for the classification result. However, when the number of classification results to be verified is large, it is not realistic to check the output of the XAI for all cases. In addition, it is sometimes difficult to interpret the output of XAI. In this study, we propose a machine learning model called classification verifier that verifies the classification results by using the output of XAI as a feature and raises objections when there is doubt about the reliability of the classification results. The results of experiments on malicious website detection and malware detection show that the proposed classification verifier can efficiently identify misclassified malicious activities.
Oh, Mi-Kyung, Lee, Sangjae, Kang, Yousung.  2021.  Wi-SUN Device Authentication using Physical Layer Fingerprint. 2021 International Conference on Information and Communication Technology Convergence (ICTC). :160–162.
This paper aims to identify Wi-SUN devices using physical layer fingerprint. We first extract physical layer features based on the received Wi-SUN signals, especially focusing on device-specific clock skew and frequency deviation in FSK modulation. Then, these physical layer fingerprints are used to train a machine learning-based classifier and the resulting classifier finally identifies the authorized Wi-SUN devices. Preliminary experiments on Wi-SUN certified chips show that the authenticator with the proposed physical layer fingerprints can distinguish Wi-SUN devices with 100 % accuracy. Since no additional computational complexity for authentication is involved on the device side, our approach can be applied to any Wi-SUN based IoT devices with security requirements.
2022-11-08
Wei, Yijie, Cao, Qiankai, Gu, Jie, Otseidu, Kofi, Hargrove, Levi.  2020.  A Fully-integrated Gesture and Gait Processing SoC for Rehabilitation with ADC-less Mixed-signal Feature Extraction and Deep Neural Network for Classification and Online Training. 2020 IEEE Custom Integrated Circuits Conference (CICC). :1–4.
An ultra-low-power gesture and gait classification SoC is presented for rehabilitation applications featuring (1) mixed-signal feature extraction and an integrated low-noise amplifier eliminating expensive ADC and digital feature extraction, (2) an integrated distributed deep neural network (DNN) ASIC supporting a scalable multi-chip neural network for sensor fusion with distortion resiliency for low-cost front end modules, (3) on-chip learning of the DNN engine allowing in-situ training of user-specific operations. A 12-channel 65nm CMOS test chip was fabricated with 1μW power per channel, less than 3ms computation latency, on-chip training for user-specific DNN models and multi-chip networking capability.
2022-11-02
Liu, I-Hsien, Hsieh, Cheng-En, Lin, Wei-Min, Li, Chu-Fen, Li, Jung-Shian.  2021.  Malicious Flows Generator Based on Data Balanced Algorithm. 2021 International Conference on Fuzzy Theory and Its Applications (iFUZZY). :1–4.
As Internet technology gradually matures, the network structure becomes more complex. Therefore, the attack methods of malicious attackers are more diverse and change faster. Fortunately, due to the substantial increase in computer computing power, machine learning is valued and widely used in various fields. It has also been applied to intrusion detection systems. This study found that due to the imperfect data ratio of the unbalanced flow dataset, the model will overfit and the misjudgment rate will increase. In response to this problem, this research proposes to use the Cuckoo system to induce malicious samples to generate malicious traffic, to solve the data proportion defect of the unbalanced traffic dataset.
Song, Xiaozhuang, Zhang, Chenhan, Yu, James J.Q..  2021.  Learn Travel Time Distribution with Graph Deep Learning and Generative Adversarial Network. 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). :1385–1390.
How to obtain accurate travel time predictions is among the most critical problems in Intelligent Transportation Systems (ITS). Recent literature has shown the effectiveness of machine learning models on travel time forecasting problems. However, most of these models predict travel time in a point estimation manner, which is not suitable for real scenarios. Instead of a determined value, the travel time within a future time period is a distribution. Besides, they all use grid-structured data to obtain the spatial dependency, which does not reflect the traffic network's actual topology. Hence, we propose GCGTTE to estimate the travel time in a distribution form with Graph Deep Learning and Generative Adversarial Network (GAN). We convert the data into a graph structure and use a Graph Neural Network (GNN) to build its spatial dependency. Furthermore, GCGTTE adopts GAN to approximate the real travel time distribution. We test the effectiveness of GCGTTE against other models on a real-world dataset. Thanks to the fine-grained spatial dependency modeling, GCGTTE significantly outperforms the models built on grid-structured data. Besides, we also compared the distribution approximation performance with DeepGTT, a Variational Inference-based model which had the state-of-the-art performance on travel time estimation. The result shows that GCGTTE outperforms DeepGTT on metrics and the distribution generated by GCGTTE is much closer to the original distribution.
Zhao, Li, Jiao, Yan, Chen, Jie, Zhao, Ruixia.  2021.  Image Style Transfer Based on Generative Adversarial Network. 2021 International Conference on Computer Network, Electronic and Automation (ICCNEA). :191–195.
Image style transfer refers to the transformation of the style of an image, so that the image details are retained to the maximum extent while the style is transferred. Aiming at the problem of low clarity of style transfer images generated by the CycleGAN network, this paper improves the CycleGAN network. In this paper, the network model of an auto-encoder and a variational auto-encoder is added to the structure. The encoding part of the auto-encoder is used to extract image content features, and the variational auto-encoder is used to extract style features. At the same time, the generating network of the model in this paper first adjusts the image size and then performs the convolution operation, replacing the traditional deconvolution operation. The discriminating network uses a multi-scale discriminator to force the samples generated by the generating network to be more realistic and approximate the target image, so as to improve the effect of image style transfer.
2022-10-20
Varma, Dheeraj, Mishra, Shikhar, Meenpal, Ankita.  2020.  An Adaptive Image Steganographic Scheme Using Convolutional Neural Network and Dual-Tree Complex Wavelet Transform. 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1—7.
The technique of concealing confidential information in carrier information is known as steganography. When we use digital images as carriers, it is termed image steganography. The advancements in digital technology and the need for information security have given great significance to image steganographic methods in the area of secured communication. An efficient steganographic system is characterized by a good trade-off between its features such as imperceptibility and capacity. The proposed scheme implements an edge-detection based adaptive steganography with transform domain embedding, offering high imperceptibility and capacity. The scheme employs an adaptive embedding technique to select optimal data-hiding regions in the carrier image, using Canny edge detection and a Convolutional Neural Network (CNN). Then, the secret image is embedded in the Dual-Tree Complex Wavelet Transform (DTCWT) coefficients of the selected carrier image blocks, with the help of Singular Value Decomposition (SVD). The analysis of the scheme is performed using metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Normalized Cross Correlation (NCC).
Abdali, Natiq M., Hussain, Zahir M..  2020.  Reference-free Detection of LSB Steganography Using Histogram Analysis. 2020 30th International Telecommunication Networks and Applications Conference (ITNAC). :1—7.
Due to the difficulty of obtaining a database of original images that are required in the classification process to detect tampering, this paper presents a technique for detecting image tampering such as image steganography in the spatial domain. The system depends on deriving the auto-correlation function of the image histogram, then applying a high-pass filter with a threshold. This technique can be used to decide whether an image is a cover or a stego image, without requiring the original image. The results have revealed the validity of this system. Although this study has focused on least-significant-bit (LSB) steganography, we expect that it could be extended to other types of image tampering.
Chen, Wenhao, Lin, Li, Newman, Jennifer, Guan, Yong.  2021.  Automatic Detection of Android Steganography Apps via Symbolic Execution and Tree Matching. 2021 IEEE Conference on Communications and Network Security (CNS). :254—262.
The recent focus of cyber security on automated detection of malware for Android apps has omitted the study of some apps used for “legitimate” purposes, such as steganography apps. Mobile steganography apps can be used for delivering harmful messages, and while current research on steganalysis targets the detection of stego images using academic algorithms and well-built benchmarking image data sets, the community has overlooked uncovering a mobile app itself for its ability to perform steganographic embedding. Developing automatic tools for identifying the code in a suspect app as a stego app can be very challenging: steganography algorithms can be represented in a variety of ways, and there exist many image editing algorithms which appear similar to steganography algorithms. This paper proposes the first automated approach to detect Android steganography apps. We use symbolic execution to summarize an app’s image operation behavior into expression trees, and match the extracted expression trees with reference trees that represent the expected behavior of a steganography embedding process. We use a structural feature based similarity measure to calculate the similarity between expression trees. Our experiments show that the proposed approach can detect real world Android stego apps that implement common spatial domain and frequency domain embedding algorithms with a high degree of accuracy. Furthermore, our procedure describes a general framework that has the potential to be applied to other similar questions when studying program behaviors.
Liu, Xiyao, Fang, Yaokun, He, Feiyi, Li, Zhaoying, Zhang, Yayun, Zeng, Xiongfei.  2021.  High capacity coverless image steganography method based on geometrically robust and chaotic encrypted image moment feature. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). :1455—1460.
In recent years, coverless image steganography has attracted significant attention due to its distortion-free trait on carrier images, which avoids detection by steganalysis tools. Despite this advantage, current coverless methods face several challenges, e.g., vulnerability to geometrical attacks and low hidden capacity. In this paper, we propose a novel coverless steganography algorithm based on chaotic encrypted dual radial harmonic Fourier moments (DRHFM) to tackle the challenges. Specifically, we build mappings between the extracted DRHFM features and secret messages. These features are robust to various attacks, especially to geometrical attacks. We further deploy the DRHFM parameters to adjust the feature length, thus ensuring the high hidden capacity. Moreover, we introduce a chaos encryption algorithm to enhance the security of the mapping features. The experimental results demonstrate that our proposed scheme outperforms the state-of-the-art coverless steganography based on image mapping in terms of robustness and hidden capacity.
Nassar, Reem, Elhajj, Imad, Kayssi, Ayman, Salam, Samer.  2021.  Identifying NAT Devices to Detect Shadow IT: A Machine Learning Approach. 2021 IEEE/ACS 18th International Conference on Computer Systems and Applications (AICCSA). :1—7.
Network Address Translation (NAT) is an address remapping technique placed at the borders of stub domains. It is present in almost all routers and CPEs. Most NAT devices implement Port Address Translation (PAT), which allows the mapping of multiple private IP addresses to one public IP address. Based on port number information, PAT matches the incoming traffic to the corresponding "hidden" client. In an enterprise context, and with the proliferation of unauthorized wired and wireless NAT routers, NAT can be used for re-distributing an Intranet or Internet connection or for deploying hidden devices that are not visible to the enterprise IT or under its oversight, thus causing a problem known as shadow IT. Thus, it is important to detect NAT devices in an intranet to prevent this particular problem. Previous methods in identifying NAT behavior were based on features extracted from traffic traces per flow. In this paper, we propose a method to identify NAT devices using a machine learning approach from aggregated flow features. The approach uses multiple statistical features in addition to source and destination IPs and port numbers, extracted from passively collected traffic data. We also use aggregated features extracted within multiple window sizes and feed them to a machine learning classifier to study the effect of timing on NAT detection. Our approach works completely passively and achieves an accuracy of 96.9% when all features are utilized.
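The idea of aggregating flow features within time windows, as described in this abstract, can be sketched as follows. This is a minimal illustration only: the flow record keys (`ts`, `src_ip`, `dst_ip`, `src_port`, `bytes`), the chosen statistics, and the sample addresses are assumptions, not the feature set used in the paper.

```python
from collections import defaultdict
from statistics import mean

def aggregate_flows(flows, window_seconds):
    """Group flows by (source IP, time window) and compute simple statistics.

    Each flow is a dict with the hypothetical keys
    ts, src_ip, dst_ip, src_port and bytes.
    """
    buckets = defaultdict(list)
    for flow in flows:
        window = int(flow["ts"] // window_seconds)
        buckets[(flow["src_ip"], window)].append(flow)

    features = {}
    for key, group in buckets.items():
        features[key] = {
            "flow_count": len(group),
            "distinct_dst_ips": len({f["dst_ip"] for f in group}),
            "distinct_src_ports": len({f["src_port"] for f in group}),
            "mean_bytes": mean(f["bytes"] for f in group),
        }
    return features

# Made-up flows using documentation address ranges.
flows = [
    {"ts": 1.0, "src_ip": "10.0.0.1", "dst_ip": "192.0.2.10", "src_port": 40001, "bytes": 100},
    {"ts": 5.0, "src_ip": "10.0.0.1", "dst_ip": "198.51.100.7", "src_port": 40002, "bytes": 300},
    {"ts": 65.0, "src_ip": "10.0.0.1", "dst_ip": "198.51.100.7", "src_port": 40003, "bytes": 50},
]
features = aggregate_flows(flows, window_seconds=60)
```

Running the same aggregation with several values of `window_seconds` yields one feature table per window size, which could then be fed to a classifier to compare how timing affects detection.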
2022-10-16
Lee, Sungho, Lee, Hyogun, Ryu, Sukyoung.  2020.  Broadening Horizons of Multilingual Static Analysis: Semantic Summary Extraction from C Code for JNI Program Analysis. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :127–137.
Most programming languages support foreign language interoperation that allows developers to integrate multiple modules implemented in different languages into a single multilingual program. While utilizing various features from multiple languages expands expressivity, differences in language semantics require developers to understand the semantics of multiple languages and their interoperation. Because current compilers do not support compile-time checking for interoperation, they do not help developers avoid interoperation bugs. Similarly, active research on static analysis and bug detection has been focusing on programs written in a single language. In this paper, we propose a novel approach to analyze multilingual programs statically. Unlike existing approaches that extend a static analyzer for a host language to support analysis of foreign function calls, our approach extracts semantic summaries from programs written in guest languages using a modular analysis technique, and performs a whole-program analysis with the extracted semantic summaries. To show practicality of our approach, we design and implement a static analyzer for multilingual programs, which analyzes JNI interoperation between Java and C. Our empirical evaluation shows that the analyzer is scalable in that it can construct call graphs for large programs that use JNI interoperation, and useful in that it found 74 genuine interoperation bugs in real-world Android JNI applications.
2022-10-13
Yerima, Suleiman Y., Alzaylaee, Mohammed K..  2020.  High Accuracy Phishing Detection Based on Convolutional Neural Networks. 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS). :1—6.
The persistent growth in phishing and the rising volume of phishing websites has led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state-of-the art in deep learning based phishing website detection.
M, Yazhmozhi V., Janet, B., Reddy, Srinivasulu.  2020.  Anti-phishing System using LSTM and CNN. 2020 IEEE International Conference for Innovation in Technology (INOCON). :1—5.
Users prefer to do e-banking and e-shopping nowadays because of the exponential growth of the internet. Because of this paradigm shift, hackers are finding umpteen ways to steal our personal information and critical details like details of debit and credit cards, by disguising themselves as reputed websites, just by changing the spelling or making minor modifications to the URL. Identifying whether a URL is benign or malicious is a challenging job, because phishing makes use of the weakness of the user. While there are several works carried out to detect phishing websites, they only use heuristic methods and list-based techniques and therefore couldn't avoid phishing effectively. In this paper an anti-phishing system is proposed to protect the users. It uses an ensemble model that uses both LSTM and CNN with a massive, balanced data set containing nearly 2,00,000 URLs. After analyzing the accuracy of different existing approaches, it has been found that the ensemble model that uses both LSTM and CNN performed better, with an accuracy of 96% and a precision of 97%, which is far better than the existing solutions.
Li, Xue, Zhang, Dongmei, Wu, Bin.  2020.  Detection method of phishing email based on persuasion principle. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 1:571—574.
“Phishing emails” are emails with illegal links that direct users to spoofed pages of real websites, or to pages where dangerous HTML code has been inserted into real HTML, so as to steal users' private information such as bank or credit card account numbers, email account numbers, and passwords. People are the most vulnerable part of security. Phishing emails use human weaknesses to attack. This article describes the application of the principle of persuasion in phishing emails, and based on the existing methods, this paper proposes a phishing email detection method based on the persuasion principle. Each persuasion-principle feature is computed by checking whether the word corresponding to the feature appears in the mail. Features are selected using an information gain algorithm, and 25 features are finally selected for detection. In experimental verification, the accuracy rate reached 99.6%.
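The word-presence features described in this abstract can be sketched as below. The cue words and principle names are hypothetical placeholders chosen for illustration; the abstract does not list the 25 features the paper actually selects.

```python
import re

# Hypothetical cue words per persuasion principle; the paper's actual
# feature words are not given in the abstract.
CUE_WORDS = {
    "authority": {"bank", "administrator", "official"},
    "scarcity": {"urgent", "immediately", "expire"},
    "reciprocity": {"gift", "reward", "free"},
}

def persuasion_features(email_text):
    """Return 1 or 0 per principle depending on whether any cue word occurs."""
    tokens = set(re.findall(r"[a-z]+", email_text.lower()))
    return {name: int(bool(tokens & words)) for name, words in CUE_WORDS.items()}

example = persuasion_features("URGENT: your bank account will expire. Verify immediately.")
```

The resulting 0/1 vector is the kind of input that a feature selection step and a classifier could then consume.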
Basit, Abdul, Zafar, Maham, Javed, Abdul Rehman, Jalil, Zunera.  2020.  A Novel Ensemble Machine Learning Method to Detect Phishing Attack. 2020 IEEE 23rd International Multitopic Conference (INMIC). :1—5.
Currently, and particularly with remote working scenarios during COVID-19, the phishing attack has become one of the most significant threats faced by internet users, organizations, and service providers. In a phishing attack, the attacker tries to steal client sensitive data (such as logins, passwords, and credit card details) using spoofed emails and fake websites. Cybercriminals, hacktivists, and nation-state spy agencies have now got fertile ground to deploy their latest innovative phishing attacks. Timely detection of phishing attacks has become more crucial than ever. Machine learning algorithms can be used to accurately detect phishing attacks before a user is harmed. This paper presents a novel ensemble model to detect phishing attacks on websites. We select three machine learning classifiers: Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Decision Tree (C4.5) to use in an ensemble method with Random Forest Classifier (RFC). This ensemble method effectively detects website phishing attacks with better accuracy than existing studies. Experimental results demonstrate that the ensemble of KNN and RFC detects phishing attacks with 97.33% accuracy.
2022-10-12
Ding, Xiong, Liu, Baoxu, Jiang, Zhengwei, Wang, Qiuyun, Xin, Liling.  2021.  Spear Phishing Emails Detection Based on Machine Learning. 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD). :354—359.
Spear phishing emails target a specific individual or organization; they are more elaborate, targeted, and harmful than phishing emails. The attackers usually harvest information about the recipient in any available way, then create a carefully camouflaged email and lure the recipient to perform dangerous actions. In this paper we present a new effective approach to detect spear phishing emails based on machine learning. Firstly, we extracted 21 stylometric features from email, 3 forwarding features from the Email Forwarding Relationship Graph Database (EFRGD), and 3 reputation features from two third-party threat intelligence platforms, Virus Total (VT) and Phish Tank (PT). Then we made an improvement on the Synthetic Minority Oversampling Technique (SMOTE) algorithm, named KM-SMOTE, to reduce the impact of unbalanced data. Finally, we applied 4 machine learning algorithms to distinguish spear phishing emails from non-spear phishing emails. Our dataset consists of 417 spear phishing emails and 13916 non-spear phishing emails. We were able to achieve a maximum recall of 95.56%, precision of 98.85% and F1-score of 97.16% with the help of forwarding features, reputation features and the KM-SMOTE algorithm.
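For readers unfamiliar with oversampling, the following is a minimal sketch of plain SMOTE-style interpolation, not the KM-SMOTE variant the paper proposes (whose details are not given in the abstract). The function name, parameters, and sample points are assumptions for illustration.

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours.

    Assumes `minority` holds at least two distinct points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base` by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Made-up two-dimensional minority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=5)
```

Each synthetic point lies on a line segment between two existing minority samples, so with a dataset as skewed as the one reported here (417 versus 13916 emails) the minority class can be enlarged without duplicating emails verbatim.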
BOUIJIJ, Habiba, BERQIA, Amine.  2021.  Machine Learning Algorithms Evaluation for Phishing URLs Classification. 2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT). :01—05.
Phishing URL is a type of cyberattack based on falsified URLs. The number of phishing URL attacks continues to increase despite cybersecurity efforts. According to the Anti-Phishing Working Group (APWG), the number of phishing websites observed in 2020 is 1 520 832, doubling over the course of a year. Various algorithms, techniques and methods can be used to build models for phishing URL detection and classification. From our reading, we observed that Machine Learning (ML) is one of the recent approaches used to detect and classify phishing URLs in an efficient and proactive way. In this paper, we evaluate eleven of the most adopted ML algorithms: Decision Tree (DT), Nearest Neighbours (KNN), Gradient Boosting (GB), Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), Support Vector Machines (SVM), Neural Network (NN), Extra_Tree (ET), Ada_Boost (AB) and Bagging (B). To do that, we compute the detection accuracy metric for each algorithm and we use lexical analysis to extract the URL features.
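Lexical analysis of a URL, as mentioned in this abstract, can be illustrated with a short sketch built on the standard library's `urllib.parse.urlparse`. The specific features below are common illustrative choices and are assumptions; the abstract does not state which lexical features the paper extracts.

```python
from urllib.parse import urlparse

def lexical_features(url):
    """Compute a few simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "has_hyphen_in_host": "-" in host,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }

# A made-up URL on a reserved example domain.
features = lexical_features("http://secure-login.example.com/account/verify?id=123")
```

Because these features come from the URL string alone, they can be computed without fetching the page, which is one reason lexical features are attractive for proactive classification.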
Faris, Humam, Yazid, Setiadi.  2021.  Phishing Web Page Detection Methods: URL and HTML Features Detection. 2020 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS). :167—171.
Phishing is a type of fraud on the Internet in the form of fake web pages that mimic the original web pages to trick users into sending sensitive information to the phisher. The statistics presented by APWG and PhishTank show that the number of phishing websites tended to increase continuously from 2015 to 2020. To overcome this problem, several studies have been carried out, including detecting phishing web pages using various features of web pages with various methods. Unfortunately, several of these methods are not really effective because their design and evaluation focus too narrowly on achieving detection accuracy in research, while the evaluation does not represent application in the real world. A security detection device, however, should be effective, perform well, and be deployable. In this study the authors evaluated several methods and proposed rules-based applications that can detect phishing more efficiently.
Li, Chunzhi.  2021.  A Phishing Detection Method Based on Data Mining. 2021 3rd International Conference on Applied Machine Learning (ICAML). :202—205.
Data mining technology is a very important technology in the current era of data explosion. With the informationization of society and the transparency and openness of information, network security issues have become the focus of concern of people all over the world. This paper compares the accuracy of multiple machine learning methods and two deep learning frameworks when using lexical features to detect and classify malicious URLs. As a result, this paper shows that the Random Forest, which is an ensemble learning method for classification, is superior to the 8 other machine learning methods evaluated in this paper. Furthermore, the Random Forest is even superior to some popular deep neural network models produced by famous frameworks such as TensorFlow and PyTorch when using lexical features to detect and classify malicious URLs.