Biblio

Filters: Author is Liu, Baoxu
2022-10-12
Ding, Xiong, Liu, Baoxu, Jiang, Zhengwei, Wang, Qiuyun, Xin, Liling.  2021.  Spear Phishing Emails Detection Based on Machine Learning. 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD). :354–359.
Spear phishing emails target a specific individual or organization; they are more elaborate, targeted, and harmful than ordinary phishing emails. Attackers usually harvest information about the recipient by any available means, then craft a carefully camouflaged email that lures the recipient into performing dangerous actions. In this paper we present a new, effective approach to detecting spear phishing emails based on machine learning. First, we extracted 21 stylometric features from each email, 3 forwarding features from an Email Forwarding Relationship Graph Database (EFRGD), and 3 reputation features from two third-party threat intelligence platforms, VirusTotal (VT) and PhishTank (PT). Next, we proposed KM-SMOTE, an improvement on the Synthetic Minority Oversampling Technique (SMOTE) algorithm, to reduce the impact of imbalanced data. Finally, we applied four machine learning algorithms to distinguish spear phishing emails from non-spear phishing emails. Our dataset consists of 417 spear phishing emails and 13,916 non-spear phishing emails. With the help of the forwarding features, the reputation features, and the KM-SMOTE algorithm, we achieved a maximum recall of 95.56%, precision of 98.85%, and F1-score of 97.16%.
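To illustrate the oversampling step, here is a minimal sketch of plain SMOTE, the baseline that KM-SMOTE improves on. The abstract does not describe what KM-SMOTE changes, so this is standard SMOTE only, not the authors' variant. Each synthetic sample is a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name smote and its parameters (n_new, k, seed) are hypothetical, and feature vectors are assumed to be plain lists of floats.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=None):
    """Plain SMOTE (not KM-SMOTE): synthesize n_new minority samples.

    Each new point interpolates between a randomly chosen minority sample
    and one of its k nearest neighbours within the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples to `base`.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

With the dataset sizes above, fully balancing the classes would mean calling smote with n_new = 13,916 − 417 = 13,499 on the spear phishing vectors, each of which would hold 21 + 3 + 3 = 27 features. Whether the authors balance fully is not stated.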
2020-08-17
Yao, Yepeng, Su, Liya, Lu, Zhigang, Liu, Baoxu.  2019.  STDeepGraph: Spatial-Temporal Deep Learning on Communication Graphs for Long-Term Network Attack Detection. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :120–127.
Network communication data are high-dimensional and spatiotemporal, and common traffic analysis methods often degrade their information content. For long-term network attack detection based on network flows, it is important to extract a discriminative, high-dimensional intrinsic representation of these flows. This work focuses on a hybrid deep neural network design that combines a convolutional neural network (CNN) and long short-term memory (LSTM) with graph similarity measures to learn high-dimensional representations from network traffic. Specifically, given a set of network flows, we first construct temporal communication graphs and then compute graph kernel matrices. From these kernel matrices, we use the kernel values between graphs to compute a characterization vector for each graph through graph signal processing. This vector can be regarded as a kernel-based similarity embedding of the graph: it integrates structural similarity information and leverages an efficient graph kernel based on the graph Laplacian matrix. Our approach uses graph structures as additional prior information, the graph Laplacian matrix for feature extraction, and hybrid deep learning models for learning long-term information on communication graphs. Experiments on two real-world network attack datasets show that our approach extracts more discriminative representations, which improves accuracy in a supervised classification task; overall accuracy increases by approximately 10%–15%.
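The graph-construction and kernel steps described above can be illustrated with a minimal, dependency-free sketch. It assumes each time window is a list of (src, dst) flow pairs, builds an undirected communication graph, forms the combinatorial Laplacian L = D − A, and uses the spectral moments tr(L^k)/n (the mean k-th power of L's eigenvalues) as graph features under an RBF kernel. This kernel is a hypothetical stand-in chosen for illustration; the abstract does not specify which Laplacian-based kernel the authors used. Row i of the resulting kernel matrix then plays the role of the similarity embedding for graph i. The CNN-LSTM stage is not shown, and all function names and the gamma parameter are hypothetical.

```python
import math
from collections import defaultdict


def build_graph(flows):
    """Undirected communication graph from (src, dst) flows in one time window."""
    adj = defaultdict(set)
    for src, dst in flows:
        if src != dst:  # ignore self-loops
            adj[src].add(dst)
            adj[dst].add(src)
    return adj


def laplacian(adj):
    """Combinatorial Laplacian L = D - A as a dense list-of-lists matrix."""
    nodes = sorted(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    L = [[0.0] * n for _ in range(n)]
    for v in nodes:
        i = idx[v]
        L[i][i] = float(len(adj[v]))  # node degree on the diagonal
        for u in adj[v]:
            L[i][idx[u]] = -1.0  # -1 for each incident edge
    return L


def matmul(a, b):
    """Dense matrix product of two square list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def spectral_moments(L, k_max=3):
    """[tr(L^k)/n for k = 1..k_max], i.e. mean k-th power of L's eigenvalues."""
    n = len(L)
    if n == 0:
        return [0.0] * k_max
    moments, P = [], L
    for k in range(1, k_max + 1):
        if k > 1:
            P = matmul(P, L)
        moments.append(sum(P[i][i] for i in range(n)) / n)
    return moments


def kernel_matrix(graphs, gamma=0.1):
    """Hypothetical stand-in kernel: RBF on Laplacian spectral-moment features."""
    feats = [spectral_moments(laplacian(g)) for g in graphs]
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(f, g))) for g in feats]
        for f in feats
    ]
```

For example, kernel_matrix([build_graph(w) for w in windows]) returns one row per time window; feeding those rows in time order to a sequence model would loosely correspond to the long-term learning stage. Spectral moments are used here only because they can be computed from L without an eigensolver.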
2020-05-04
Su, Liya, Yao, Yepeng, Lu, Zhigang, Liu, Baoxu.  2019.  Understanding the Influence of Graph Kernels on Deep Learning Architecture: A Case Study of Flow-Based Network Attack Detection. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :312–318.
Flow-based network attack detection technology can identify many threats in network traffic. Existing techniques have several drawbacks: i) rule-based approaches are vulnerable because they require signatures to be defined for every possible attack, ii) anomaly-based approaches are less effective because attacks that bypass detection are easy to devise, and iii) both rule-based and anomaly-based approaches rely heavily on domain knowledge of networked systems and cybersecurity. The major challenge for existing methods is to understand novel attack scenarios and to design models that detect novel and more serious attacks. In this paper, we investigate network attacks and reveal the key activities and the relationships between them. To this end, we propose methods for understanding network security practices using theoretical concepts such as graph kernels. In addition, we integrate graph kernels into a deep learning architecture to exploit the expressive relationships among network flows, combining them with the ability of deep neural networks (DNNs) to learn hidden representations. Based on the communication representation graph of each network flow within a specific time interval, flow-based network attack detection can then be performed effectively by measuring the similarity between the graphs of two flows. The proposed study is effective both for gaining insight into network attacks and for detecting them. Using two real-world datasets that contain several new types of network attacks, we achieve significant improvements in accuracy on existing network attack detection tasks.
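The idea of detecting attacks by measuring similarity between communication graphs can be sketched with a Weisfeiler-Lehman (WL) subtree kernel and a 1-nearest-neighbour rule. The WL kernel is a well-known graph kernel swapped in here for illustration; the abstract does not say which kernels the authors used, and their DNN stage is omitted. The helper names (graph, wl_features, similarity, classify) and the toy "scan" and "benign" graphs are hypothetical. For simplicity, refined labels are kept as full strings rather than compressed to integer IDs, which is easier to read but uses more memory on large graphs.

```python
import math
from collections import Counter


def graph(edges):
    """Undirected adjacency dict {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def wl_features(adj, labels=None, iterations=2):
    """WL subtree features: counts of node labels across all relabeling rounds."""
    current = dict(labels) if labels else {v: "n" for v in adj}
    feats = Counter(current.values())
    for _ in range(iterations):
        # New label = own label followed by the sorted multiset of neighbour labels.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in adj
        }
        feats.update(current.values())
    return feats


def wl_kernel(f1, f2):
    """Dot product of two WL label-count vectors (missing labels count as 0)."""
    return sum(c * f2[label] for label, c in f1.items())


def similarity(f1, f2):
    """Cosine-normalized WL kernel in [0, 1]; 1.0 for identical feature counts."""
    return wl_kernel(f1, f2) / math.sqrt(wl_kernel(f1, f1) * wl_kernel(f2, f2))


def classify(query, labelled):
    """1-NN: return the class of the most similar (features, class) pair."""
    return max(labelled, key=lambda item: similarity(query, item[0]))[1]


# Toy usage: a star-shaped query (one host contacting many) matches the "scan" example.
benign = wl_features(graph([("a", "b"), ("b", "c"), ("c", "d")]))
scan = wl_features(graph([("s", "t1"), ("s", "t2"), ("s", "t3")]))
query = wl_features(graph([("x", "y1"), ("x", "y2"), ("x", "y3")]))
print(classify(query, [(benign, "benign"), (scan, "scan")]))  # → scan
```

Because WL labels depend only on graph structure and initial node labels, the query star matches the training star exactly even though its host names differ, which is the kind of structural similarity a graph-kernel detector relies on.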