Visible to the public Biblio

Filters: Keyword is clustering analysis  [Clear All Filters]
2022-02-07
Zhang, Ruichao, Wang, Shang, Burton, Renee, Hoang, Minh, Hu, Juhua, Nascimento, Anderson C A.  2021.  Clustering Analysis of Email Malware Campaigns. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :95–102.
The task of malware labeling on real datasets faces huge challenges—ever-changing datasets and lack of ground-truth labels—owing to the rapid growth of malware. Clustering malware on their respective families is a well known tool used for improving the efficiency of the malware labeling process. In this paper, we addressed the challenge of clustering email malware, and carried out a cluster analysis on a real dataset collected from email campaigns over a 13-month period. Our main original contribution is to analyze the usefulness of email’s header information for malware clustering (a novel approach proposed by Burton [1]), and compare it with features collected from the malware directly. We compare clustering based on email header’s information with traditional features extracted from varied resources provided by VirusTotal [2], including static and dynamic analysis. We show that email header information has an excellent performance.
2021-06-30
Xu, Hui, Zhang, Wei, Gao, Man, Chen, Hongwei.  2020.  Clustering Analysis for Big Data in Network Security Domain Using a Spark-Based Method. 2020 IEEE 5th International Symposium on Smart and Wireless Systems within the Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS). :1—4.
Considering the problem of network security under the background of big data, the clustering analysis algorithms can be utilized to improve the correctness of network intrusion detection models for security management. As a kind of iterative clustering analysis algorithm, K-means algorithm is not only simple but also efficient, so it is widely used. However, the traditional K-means algorithm cannot well solve the network security problem when facing big data due to its high complexity and limited processing ability. In this case, this paper proposes to optimize the traditional K-means algorithm based on the Spark platform and deploy the optimized clustering analysis algorithm in the distributed architecture, so as to improve the efficiency of clustering algorithm for network intrusion detection in big data environment. The experimental result shows that, compared with the traditional K-means algorithm, the efficiency of the optimized K-means algorithm using a Spark-based method is significantly improved in the running time.
2020-07-06
Ben, Yongming, Han, Yanni, Cai, Ning, An, Wei, Xu, Zhen.  2019.  An Online System Dependency Graph Anomaly Detection based on Extended Weisfeiler-Lehman Kernel. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–6.
Modern operating systems are typical multitasking systems: Running multiple tasks at the same time. Therefore, a large number of system calls belonging to different processes are invoked at the same time. By associating these invocations, one can construct the system dependency graph. In rapidly evolving system dependency graphs, how to quickly find outliers is an urgent issue for intrusion detection. Clustering analysis based on graph similarity will help solve this problem. In this paper, an extended Weisfeiler-Lehman(WL) kernel is proposed. Firstly, an embedded vector with indefinite dimensions is constructed based on the original dependency graph. Then, the vector is compressed with Simhash to generate a fingerprint. Finally, anomaly detection based on clustering is carried out according to these fingerprints. Our scheme can achieve prominent detection with high efficiency. For validation, we choose StreamSpot, a relevant prior work, to act as benchmark, and use the same data set as it to carry out evaluations. Experiments show that our scheme can achieve the highest detection precision of 98% while maintaining a perfect recall performance. Moreover, both quantitative and visual comparisons demonstrate the outperforming clustering effect of our scheme than StreamSpot.
2020-01-27
Fuchs, Caro, Spolaor, Simone, Nobile, Marco S., Kaymak, Uzay.  2019.  A Swarm Intelligence Approach to Avoid Local Optima in Fuzzy C-Means Clustering. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.
Clustering analysis is an important computational task that has applications in many domains. One of the most popular algorithms to solve the clustering problem is fuzzy c-means, which exploits notions from fuzzy logic to provide a smooth partitioning of the data into classes, allowing the possibility of multiple membership for each data sample. The fuzzy c-means algorithm is based on the optimization of a partitioning function, which minimizes inter-cluster similarity. This optimization problem is known to be NP-hard and it is generally tackled using a hill climbing method, a local optimizer that provides acceptable but sub-optimal solutions, since it is sensitive to initialization and tends to get stuck in local optima. In this work we propose an alternative approach based on the swarm intelligence global optimization method Fuzzy Self-Tuning Particle Swarm Optimization (FST-PSO). We solve the fuzzy clustering task by optimizing fuzzy c-means' partitioning function using FST-PSO. We show that this population-based metaheuristics is more effective than hill climbing, providing high quality solutions with the cost of an additional computational complexity. It is noteworthy that, since this particle swarm optimization algorithm is self-tuning, the user does not have to specify additional hyperparameters for the optimization process.