Biblio

Filters: Author is Chan, Philip K.
2023-09-18
Jia, Jingyun, Chan, Philip K.  2022.  Representation Learning with Function Call Graph Transformations for Malware Open Set Recognition. 2022 International Joint Conference on Neural Networks (IJCNN). :1–8.
The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. Because new and unknown malware families appear regularly, it is difficult to collect training samples that cover every class. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for function call graph (FCG) based malware representations to facilitate the pretext task. We also present a statistical thresholding approach to find the optimal threshold for the unknown class. The experimental results indicate that our proposed pre-training process can improve performance across different downstream loss functions for the OSR problem.
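The abstract does not say how the threshold for the unknown class is computed. As a generic illustration only, and not the authors' method, the sketch below assumes each sample already has a numeric outlier score (for example, a distance to the nearest known-class centroid) and places the threshold at the mean plus k standard deviations of the scores seen on known-class training data. The names `fit_threshold` and `predict`, the value of `k`, and the sample scores are all hypothetical.

```python
import statistics


def fit_threshold(known_scores, k=3.0):
    """Return mean + k * population stdev of known-class outlier scores.

    Assumption: higher scores mean "less like any known class".
    """
    mean = statistics.fmean(known_scores)
    std = statistics.pstdev(known_scores)
    return mean + k * std


def predict(score, closed_set_label, threshold, unknown_label="unknown"):
    """Reject a sample as unknown when its score exceeds the threshold."""
    return unknown_label if score > threshold else closed_set_label


# Hypothetical outlier scores measured on known-class training samples.
train_scores = [0.10, 0.12, 0.09, 0.11, 0.13]
threshold = fit_threshold(train_scores, k=3.0)

label_far = predict(0.50, "family_a", threshold)   # score well above threshold
label_near = predict(0.12, "family_a", threshold)  # score within the known range
```

A sample whose score exceeds the threshold is labeled unknown; otherwise it keeps the label assigned by the closed-set classifier.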
2018-06-20
Hassen, Mehadi, Chan, Philip K.  2017.  Scalable Function Call Graph-based Malware Classification. Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. :239–248.
In an attempt to preserve the structural information in malware binaries during feature extraction, function call graph-based features have been used in various research works in malware classification. However, the approach usually employed when performing classification on these graphs is based on computing graph similarity using computationally intensive techniques. As a result, much of the previous work in this area incurs a large performance overhead and does not scale well. In this paper, we propose a linear time function call graph (FCG) vector representation based on function clustering that has significant performance gains in addition to improved classification accuracy. We also show how this representation can enable using graph features together with other non-graph features.
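To make the idea of a cluster-based FCG vector concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes every function has already been assigned to one of a fixed number of clusters (the abstract does not say how), and it encodes a graph as per-cluster vertex counts followed by counts of call edges between each ordered pair of clusters. The name `fcg_to_vector`, the vector layout, and the sample graph are hypothetical.

```python
def fcg_to_vector(edges, cluster_of, num_clusters):
    """Encode a function call graph as a fixed-length count vector.

    edges: iterable of (caller, callee) function names.
    cluster_of: dict mapping each function name to a cluster id
        in range(num_clusters); assumed to be computed beforehand.
    Runs in one pass over vertices and one pass over edges.
    """
    vertex_counts = [0] * num_clusters
    for cluster_id in cluster_of.values():
        vertex_counts[cluster_id] += 1

    # One slot per ordered (caller cluster, callee cluster) pair.
    edge_counts = [0] * (num_clusters * num_clusters)
    for caller, callee in edges:
        index = cluster_of[caller] * num_clusters + cluster_of[callee]
        edge_counts[index] += 1

    return vertex_counts + edge_counts


# Hypothetical call graph with four functions in three clusters.
edges = [("main", "read"), ("main", "send"), ("read", "decrypt")]
cluster_of = {"main": 0, "read": 1, "send": 1, "decrypt": 2}
vec = fcg_to_vector(edges, cluster_of, 3)
```

For a fixed number of clusters, the work grows linearly with the number of functions and call edges, and every graph maps to a vector of the same length. Such a vector can be concatenated with other non-graph feature values before being passed to a standard classifier.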