Visible to the public Biblio

Filters: Keyword is call graphs  [Clear All Filters]
2019-03-04
Hejderup, J., Deursen, A. v, Gousios, G..  2018.  Software Ecosystem Call Graph for Dependency Management. 2018 IEEE/ACM 40th International Conference on Software Engineering: New Ideas and Emerging Technologies Results (ICSE-NIER). :101–104.
A popular form of software reuse is the use of open source software libraries hosted on centralized code repositories, such as Maven or npm. Developers only need to declare dependencies to external libraries, and automated tools make them available to the workspace of the project. Recent incidents, such as the Equifax data breach and the leftpad package removal, demonstrate the difficulty in assessing the severity, impact and spread of bugs in dependency networks. While dependency checkers are being adapted as a counter measure, they only provide indicative information. To remedy this situation, we propose a fine-grained dependency network that goes beyond packages and into call graphs. The result is a versioned ecosystem-level call graph. In this paper, we outline the process to construct the proposed graph and present a preliminary evaluation of a security issue from a core package to an affected client application.
2018-06-20
Searles, R., Xu, L., Killian, W., Vanderbruggen, T., Forren, T., Howe, J., Pearson, Z., Shannon, C., Simmons, J., Cavazos, J..  2017.  Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection. 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). :69–77.

Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.