Biblio

List
Filter

Found 148 results

Filters: Keyword is malware analysis [Clear All Filters]

2020-10-26

Uchnár, Matúš, Feciľak, Peter. 2019. Behavioral malware analysis algorithm comparison. 2019 IEEE 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI). :397–400.

Malware analysis and detection based on it is very important factor in the computer security. Despite of the enormous effort of companies making anti-malware solutions, it is usually not possible to respond to new malware in time and some computers will get infected. This shortcoming could be partially mitigated through using behavioral malware analysis. This work is aimed towards machine learning algorithms comparison for the behavioral malware analysis purposes.

Walker, Aaron, Sengupta, Shamik. 2019. Insights into Malware Detection via Behavioral Frequency Analysis Using Machine Learning. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–6.

The most common defenses against malware threats involves the use of signatures derived from instances of known malware. However, the constant evolution of the malware threat landscape necessitates defense against unknown malware, making a signature catalog of known threats insufficient to prevent zero-day vulnerabilities from being exploited. Recent research has applied machine learning approaches to identify malware through artifacts of malicious activity as observed through dynamic behavioral analysis. We have seen that these approaches mimic common malware defenses by simply offering a method of detecting known malware. We contribute a new method of identifying software as malicious or benign through analysis of the frequency of Windows API system function calls. We show that this is a powerful technique for malware detection because it generates learning models which understand the difference between malicious and benign software, rather than producing a malware signature classifier. We contribute a method of systematically comparing machine learning models against different datasets to determine their efficacy in accurately distinguishing the difference between malicious and benign software.

Chen, Cheng-Yu, Hsiao, Shun-Wen. 2019. IoT Malware Dynamic Analysis Profiling System and Family Behavior Analysis. 2019 IEEE International Conference on Big Data (Big Data). :6013–6015.

Not only the number of deployed IoT devices increases but also that of IoT malware increases. We eager to understand the threat made by IoT malware but we lack tools to observe, analyze and detect them. We design and implement an automatic, virtual machine-based profiling system to collect valuable IoT malware behavior, such as API call invocation, system call execution, etc. In addition to conventional profiling methods (e.g., strace and packet capture), the proposed profiling system adapts virtual machine introspection based API hooking technique to intercept API call invocation by malware, so that our introspection would not be detected by IoT malware. We then propose a method to convert the multiple sequential data (API calls) to a family behavior graph for further analysis.

Samantray, Om Prakash, Tripathy, Satya Narayan, Das, Susanta Kumar. 2019. A study to Understand Malware Behavior through Malware Analysis. 2019 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN). :1–5.

Most of the malware detection techniques use malware signatures for detection. It is easy to detect known malicious program in a system but the problem arises when the malware is unknown. Because, unknown malware cannot be detected by using available known malware signatures. Signature based detection techniques fails to detect unknown and zero-day attacks. A novel approach is required to represent malware features effectively to detect obfuscated, unknown, and mutated malware. This paper emphasizes malware behavior, characteristics and properties extracted by different analytic techniques and to decide whether to include them to create behavioral based malware signature. We have made an attempt to understand the malware behavior using a few openly available tools for malware analysis.

2020-03-23

Bibi, Iram, Akhunzada, Adnan, Malik, Jahanzaib, Ahmed, Ghufran, Raza, Mohsin. 2019. An Effective Android Ransomware Detection Through Multi-Factor Feature Filtration and Recurrent Neural Network. 2019 UK/ China Emerging Technologies (UCET). :1–4.

With the increasing diversity of Android malware, the effectiveness of conventional defense mechanisms are at risk. This situation has endorsed a notable interest in the improvement of the exactitude and scalability of malware detection for smart devices. In this study, we have proposed an effective deep learning-based malware detection model for competent and improved ransomware detection in Android environment by looking at the algorithm of Long Short-Term Memory (LSTM). The feature selection has been done using 8 different feature selection algorithms. The 19 important features are selected through simple majority voting process by comparing results of all feature filtration techniques. The proposed algorithm is evaluated using android malware dataset (CI-CAndMal2017) and standard performance parameters. The proposed model outperforms with 97.08% detection accuracy. Based on outstanding performance, we endorse our proposed algorithm to be efficient in malware and forensic analysis.

2019-10-14

Angelini, M., Blasilli, G., Borrello, P., Coppa, E., D’Elia, D. C., Ferracci, S., Lenti, S., Santucci, G.. 2018. ROPMate: Visually Assisting the Creation of ROP-based Exploits. 2018 IEEE Symposium on Visualization for Cyber Security (VizSec). :1–8.

Exploits based on ROP (Return-Oriented Programming) are increasingly present in advanced attack scenarios. Testing systems for ROP-based attacks can be valuable for improving the security and reliability of software. In this paper, we propose ROPMATE, the first Visual Analytics system specifically designed to assist human red team ROP exploit builders. In contrast, previous ROP tools typically require users to inspect a puzzle of hundreds or thousands of lines of textual information, making it a daunting task. ROPMATE presents builders with a clear interface of well-defined and semantically meaningful gadgets, i.e., fragments of code already present in the binary application that can be chained to form fully-functional exploits. The system supports incrementally building exploits by suggesting gadget candidates filtered according to constraints on preserved registers and accessed memory. Several visual aids are offered to identify suitable gadgets and assemble them into semantically correct chains. We report on a preliminary user study that shows how ROPMATE can assist users in building ROP chains.

2019-08-05

Tofighi-Shirazi, Ramtine, Christofi, Maria, Elbaz-Vincent, Philippe, Le, Thanh-ha. 2018. DoSE: Deobfuscation Based on Semantic Equivalence. Proceedings of the 8th Software Security, Protection, and Reverse Engineering Workshop. :1:1-1:12.

Software deobfuscation is a key challenge in malware analysis to understand the internal logic of the code and establish adequate countermeasures. In order to defeat recent obfuscation techniques, state-of-the-art generic deobfuscation methodologies are based on dynamic symbolic execution (DSE). However, DSE suffers from limitations such as code coverage and scalability. In the race to counter and remove the most advanced obfuscation techniques, there is a need to reduce the amount of code to cover. To that extend, we propose a novel deobfuscation approach based on semantic equivalence, called DoSE. With DoSE, we aim to improve and complement DSE-based deobfuscation techniques by statically eliminating obfuscation transformations (built on code-reuse). This improves the code coverage. Our method's novelty comes from the transposition of existing binary diffing techniques, namely semantic equivalence checking, to the purpose of the deobfuscation of untreated techniques, such as two-way opaque constructs, that we encounter in surreptitious software. In order to challenge DoSE, we used both known malwares such as Cryptowall, WannaCry, Flame and BitCoinMiner and obfuscated code samples. Our experimental results show that DoSE is an efficient strategy of detecting obfuscation transformations based on code-reuse with low rates of false positive and/or false negative results in practice, and up to 63% of code reduction on certain types of malwares.

2019-06-24

Stokes, J. W., Wang, D., Marinescu, M., Marino, M., Bussone, B.. 2018. Attack and Defense of Dynamic Analysis-Based, Adversarial Neural Malware Detection Models. MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM). :1–8.

Recently researchers have proposed using deep learning-based systems for malware detection. Unfortunately, all deep learning classification systems are vulnerable to adversarial learning-based attacks, or adversarial attacks, where miscreants can avoid detection by the classification algorithm with very few perturbations of the input data. Previous work has studied adversarial attacks against static analysis-based malware classifiers which only classify the content of the unknown file without execution. However, since the majority of malware is either packed or encrypted, malware classification based on static analysis often fails to detect these types of files. To overcome this limitation, anti-malware companies typically perform dynamic analysis by emulating each file in the anti-malware engine or performing in-depth scanning in a virtual machine. These strategies allow the analysis of the malware after unpacking or decryption. In this work, we study different strategies of crafting adversarial samples for dynamic analysis. These strategies operate on sparse, binary inputs in contrast to continuous inputs such as pixels in images. We then study the effects of two, previously proposed defensive mechanisms against crafted adversarial samples including the distillation and ensemble defenses. We also propose and evaluate the weight decay defense. Experiments show that with these three defenses, the number of successfully crafted adversarial samples is reduced compared to an unprotected baseline system. In particular, the ensemble defense is the most resilient to adversarial attacks. Importantly, none of the defenses significantly reduce the classification accuracy for detecting malware. Finally, we show that while adding additional hidden layers to neural models does not significantly improve the malware classification accuracy, it does significantly increase the classifier's robustness to adversarial attacks.

Lai, Chia-Min, Lu, Chia-Yu, Lee, Hahn-Ming. 2018. Implementation of Adversarial Scenario to Malware Analytic. Proceedings of the 2Nd International Conference on Machine Learning and Soft Computing. :127–132.

As the worldwide internet has non-stop developments, it comes with enormous amount automatically generated malware. Those malware had become huge threaten to computer users. A comprehensive malware family classifier can help security researchers to quickly identify characteristics of malware which help malware analysts to investigate in more efficient way. However, despite the assistance of the artificial intelligent (AI) classifiers, it has been shown that the AI-based classifiers are vulnerable to so-called adversarial attacks. In this paper, we demonstrate how the adversarial settings can be applied to the classifier of malware families classification. Our experimental results achieved high successful rate through the adversarial attack. We also find the important features which are ignored by malware analysts but useful in the future analysis.

Copty, Fady, Danos, Matan, Edelstein, Orit, Eisner, Cindy, Murik, Dov, Zeltser, Benjamin. 2018. Accurate Malware Detection by Extreme Abstraction. Proceedings of the 34th Annual Computer Security Applications Conference. :101–111.

Modern malware applies a rich arsenal of evasion techniques to render dynamic analysis ineffective. In turn, dynamic analysis tools take great pains to hide themselves from malware; typically this entails trying to be as faithful as possible to the behavior of a real run. We present a novel approach to malware analysis that turns this idea on its head, using an extreme abstraction of the operating system that intentionally strays from real behavior. The key insight is that the presence of malicious behavior is sufficient evidence of malicious intent, even if the path taken is not one that could occur during a real run of the sample. By exploring multiple paths in a system that only approximates the behavior of a real system, we can discover behavior that would often be hard to elicit otherwise. We aggregate features from multiple paths and use a funnel-like configuration of machine learning classifiers to achieve high accuracy without incurring too much of a performance penalty. We describe our system, TAMALES (The Abstract Malware Analysis LEarning System), in detail and present machine learning results using a 330K sample set showing an FPR (False Positive Rate) of 0.10% with a TPR (True Positive Rate) of 99.11%, demonstrating that extreme abstraction can be extraordinarily effective in providing data that allows a classifier to accurately detect malware.

Ijaz, M., Durad, M. H., Ismail, M.. 2019. Static and Dynamic Malware Analysis Using Machine Learning. 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :687–691.

Malware detection is an indispensable factor in security of internet oriented machines. The combinations of different features are used for dynamic malware analysis. The different combinations are generated from APIs, Summary Information, DLLs and Registry Keys Changed. Cuckoo sandbox is used for dynamic malware analysis, which is customizable, and provide good accuracy. More than 2300 features are extracted from dynamic analysis of malware and 92 features are extracted statically from binary malware using PEFILE. Static features are extracted from 39000 malicious binaries and 10000 benign files. Dynamically 800 benign files and 2200 malware files are analyzed in Cuckoo Sandbox and 2300 features are extracted. The accuracy of dynamic malware analysis is 94.64% while static analysis accuracy is 99.36%. The dynamic malware analysis is not effective due to tricky and intelligent behaviours of malwares. The dynamic analysis has some limitations due to controlled network behavior and it cannot be analyzed completely due to limited access of network.

Wright, D., Stroschein, J.. 2018. A Malware Analysis and Artifact Capture Tool. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech). :328–333.

Malware authors attempt to obfuscate and hide their code in its static and dynamic states. This paper provides a novel approach to aid analysis by intercepting and capturing malware artifacts and providing dynamic control of process flow. Capturing malware artifacts allows an analyst to more quickly and comprehensively understand malware behavior and obfuscation techniques and doing so interactively allows multiple code paths to be explored. The faster that malware can be analyzed the quicker the systems and data compromised by it can be determined and its infection stopped. This research proposes an instantiation of an interactive malware analysis and artifact capture tool.

Naeem, H., Guo, B., Naeem, M. R.. 2018. A light-weight malware static visual analysis for IoT infrastructure. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD). :240–244.

Recently a huge trend on the internet of things (IoT) and an exponential increase in automated tools are helping malware producers to target IoT devices. The traditional security solutions against malware are infeasible due to low computing power for large-scale data in IoT environment. The number of malware and their variants are increasing due to continuous malware attacks. Consequently, the performance improvement in malware analysis is critical requirement to stop rapid expansion of malicious attacks in IoT environment. To solve this problem, the paper proposed a novel framework for classifying malware in IoT environment. To achieve flne-grained malware classification in suggested framework, the malware image classification system (MICS) is designed for representing malware image globally and locally. MICS first converts the suspicious program into the gray-scale image and then captures hybrid local and global malware features to perform malware family classification. Preliminary experimental outcomes of MICS are quite promising with 97.4% classification accuracy on 9342 windows suspicious programs of 25 families. The experimental results indicate that proposed framework is quite capable to process large-scale IoT malware.

Qbeitah, M. A., Aldwairi, M.. 2018. Dynamic malware analysis of phishing emails. 2018 9th International Conference on Information and Communication Systems (ICICS). :18–24.

Malicious software or malware is one of the most significant dangers facing the Internet today. In the fight against malware, users depend on anti-malware and anti-virus products to proactively detect threats before damage is done. Those products rely on static signatures obtained through malware analysis. Unfortunately, malware authors are always one step ahead in avoiding detection. This research deals with dynamic malware analysis, which emphasizes on: how the malware will behave after execution, what changes to the operating system, registry and network communication take place. Dynamic analysis opens up the doors for automatic generation of anomaly and active signatures based on the new malware's behavior. The research includes a design of honeypot to capture new malware and a complete dynamic analysis laboratory setting. We propose a standard analysis methodology by preparing the analysis tools, then running the malicious samples in a controlled environment to investigate their behavior. We analyze 173 recent Phishing emails and 45 SPIM messages in search for potentially new malwares, we present two malware samples and their comprehensive dynamic analysis.

Sethi, Kamalakanta, Chaudhary, Shankar Kumar, Tripathy, Bata Krishan, Bera, Padmalochan. 2018. A Novel Malware Analysis Framework for Malware Detection and Classification Using Machine Learning Approach. Proceedings of the 19th International Conference on Distributed Computing and Networking. :49:1–49:4.

Nowadays, the digitization of the world is under a serious threat due to the emergence of various new and complex malware every day. Due to this, the traditional signature-based methods for detection of malware effectively become an obsolete method. The efficiency of the machine learning techniques in context to the detection of malwares has been proved by state-of-the-art research works. In this paper, we have proposed a framework to detect and classify different files (e.g., exe, pdf, php, etc.) as benign and malicious using two level classifier namely, Macro (for detection of malware) and Micro (for classification of malware files as a Trojan, Spyware, Ad-ware, etc.). Our solution uses Cuckoo Sandbox for generating static and dynamic analysis report by executing the sample files in the virtual environment. In addition, a novel feature extraction module has been developed which functions based on static, behavioral and network analysis using the reports generated by the Cuckoo Sandbox. Weka Framework is used to develop machine learning models by using training datasets. The experimental results using the proposed framework shows high detection rate and high classification rate using different machine learning algorithms

Viglianisi, Gabriele, Carminati, Michele, Polino, Mario, Continella, Andrea, Zanero, Stefano. 2018. SysTaint: Assisting Reversing of Malicious Network Communications. Proceedings of the 8th Software Security, Protection, and Reverse Engineering Workshop. :4:1–4:12.

The ever-increasing number of malware samples demands for automated tools that aid the analysts in the reverse engineering of complex malicious binaries. Frequently, malware communicates over an encrypted channel with external network resources under the control of malicious actors, such as Command and Control servers that control the botnet of infected machines. Hence, a key aspect in malware analysis is uncovering and understanding the semantics of network communications. In this paper we present SysTaint, a semi-automated tool that runs malware samples in a controlled environment and analyzes their execution to support the analyst in identifying the functions involved in the communication and the exchanged data. Our evaluation on four banking Trojan samples from different families shows that SysTaint is able to handle and inspect encrypted network communications, obtaining useful information on the data being sent and received, on how each sample processes this data, and on the inner portions of code that deal with the data processing.

Yakura, Hiromu, Shinozaki, Shinnosuke, Nishimura, Reon, Oyama, Yoshihiro, Sakuma, Jun. 2018. Malware Analysis of Imaged Binary Samples by Convolutional Neural Network with Attention Mechanism. Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy. :127–134.

This paper presents a proposal of a method to extract important byte sequences in malware samples to reduce the workload of human analysts who investigate the functionalities of the samples. This method, by applying convolutional neural network (CNN) with a technique called attention mechanism to an image converted from binary data, enables calculation of an "attention map," which shows regions having higher importance for classification in the image. This distinction of regions enables extraction of characteristic byte sequences peculiar to the malware family from the binary data and can provide useful information for the human analysts without a priori knowledge. Furthermore, the proposed method calculates the attention map for all binary data including the data section. Thus, it can process packed malware that might contain obfuscated code in the data section. Results of our evaluation experiment using malware datasets show that the proposed method provides higher classification accuracy than conventional methods. Furthermore, analysis of malware samples based on the calculated attention maps confirmed that the extracted sequences provide useful information for manual analysis, even when samples are packed.

2019-06-10

Kargaard, J., Drange, T., Kor, A., Twafik, H., Butterfield, E.. 2018. Defending IT Systems against Intelligent Malware. 2018 IEEE 9th International Conference on Dependable Systems, Services and Technologies (DESSERT). :411-417.

The increasing amount of malware variants seen in the wild is causing problems for Antivirus Software vendors, unable to keep up by creating signatures for each. The methods used to develop a signature, static and dynamic analysis, have various limitations. Machine learning has been used by Antivirus vendors to detect malware based on the information gathered from the analysis process. However, adversarial examples can cause machine learning algorithms to miss-classify new data. In this paper we describe a method for malware analysis by converting malware binaries to images and then preparing those images for training within a Generative Adversarial Network. These unsupervised deep neural networks are not susceptible to adversarial examples. The conversion to images from malware binaries should be faster than using dynamic analysis and it would still be possible to link malware families together. Using the Generative Adversarial Network, malware detection could be much more effective and reliable.

Debatty, T., Mees, W., Gilon, T.. 2018. Graph-Based APT Detection. 2018 International Conference on Military Communications and Information Systems (ICMCIS). :1-8.

In this paper we propose a new algorithm to detect Advanced Persistent Threats (APT's) that relies on a graph model of HTTP traffic. We also implement a complete detection system with a web interface that allows to interactively analyze the data. We perform a complete parameter study and experimental evaluation using data collected on a real network. The results show that the performance of our system is comparable to currently available antiviruses, although antiviruses use signatures to detect known malwares while our algorithm solely uses behavior analysis to detect new undocumented attacks.

Xue, S., Zhang, L., Li, A., Li, X., Ruan, C., Huang, W.. 2018. AppDNA: App Behavior Profiling via Graph-Based Deep Learning. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. :1475-1483.

Better understanding of mobile applications' behaviors would lead to better malware detection/classification and better app recommendation for users. In this work, we design a framework AppDNA to automatically generate a compact representation for each app to comprehensively profile its behaviors. The behavior difference between two apps can be measured by the distance between their representations. As a result, the versatile representation can be generated once for each app, and then be used for a wide variety of objectives, including malware detection, app categorizing, plagiarism detection, etc. Based on a systematic and deep understanding of an app's behavior, we propose to perform a function-call-graph-based app profiling. We carefully design a graph-encoding method to convert a typically extremely large call-graph to a 64-dimension fix-size vector to achieve robust app profiling. Our extensive evaluations based on 86,332 benign and malicious apps demonstrate that our system performs app profiling (thus malware detection, classification, and app recommendation) to a high accuracy with extremely low computation cost: it classifies 4024 (benign/malware) apps using around 5.06 second with accuracy about 93.07%; it classifies 570 malware's family (total 21 families) using around 0.83 second with accuracy 82.3%; it classifies 9,730 apps' functionality with accuracy 33.3% for a total of 7 categories and accuracy of 88.1 % for 2 categories.

Jiang, H., Turki, T., Wang, J. T. L.. 2018. DLGraph: Malware Detection Using Deep Learning and Graph Embedding. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). :1029-1033.

In this paper we present a new approach, named DLGraph, for malware detection using deep learning and graph embedding. DLGraph employs two stacked denoising autoencoders (SDAs) for representation learning, taking into consideration computer programs' function-call graphs and Windows application programming interface (API) calls. Given a program, we first use a graph embedding technique that maps the program's function-call graph to a vector in a low-dimensional feature space. One SDA in our deep learning model is used to learn a latent representation of the embedded vector of the function-call graph. The other SDA in our model is used to learn a latent representation of the given program's Windows API calls. The two learned latent representations are then merged to form a combined feature vector. Finally, we use softmax regression to classify the combined feature vector for predicting whether the given program is malware or not. Experimental results based on different datasets demonstrate the effectiveness of the proposed approach and its superiority over a related method.

Jain, D., Khemani, S., Prasad, G.. 2018. Identification of Distributed Malware. 2018 IEEE 3rd International Conference on Communication and Information Systems (ICCIS). :242-246.

Smartphones have evolved over the years from simple devices to communicate with each other to fully functional portable computers although with comparatively less computational power but inholding multiple applications within. With the smartphone revolution, the value of personal data has increased. As technological complexities increase, so do the vulnerabilities in the system. Smartphones are the latest target for attacks. Android being an open source platform and also the most widely used smartphone OS draws the attention of many malware writers to exploit the vulnerabilities of it. Attackers try to take advantage of these vulnerabilities and fool the user and misuse their data. Malwares have come a long way from simple worms to sophisticated DDOS using Botnets, the latest trends in computer malware tend to go in the distributed direction, to evade the multiple anti-virus apps developed to counter generic viruses and Trojans. However, the recent trend in android system is to have a combination of applications which acts as malware. The applications are benign individually but when grouped, these may result into a malicious activity. This paper proposes a new category of distributed malware in android system, how it can be used to evade the current security, and how it can be detected with the help of graph matching algorithm.

Luo, Chen, Chen, Zhengzhang, Tang, Lu-An, Shrivastava, Anshumali, Li, Zhichun, Chen, Haifeng, Ye, Jieping. 2018. TINET: Learning Invariant Networks via Knowledge Transfer. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. :1890-1899.

The latent behavior of an information system that can exhibit extreme events, such as system faults or cyber-attacks, is complex. Recently, the invariant network has shown to be a powerful way of characterizing complex system behaviors. Structures and evolutions of the invariance network, in particular, the vanishing correlations, can shed light on identifying causal anomalies and performing system diagnosis. However, due to the dynamic and complex nature of real-world information systems, learning a reliable invariant network in a new environment often requires continuous collecting and analyzing the system surveillance data for several weeks or even months. Although the invariant networks learned from old environments have some common entities and entity relationships, these networks cannot be directly borrowed for the new environment due to the domain variety problem. To avoid the prohibitive time and resource consuming network building process, we propose TINET, a knowledge transfer based model for accelerating invariant network construction. In particular, we first propose an entity estimation model to estimate the probability of each source domain entity that can be included in the final invariant network of the target domain. Then, we propose a dependency construction model for constructing the unbiased dependency relationships by solving a two-constraint optimization problem. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of TINET. We also apply TINET to a real enterprise security system for intrusion detection. TINET achieves superior detection performance at least 20 days lead-lag time in advance with more than 75% accuracy.

Cao, Cheng, Chen, Zhengzhang, Caverlee, James, Tang, Lu-An, Luo, Chen, Li, Zhichun. 2018. Behavior-Based Community Detection: Application to Host Assessment In Enterprise Information Networks. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. :1977-1985.

Community detection in complex networks is a fundamental problem that attracts much attention across various disciplines. Previous studies have been mostly focusing on external connections between nodes (i.e., topology structure) in the network whereas largely ignoring internal intricacies (i.e., local behavior) of each node. A pair of nodes without any interaction can still share similar internal behaviors. For example, in an enterprise information network, compromised computers controlled by the same intruder often demonstrate similar abnormal behaviors even if they do not connect with each other. In this paper, we study the problem of community detection in enterprise information networks, where large-scale internal events and external events coexist on each host. The discovered host communities, capturing behavioral affinity, can benefit many comparative analysis tasks such as host anomaly assessment. In particular, we propose a novel community detection framework to identify behavior-based host communities in enterprise information networks, purely based on large-scale heterogeneous event data. We continue proposing an efficient method for assessing host's anomaly level by leveraging the detected host communities. Experimental results on enterprise networks demonstrate the effectiveness of our model.

Su, Fang-Hsiang, Bell, Jonathan, Kaiser, Gail, Ray, Baishakhi. 2018. Obfuscation Resilient Search Through Executable Classification. Proceedings of the 2Nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. :20-30.

Android applications are usually obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables and/or modifying program structures. It is challenging to search for executables relevant to an obfuscated application for developers to analyze efficiently. Prior approaches toward obfuscation resilient search have relied on certain structural parts of apps remaining as landmarks, un-touched by obfuscation. For instance, some prior approaches have assumed that the structural relationships between identifiers are not broken by obfuscators; others have assumed that control flow graphs maintain their structures. Both approaches can be easily defeated by a motivated obfuscator. We present a new approach, MACNETO, to search for programs relevant to obfuscated executables leveraging deep learning and principal components on instructions. MACNETO makes few assumptions about the kinds of modifications that an obfuscator might perform. We show that it has high search precision for executables obfuscated by a state-of-the-art obfuscator that changes control flow. Further, we also demonstrate the potential of MACNETO to help developers understand executables, where MACNETO infers keywords (which are from relevant un-obfuscated programs) for obfuscated executables.