Malware Analysis, Part 3



Malware detection, analysis, and classification are perennial issues in cybersecurity. The research presented here advances malware analysis in some unique and interesting ways. The works cited were published or presented in 2014. Because of the volume of work, the bibliography is broken into multiple parts.

Maier, Dominik; Müller, Tilo; Protsenko, Mykola, "Divide-and-Conquer: Why Android Malware Cannot Be Stopped," Availability, Reliability and Security (ARES), 2014 Ninth International Conference on, pp.30,39, 8-12 Sept. 2014. doi: 10.1109/ARES.2014.12 Abstract: In this paper, we demonstrate that Android malware can bypass all automated analysis systems, including AV solutions, mobile sandboxes, and the Google Bouncer. We propose a tool called Sand-Finger for the fingerprinting of Android-based analysis systems. By analyzing the fingerprints of ten unique analysis environments from different vendors, we were able to find characteristics in which all tested environments differ from actual hardware. Depending on the availability of an analysis system, malware can either behave benignly or load malicious code at runtime. We classify this group of malware as Divide-and-Conquer attacks that are efficiently obfuscated by a combination of fingerprinting and dynamic code loading. In this group, we aggregate attacks that work against dynamic as well as static analysis. To demonstrate our approach, we create proof-of-concept malware that surpasses up-to-date malware scanners for Android. We also prove that known malware samples can enter the Google Play Store by modifying them only slightly. Due to Android's lack of an API for malware scanning at runtime, it is impossible for AV solutions to secure Android devices against these attacks.
Keywords: Androids; Google; Hardware; Humanoid robots; Malware; Mobile communication; Smart phones; AV; Android Malware; Google Bouncer; Mobile Sandboxes; Obfuscation; Static and Dynamic Analysis (ID#: 15-4926)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6980261&isnumber=6980232

Xiaoguang Han; Jigang Sun; Wu Qu; Xuanxia Yao, "Distributed Malware Detection Based on Binary File Features in Cloud Computing Environment," Control and Decision Conference (2014 CCDC), The 26th Chinese, pp.4083,4088, May 31 2014-June 2 2014. doi: 10.1109/CCDC.2014.6852896 Abstract: A number of techniques have been devised by researchers to counter malware attacks, and machine learning techniques play an important role in automated malware detection. Several machine learning approaches have been applied to malware detection, based on different features derived from dynamic analysis of the malware. While these methods demonstrate promise, they pose at least two major challenges. First, these approaches are subject to a growing array of countermeasures that increase the cost of capturing these malware binary executable file features. Second, feature extraction requires a time investment per binary file that does not scale well to the daily volume of malware instances being reported by those who diligently collect malware. To address the first challenge, this article proposes a binary-to-image projection algorithm based on a new type of feature extraction for malware, introduced in [2]. To address the second challenge, the technique's scalability is demonstrated through an implementation for the distributed (Key, Value) abstraction in a cloud computing environment. Both theoretical and empirical evidence demonstrate its effectiveness over other state-of-the-art malware detection techniques on a malware corpus, and the proposed method could be a useful and efficient complement to dynamic analysis.
Keywords: cloud computing; invasive software; learning (artificial intelligence); automated malware detection; binary-to-image projection algorithm; cloud computing environment; distributed malware detection; dynamic analysis; feature extraction; machine learning; malware attacks; malware binary executable file features; time investment; Arrays; Entropy; Feature extraction; Malware; Real-time systems; Vectors; Data Mining; Distributed Entropy LSH; Malware Detection; Malware Images (ID#: 15-4927)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6852896&isnumber=6852105
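
Illustration (not from the paper): the abstract above does not specify the binary-to-image projection, so the following is only a minimal sketch of the general idea, assuming the simplest possible layout in which each byte of the file becomes one 8-bit grayscale pixel. The function name `bytes_to_image` and the default row width are hypothetical.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay raw bytes out row by row as 8-bit grayscale pixel values.

    Assumption: one byte maps to one pixel (0-255); the final row is
    zero-padded so that every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # (-len) % width is the number of padding bytes needed to fill the last row.
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    for row in bytes_to_image(b"MZ\x90\x00\x03", width=4):
        print(row)
```

Features such as per-row entropy or texture descriptors could then be computed on the resulting rows; which features the authors actually use is not stated in the abstract.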

Chia-Mei Chen; Je-Ming Lin; Gu-Hsin Lai, "Detecting Mobile Application Malicious Behaviors Based on Data Flow of Source Code," Trustworthy Systems and their Applications (TSA), 2014 International Conference on, pp.1,6, 9-10 June 2014. doi: 10.1109/TSA.2014.10 Abstract: Mobile devices have become powerful and popular. Most Internet applications are ported to mobile platforms. Confidential personal information such as credit card numbers and passwords is stored on mobile devices for convenience. Therefore, mobile devices have become attack targets for financial gain. Mobile applications are published in many market platforms without verification; hence, malicious mobile applications can be deployed in such marketplaces. Two approaches for detecting malware, dynamic and static analysis, are commonly used in the literature. Dynamic analysis requires that an analyst run suspicious apps in a controlled environment and observe their behavior to determine whether an app is malicious. However, dynamic analysis is time-consuming, as some mobile applications might be triggered only after a certain amount of time or a special input sequence. In this paper, static analysis is adopted to detect mobile malware, and sensitive information is tracked to check whether it has been released or used by malware. We present a mobile malware detection approach which is based on data flow of the reversed source code of the application. The proposed system tracks the data flow to detect and identify malicious behavior of malware in the Android system. To validate the performance of the proposed system, 252 malware samples from 19 families and 50 free apps from Google Play are used. The results show that our method can successfully detect malicious behaviors of Android apps with a TPR of 91.6%.
Keywords: Android (operating system); data flow analysis; invasive software; mobile computing; source code (software); Android APP; Google Play; Internet applications; TPR; confidential personal information storage; controlled environment; data flow; dynamic analysis; malware malicious behavior detection; malware malicious behavior identification; market platforms; mobile application malicious behavior detection; mobile devices; mobile malware detection approach; mobile platform; performance evaluation; reversed source code; sensitive information tracking; source code; static analysis; Androids; Humanoid robots; Malware; Mobile communication; Smart phones; Software (ID#: 15-4928)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6956704&isnumber=6956693

Wenjun Hu; Jing Tao; Xiaobo Ma; Wenyu Zhou; Shuang Zhao; Ting Han, "MIGDroid: Detecting APP-Repackaging Android Malware via Method Invocation Graph," Computer Communication and Networks (ICCCN), 2014 23rd International Conference on, pp. 1,7, 4-7 Aug. 2014. doi: 10.1109/ICCCN.2014.6911805 Abstract: With the increasing popularity of Android platform, Android malware, especially APP-Repackaging malware wherein the malicious code is injected into legitimate Android applications, is spreading rapidly. This paper proposes a new system named MIGDroid, which leverages method invocation graph based static analysis to detect APP-Repackaging Android malware. The method invocation graph reflects the “interaction” connections between different methods. Such graph can be naturally exploited to detect APP-Repackaging malware because the connections between injected malicious code and legitimate applications are expected to be weak. Specifically, MIGDroid first constructs method invocation graph on the smali code level, and then divides the method invocation graph into weakly connected sub-graphs. To determine which sub-graph corresponds to the injected malicious code, the threat score is calculated for each sub-graph based on the invoked sensitive APIs, and the subgraphs with higher scores will be more likely to be malicious. Experiment results based on 1,260 Android malware samples in the real world demonstrate the specialty of our system in detecting APP-Repackaging Android malware, thereby well complementing existing static analysis systems (e.g., Androguard) that do not focus on APP-Repackaging Android malware.
Keywords: Android (operating system); graph theory; invasive software; Android applications; Android malware samples; Android platform; MIGDroid; connected subgraphs; detecting APP-Repackaging Android malware; injected malicious code; invocation graph method; threat score; Androids; Google; Humanoid robots; Receivers; Trojan horses; Android; malware; method invocation graph; static analysis (ID#: 15-4929)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6911805&isnumber=6911704
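
Illustration (not from the paper): the two steps described in the abstract above, splitting a method invocation graph into weakly connected sub-graphs and scoring each by the sensitive APIs it invokes, can be sketched as follows. The API weights, the method names, and the scoring rule (a plain weighted sum) are assumptions made for this example, not MIGDroid's actual values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical weights for sensitive Android APIs (illustrative only).
SENSITIVE_API_WEIGHTS: Dict[str, int] = {
    "sendTextMessage": 3,
    "getDeviceId": 2,
    "getLastKnownLocation": 1,
}


def weakly_connected_subgraphs(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Split a directed call graph into weakly connected components."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in edges:
        # Weak connectivity ignores edge direction, so link both ways.
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in list(neighbours):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def rank_by_threat(
    edges: Iterable[Tuple[str, str]], api_calls: Dict[str, List[str]]
) -> List[Tuple[int, Set[str]]]:
    """Score each component by the sensitive APIs its methods invoke, highest first."""
    scored = []
    for component in weakly_connected_subgraphs(edges):
        score = sum(
            SENSITIVE_API_WEIGHTS.get(api, 0)
            for method in component
            for api in api_calls.get(method, [])
        )
        scored.append((score, component))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    edges = [("Main.onCreate", "Main.initUi"), ("Evil.run", "Evil.sendSms")]
    api_calls = {"Evil.sendSms": ["sendTextMessage"], "Evil.run": ["getDeviceId"]}
    for score, methods in rank_by_threat(edges, api_calls):
        print(score, sorted(methods))
```

In this sketch, framework API calls are kept out of the graph and recorded per method instead; otherwise widely used APIs would link unrelated code into one component. Methods that appear in no edge are not reported.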

Min Zheng; Mingshen Sun; Lui, J.C.S., "DroidTrace: A Ptrace Based Android Dynamic Analysis System With Forward Execution Capability," Wireless Communications and Mobile Computing Conference (IWCMC), 2014 International, pp. 128,133, 4-8 Aug. 2014. doi: 10.1109/IWCMC.2014.6906344 Abstract: Android, being an open source smartphone operating system, enjoys a large community of developers who create new mobile services and applications. However, it also attracts malware writers to exploit Android devices in order to distribute malicious apps in the wild. In fact, Android malware are becoming more sophisticated and they use advanced “dynamic loading” techniques like Java reflection or native code execution to bypass security detection. To detect dynamic loading, one has to use dynamic analysis. Currently, there are only a handful of Android dynamic analysis tools available, and they all have shortcomings in detecting dynamic loading. The aim of this paper is to design and implement a dynamic analysis system which allows analysts to perform systematic analysis of dynamic payloads with malicious behaviors. We propose “DroidTrace”, a ptrace based dynamic analysis system with forward execution capability. Our system uses ptrace to monitor selected system calls of the target process which is running the dynamic payloads, and classifies the payloads behaviors through the system call sequence, e.g., behaviors such as file access, network connection, inter-process communication and even privilege escalation. Also, DroidTrace performs “physical modification” to trigger different dynamic loading behaviors within an app. Using DroidTrace, we carry out a large scale analysis on 36,170 dynamic payloads in 50,000 apps and 294 malware in 10 families (four of them are zero-day) with various dynamic loading behaviors.
Keywords: Android (operating system); Java; invasive software; mobile computing; program diagnostics; public domain software; Android malware; DroidTrace; Java reflection; dynamic loading detection; dynamic payload analysis; file access; forward execution capability; interprocess communication; malicious apps; malicious behaviors; mobile applications; mobile services; native code execution; network connection; open source smartphone operating system; physical modification; privilege escalation; ptrace based Android dynamic analysis system; security detection system call monitoring; Androids; Humanoid robots; Java; Loading; Malware; Monitoring; Payloads (ID#: 15-4930)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6906344&isnumber=6906315

Kumar, S.; Rama Krishna, C.; Aggarwal, N.; Sehgal, R.; Chamotra, S., "Malicious Data Classification Using Structural Information and Behavioral Specifications in Executables," Engineering and Computational Sciences (RAECS), 2014 Recent Advances in, pp. 1,6, 6-8 March 2014. doi: 10.1109/RAECS.2014.6799525 Abstract: With the rise of the underground Internet economy, automated malicious programs, popularly known as malware, have become a major threat to computers and information systems connected to the Internet. Properties such as self-healing, self-hiding, and the ability to deceive security devices make this software hard to detect and mitigate. Therefore, the detection and mitigation of such malicious software is a major challenge for researchers and security personnel. The conventional systems for the detection and mitigation of such threats are mostly signature-based systems. A major drawback of such systems is their inability to detect malware samples for which there is no signature available in their signature database. Such malware is known as zero-day malware. Moreover, more and more malware writers use obfuscation techniques such as polymorphism, metamorphism, packing, and encryption to avoid being detected by antivirus software. Therefore, the traditional signature-based detection system is neither effective nor efficient for the detection of zero-day malware. Hence, to improve the effectiveness and efficiency of malware detection systems, we use a classification method based on structural information and behavioral specifications. In this paper we use both static and dynamic analysis approaches. In static analysis, we extract the features of an executable file and then classify it. In dynamic analysis, we take traces of executable files using NtTrace within a controlled environment. Experimental results indicate that our proposed algorithm is effective in extracting malicious behavior of executables. Further, it can also be used to detect malware variants.
Keywords: Internet; invasive software; pattern classification; program diagnostics; NtTrace; antivirus; automated malicious programs; behavioral specifications; dynamic analysis; executable file; information systems; malicious behavior extraction; malicious data classification; malicious software detection; malicious software mitigation; malware detection system effectiveness improvement; malware detection system efficiency improvement; malwares; obfuscation technology; security devices; signature database; signature-based detection system; static analysis; structural information; threat detection; threat mitigation; underground Internet economy; zero-day malware detection; Algorithm design and analysis; Classification algorithms; Feature extraction; Internet; Malware; Software; Syntactics; behavioral specifications; classification algorithms; dynamic analysis; malware detection; static analysis; system call (ID#: 15-4931)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6799525&isnumber=6799496

Cen, L.; Gates, C.; Si, L.; Li, N., "A Probabilistic Discriminative Model for Android Malware Detection with Decompiled Source Code," Dependable and Secure Computing, IEEE Transactions on, vol.12, no.4, pp.400-412, July-August 1 2015. doi: 10.1109/TDSC.2014.2355839 Abstract: Mobile devices are an important part of our everyday lives, and the Android platform has become a market leader. In recent years a number of approaches for Android malware detection have been proposed, using permissions, source code analysis, or dynamic analysis. In this paper, we propose to use a probabilistic discriminative model based on regularized logistic regression for Android malware detection. Through extensive experimental evaluation, we demonstrate that it can generate probabilistic outputs with highly accurate classification results. In particular, we propose to use Android API calls as features extracted from decompiled source code, and analyze and explore issues in feature granularity, feature representation, feature selection, and regularization. We show that the probabilistic discriminative model also works well with permissions, and substantially outperforms the state-of-the-art methods for Android malware detection with application permissions. Furthermore, the discriminative learning model achieves the best detection results by combining both decompiled source code and application permissions. To the best of our knowledge, this is the first research that proposes probabilistic discriminative model for Android malware detection with a thorough study of desired representation of decompiled source code and is the first research work for Android malware detection task that combines both analysis of decompiled source code and application permissions.
Keywords: Androids; Feature extraction; Humanoid robots; Malware; Measurement; Probabilistic logic; Smart phones (ID#: 15-4932)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6894210&isnumber=4358699

Haltas, F.; Uzun, E.; Siseci, N.; Posul, A.; Emre, B., "An Automated Bot Detection System Through Honeypots for Large-Scale," Cyber Conflict (CyCon 2014), 2014 6th International Conference on, pp.255,270, 3-6 June 2014. doi: 10.1109/CYCON.2014.6916407 Abstract: One of the purposes of active cyber defense systems is identifying infected machines in enterprise networks that are presumably the root cause and main agent of various cyber-attacks. To achieve this, researchers have suggested many detection systems that rely on host-monitoring techniques and require deep packet inspection, or that are trained on malware samples by applying machine learning and clustering techniques. To our knowledge, most approaches are either hard to deploy in real enterprise networks, because their training systems must be trained on malware samples, or dependent on host-based or deep packet inspection analysis, which requires a large amount of storage capacity for an enterprise. Besides this, honeypot systems are mostly used to collect malware samples for analysis purposes and to identify incoming attacks. Rather than keeping experimental results of bot detection techniques as theory and using honeypots only for analysis purposes, in this paper we present a novel automated bot-infected machine detection system, BFH (BotFinder through Honeypots), based on BotFinder, that identifies infected hosts in a real enterprise network by a learning approach. Our solution, which relies on NetFlow data, is capable of detecting bots infected by the most recent malware, samples of which are caught via 97 different honeypot systems. We train BFH with models created from malware samples provided and updated by the 97 honeypot systems. The BFH system automatically sends caught malware to a classification unit to construct family groups. Later, samples are automatically given to a training unit for modeling, and detection is performed over NetFlow data. Results are double-checked by using a month of full packet capture and through tools that identify rogue domains. Our results show that BFH is able to detect infected hosts with very few false positives and is successful at handling the most recent malware families, since it is fed by 97 honeypots, and that it supports large networks with the scalability of Hadoop infrastructure, as deployed in a large-scale enterprise network in Turkey.
Keywords: invasive software; learning (artificial intelligence); parallel processing; pattern clustering; BFH; Hadoop infrastructure; NetFlow data; active cyber defense systems; automated bot detection system; bot detection techniques; bot-infected machine detection system; botfinder through honeypots; clustering technique; cyber-attacks; deep packet inspection; enterprise networks; honeypot systems; host-monitoring techniques; learning approach; machine learning technique; malware; Data models; Feature extraction; Malware; Monitoring; Scalability; Training; Botnet; NetFlow analysis; honeypots; machine learning (ID#: 15-4933)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6916407&isnumber=6916383

Saleh, M.; Ratazzi, E.P.; Shouhuai Xu, "Instructions-Based Detection of Sophisticated Obfuscation and Packing," Military Communications Conference (MILCOM), 2014 IEEE, pp. 1, 6, 6-8 Oct. 2014. doi: 10.1109/MILCOM.2014.9 Abstract: Every day thousands of malware are released online. The vast majority of these malware employ some kind of obfuscation ranging from simple XOR encryption, to more sophisticated anti-analysis, packing and encryption techniques. Dynamic analysis methods can unpack the file and reveal its hidden code. However, these methods are very time consuming when compared to static analysis. Moreover, considering the large amount of new malware being produced daily, it is not practical to solely depend on dynamic analysis methods. Therefore, finding an effective way to filter the samples and delegate only obfuscated and suspicious ones to more rigorous tests would significantly improve the overall scanning process. Current techniques of identifying obfuscation rely mainly on signatures of known packers, file entropy score, or anomalies in file header. However, these features are not only easily bypass-able, but also do not cover all types of obfuscation. In this paper, we introduce a novel approach to identify obfuscated files based on anomalies in their instructions-based characteristics. We detect the presence of interleaving instructions which are the result of the opaque predicate anti-disassembly trick, and present distinguishing statistical properties based on the opcodes and control flow graphs of obfuscated files. Our detection system combines these features with other file structural features and leads to a very good result of detecting obfuscated malware.
Keywords: invasive software; control flow graphs; dynamic analysis methods; encryption techniques; file entropy score; file header anomaly; instructions-based detection; malware detection; obfuscated file identification; obfuscation detection; opcodes; packing detection; simple XOR encryption; static analysis methods; Electronic mail; Encryption; Entropy; Feature extraction; Malware; Reverse engineering (ID#: 15-4934)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6956729&isnumber=6956719
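
Illustration (not from the paper): the abstract above lists the file entropy score among the current obfuscation indicators that the authors consider easy to bypass. As background, a byte-level Shannon entropy score can be computed as in this minimal sketch; the function name is hypothetical, and no decision threshold is given because the abstract states none.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 64))          # a constant run carries no information
    print(shannon_entropy(bytes(range(256))))  # uniform byte distribution
```

Packed or encrypted sections tend to score near the 8-bit maximum, while plain code and text score lower, which is why a score of this kind is a common first-pass filter.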

Prakash, A.; Venkataramani, E.; Yin, H.; Lin, Z., "On the Trustworthiness of Memory Analysis—An Empirical Study from the Perspective of Binary Execution," Dependable and Secure Computing, IEEE Transactions on, vol.12, no.5, pp.557-570, Sept.-Oct. 1 2015. doi: 10.1109/TDSC.2014.2366464 Abstract: Memory analysis serves as a foundation for many security applications such as memory forensics, virtual machine introspection and malware investigation. However, malware, or more specifically a kernel rootkit, can often tamper with kernel memory data, putting the trustworthiness of memory analysis under question. With the rapid deployment of cloud computing and increase of cyberattacks, there is a pressing need to systematically study and understand the problem of memory analysis. In particular, without ground truth, the quality of the memory analysis tools widely used for analyzing closed-source operating systems (like Windows) has not been thoroughly studied. Moreover, while it is widely accepted that value manipulation attacks pose a threat to memory analysis, its severity has not been explored and well understood. To answer these questions, we have devised a number of novel analysis techniques including (1) binary level ground-truth collection, and (2) value equivalence set directed field mutation. Our experimental results demonstrate not only that the existing tools are inaccurate even under a non-malicious context, but also that value manipulation attacks are practical and severe. Finally, we show that exploiting information redundancy can be a viable direction to mitigate value manipulation attacks, but checking information equivalence alone is not an ultimate solution.
Keywords: Context; Data structures; Kernel; Robustness; Security; Semantics; Virtual machining; DKOM; Invasive Software; Kernel Rootkit; Memory Forensics; Operating Systems Security; Virtual Machine Introspection (ID#: 15-4935)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6942280&isnumber=4358699

Okane, P.; Sezer, S.; McLaughlin, K.; Eul Gyu Im, "Malware Detection: Program Run Length Against Detection Rate," Software, IET, vol. 8, no.1, pp.42, 51, February 2014. doi: 10.1049/iet-sen.2013.0020 Abstract: N-gram analysis is an approach that investigates the structure of a program using bytes, characters or text strings. This research uses dynamic analysis to investigate malware detection using a classification approach based on N-gram analysis. A key issue with dynamic analysis is the length of time a program has to be run to ensure a correct classification. The motivation for this research is to find the optimum subset of operational codes (opcodes) that make the best indicators of malware and to determine how long a program has to be monitored to ensure an accurate support vector machine (SVM) classification of benign and malicious software. The experiments within this study represent programs as opcode density histograms gained through dynamic analysis for different program run periods. A SVM is used as the program classifier to determine the ability of different program run lengths to correctly determine the presence of malicious software. The findings show that malware can be detected with different program run lengths using a small number of opcodes.
Keywords: invasive software; pattern classification; runlength codes; support vector machines; system monitoring; N-gram analysis; SVM classification; benign software; detection rate; dynamic analysis; malicious software; malware detection; opcode density histograms; operational codes; program classifier; program monitoring time; program run length; support vector machine (ID#: 15-4936)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6720049&isnumber=6720044

Euijin Choo; Younghee Park; Siyamwala, H., "Identifying Malicious Metering Data in Advanced Metering Infrastructure," Service Oriented System Engineering (SOSE), 2014 IEEE 8th International Symposium on, pp. 490, 495, 7-11 April 2014. doi: 10.1109/SOSE.2014.75 Abstract: Advanced Metering Infrastructure (AMI) has evolved to measure and control energy usage by communicating through metering devices. However, the development of the AMI network brings with it security issues, including the increasingly serious risk of malware in the newly emerging network. Malware is often embedded in the data payloads of legitimate metering data. It is difficult to detect malware in metering devices, which are resource-constrained embedded systems, during time-critical communications. This paper describes a method to distinguish malware-bearing traffic from legitimate metering data using a disassembler and statistical analysis. Based on the discovered unique characteristics of each data type, the proposed method detects malicious metering data (i.e., malware-bearing data). The data payloads are analyzed statistically by using a disassembler to examine the distribution of instructions in the traffic. Doing so demonstrates that the distribution of instructions in metering data is significantly different from that in malware-bearing data. The proposed approach successfully identifies the two different types of data with complete accuracy, with 0% false positives and 0% false negatives.
Keywords: invasive software; metering; power system security; program assemblers; smart meters; statistical analysis; AMI network; advanced metering infrastructure; data payloads; disassembler; energy usage; malicious metering data; malware-bearing data; malware-bearing traffic; metering devices; resource constrained embedded systems; security issues; statistical analysis; time-critical communications; Malware; Registers; Statistical analysis; Testing; Training; ARM Instructions; Advanced Metering Infrastructure; Diassembler; Malware; Security; Smart Meters (ID#: 15-4937)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6830954&isnumber=6825948

Hui Zhu; Cheng Huang; Hui Li, "MPPM: Malware Propagation and Prevention Model in Online SNS," Communications Workshops (ICC), 2014 IEEE International Conference on, pp.682, 687, 10-14 June 2014. doi: 10.1109/ICCW.2014.6881278 Abstract: With the pervasiveness of online social network services (SNS), many people express their views and share information with others on them, and the information propagation model of online SNS has attracted considerable interest recently. However, the development of information propagation models in online SNS still faces many challenges, especially given the growing propagation of malicious software in SNS. In this paper, we propose a malware propagation and prevention model based on the propagation probability model, called MPPM, for online SNS. With this model, we can describe the relationships among malware propagation, habits of users, and malware detection in online SNS. Specifically, based on the characteristics of online SNS, we define users' states and the rules of malware propagation using the dynamics of infectious disease; then, we introduce a detection factor that affects the propagation of malware, and present malware propagation and prevention in online SNS by dynamic evolution equations; finally, we analyze the factors which influence malware propagation in online SNS. Detailed analysis and simulation demonstrate that the MPPM model can precisely describe the process of malware propagation and prevention in online SNS.
Keywords: invasive software; probability; social networking (online); ubiquitous computing; MPPM model; dynamic evolution equations; infectious disease dynamics; information propagation model; malicious software propagation; malware detection; malware propagation and prevention model; online SNS pervasiveness; online social network service pervasiveness; propagation probability model; Analytical models; Computational modeling; Conferences; Malware; Mathematical model; Social network services; Social network service; dynamic evolution equations; dynamics of infectious disease; malware prevention (ID#: 15-4938)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6881278&isnumber=6881162
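
Illustration (not from the paper): the abstract above does not give MPPM's user states or evolution equations, so the following substitutes a textbook susceptible/infected/recovered model with one extra parameter standing in for the detection factor. The parameter names `beta` (infection rate) and `delta` (detection rate) and the forward-Euler update are assumptions made for this sketch.

```python
from typing import List, Tuple


def simulate_propagation(
    beta: float, delta: float, infected0: float = 0.01, steps: int = 500, dt: float = 0.1
) -> List[Tuple[float, float, float]]:
    """Simulate the fractions of susceptible, infected, and cleaned users over time.

    Assumed dynamics (generic SIR, not the MPPM equations):
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - delta * I
        dR/dt =  delta * I
    """
    s, i, r = 1.0 - infected0, infected0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        newly_infected = beta * s * i * dt  # susceptible users who open malicious content
        newly_cleaned = delta * i * dt      # infected users caught by detection
        s -= newly_infected
        i += newly_infected - newly_cleaned
        r += newly_cleaned
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    for delta in (0.1, 0.5):
        peak = max(i for _, i, _ in simulate_propagation(beta=1.0, delta=delta))
        print(f"delta={delta}: peak infected fraction {peak:.3f}")
```

Under these assumed dynamics, raising the detection rate lowers the peak fraction of infected users, which is the qualitative effect a detection factor is meant to capture.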

O'Kane, P.; Sezer, S.; McLaughlin, K., "N-gram Density Based Malware Detection," Computer Applications & Research (WSCAR), 2014 World Symposium on, pp.1, 6, 18-20 Jan. 2014. doi: 10.1109/WSCAR.2014.6916806 Abstract: N-gram analysis is an approach that investigates the structure of a program using bytes, characters or text strings. This research uses dynamic analysis to investigate malware detection using a classification approach based on N-gram analysis. The motivation for this research is to find a subset of N-gram features that makes a robust indicator of malware. The experiments within this paper represent programs as N-gram density histograms, gained through dynamic analysis. A Support Vector Machine (SVM) is used as the program classifier to determine the ability of N-grams to correctly determine the presence of malicious software. The preliminary findings show that an N-gram size N=3 and N=4 present the best avenues for further analysis.
Keywords: invasive software; pattern classification; support vector machines; N-gram analysis; N-gram density histograms; SVM; classification approach; malware detection; support vector machine; Information technology; Malware; Support vector machines; Three-dimensional displays; Malware; N-gram; Support Vector Machine (ID#: 15-4939)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6916806&isnumber=6916766
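
Illustration (not from the paper): an N-gram density histogram of the kind mentioned in the abstract above can be sketched as follows, assuming the program is represented as a sequence of opcode mnemonics captured during a run. The function name and the sample trace are hypothetical, and the SVM classification stage is omitted.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def ngram_density(opcodes: Sequence[str], n: int = 3) -> Dict[Tuple[str, ...], float]:
    """Map each N-gram of consecutive opcodes to its share of all N-grams in the trace."""
    total = len(opcodes) - n + 1  # number of sliding windows of length n
    if n <= 0 or total <= 0:
        return {}
    counts = Counter(tuple(opcodes[k:k + n]) for k in range(total))
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    trace = ["push", "mov", "call", "push", "mov", "call"]
    for gram, density in sorted(ngram_density(trace, n=3).items()):
        print(gram, density)
```

Each program then becomes a sparse vector of densities over a fixed N-gram vocabulary, which is the kind of input a classifier such as an SVM would consume.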

Yerima, Suleiman Y.; Sezer, Sakir; Muttik, Igor, "Android Malware Detection Using Parallel Machine Learning Classifiers," Next Generation Mobile Apps, Services and Technologies (NGMAST), 2014 Eighth International Conference on, pp. 37,42, 10-12 Sept. 2014. doi: 10.1109/NGMAST.2014.23 Abstract: Mobile malware has continued to grow at an alarming rate despite on-going mitigation efforts. This has been much more prevalent on Android due to being an open platform that is rapidly overtaking other competing platforms in the mobile smart devices market. Recently, a new generation of Android malware families has emerged with advanced evasion capabilities which make them much more difficult to detect using conventional methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also quicker white box analysis by means of the more interpretable constituent classifiers.
Keywords: Accuracy; Androids; Classification algorithms; Feature extraction; Humanoid robots; Malware; Training; Android; data mining; machine learning; malware detection; mobile security; parallel classifiers; static analysis (ID#: 15-4940)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6982888&isnumber=6982871
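
One simple way to combine several classifiers in parallel is a per-sample majority vote. The abstract above does not say which combination schemes the paper evaluates, so the sketch below is only an assumed example; the function name `majority_vote`, the 1 = malware / 0 = benign encoding, and the tie-breaking rule are all hypothetical.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine label lists from several classifiers by majority vote.

    predictions[k][i] is classifier k's label for sample i
    (1 = malware, 0 = benign). Ties are resolved as malware,
    which is an assumed policy, not one taken from the paper.
    """
    if not predictions:
        return []
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for i in range(n_samples):
        votes = sum(p[i] for p in predictions)
        combined.append(1 if 2 * votes >= len(predictions) else 0)
    return combined


# Three hypothetical classifiers labelling three samples.
labels = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```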

Cobb, S.; Lee, A., "Malware is Called Malicious for a Reason: The Risks of Weaponizing Code," Cyber Conflict (CyCon 2014), 2014 6th International Conference on, pp.71, 84, 3-6 June 2014. doi: 10.1109/CYCON.2014.6916396 Abstract: The allure of malware, with its tremendous potential to infiltrate and disrupt digital systems, is understandable. Criminally motivated malware is now directed at all levels and corners of the cyber domain, from servers to endpoints, laptops, smartphones, tablets, and industrial control systems. A thriving underground industry today produces ever-increasing quantities of malware for a wide variety of platforms, which bad actors seem able to deploy with relative impunity. The urge to fight back with “good” malware is understandable. In this paper we review and assess the arguments for and against the use of malicious code for either active defense or direct offense. Our practical experiences analyzing and defending against malicious code suggest that the effect of deployment is hard to predict with accuracy. There is tremendous scope for unintended consequences and loss of control over the code itself. Criminals do not feel restrained by these factors and appear undeterred by moral dilemmas like collateral damage, but we argue that persons or entities considering the use of malware for “justifiable offense” or active defense need to fully understand the issues around scope, targeting, control, blowback, and arming the adversary. Using existing open source literature and commentary on this topic we review the arguments for and against the use of “malicious” code for “righteous” purposes, introducing the term “righteous malware”. We will cite select instances of prior malicious code deployment to reveal lessons learned for future missions. 
In the process, we will refer to a range of techniques employed by criminally-motivated malware authors to evade detection, amplify infection, leverage investment, and execute objectives that range from denial of service to information stealing, fraudulent, revenue generation, blackmail and surveillance. Examples of failure to retain control of criminally-motivated malicious code development will also be examined for what they may tell us about code persistence and life cycles. In closing, we will present our considered opinions on the risks of weaponizing code.
Keywords: computer crime; invasive software; public domain software; amplify infection; blackmail; criminal; cyber domain; disrupt digital system; evade detection; fraudulent; information stealing; leverage investment; open source literature; prior malicious code deployment; revenue generation; righteous malware; surveillance; weaponising code risk; Computers; Malware; National security; Software; Viruses (medical); Weapons; active defense; cyber conflict; malicious code; malware; weaponize (ID#: 15-4941)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6916396&isnumber=6916383

Bou-Harb, E.; Fachkha, C.; Debbabi, M.; Assi, C., "Inferring Internet-Scale Infections by Correlating Malware and Probing Activities," Communications (ICC), 2014 IEEE International Conference on, pp. 640, 646, 10-14 June 2014. doi: 10.1109/ICC.2014.6883391 Abstract: This paper presents a new approach to infer malware-infected machines by solely analyzing their generated probing activities. In contrary to other adopted methods, the proposed approach does not rely on symptoms of infection to detect compromised machines. This allows the inference of malware infection at very early stages of contamination. The approach aims at detecting whether the machines are infected or not as well as pinpointing the exact malware type/family, if the machines were found to be compromised. The latter insights allow network security operators of diverse organizations, Internet service providers and backbone networks to promptly detect their clients' compromised machines in addition to effectively providing them with tailored anti-malware/patch solutions. To achieve the intended goals, the proposed approach exploits the darknet Internet space and employs statistical methods to infer large-scale probing activities. Subsequently, such activities are correlated with malware samples by leveraging fuzzy hashing and entropy based techniques. The proposed approach is empirically evaluated using 60 GB of real darknet traffic and 65 thousand real malware samples. The results concur that the rationale of exploiting probing activities for worldwide early malware infection detection is indeed very promising. Further, the results demonstrate that the extracted inferences exhibit noteworthy accuracy and can generate significant cyber security insights that could be used for effective mitigation.
Keywords: Internet; computer network security; cryptography; entropy; fuzzy reasoning; invasive software; Internet scale infections; darknet Internet; entropy based techniques; fuzzy hashing; inference; malware infection detection; network security operators; probing activities; Correlation; Entropy; Internet; Malware; Unsolicited electronic mail (ID#: 15-4942)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6883391&isnumber=6883277
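
The abstract above mentions entropy based techniques without giving details. As a minimal sketch, assuming byte-level Shannon entropy is the quantity of interest (an assumption, since the abstract does not specify it), the value for a payload could be computed as follows. The function name `shannon_entropy` is hypothetical, and the fuzzy hashing step is not shown.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = -sum(p * log2(p)) over the observed byte values.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


uniform = shannon_entropy(bytes(range(256)))   # every byte value once
constant = shannon_entropy(b"\x00" * 16)       # a single repeated value
```

Low values indicate repetitive content and values near 8 indicate compressed or encrypted content, which is why entropy is often used as a coarse similarity signal.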

Musavi, S.A.; Kharrazi, M., "Back to Static Analysis for Kernel-Level Rootkit Detection," Information Forensics and Security, IEEE Transactions on, vol.9, no.9, pp.1465, 1476, Sept. 2014. doi: 10.1109/TIFS.2014.2337256 Abstract: Rootkit's main goal is to hide itself and other modules present in the malware. Their stealthy nature has made their detection further difficult, especially in the case of kernel-level rootkits. There have been many dynamic analysis techniques proposed for detecting kernel-level rootkits, while on the other hand, static analysis has not been popular. This is perhaps due to its poor performance in detecting malware in general, which could be attributed to the level of obfuscation employed in binaries which make static analysis difficult if not impossible. In this paper, we make two important observations, first there is usually little obfuscation used in legitimate kernel-level code, as opposed to the malicious kernel-level code. Second, one of the main approaches to penetrate the Windows operating system is through kernel-level drivers. Therefore, by focusing on detecting malicious kernel drivers employed by the rootkit, one could detect the rootkit while avoiding the issues with current detection technique. Given these two observation, we propose a simple static analysis technique with the aim of detecting malicious driver. We first study the current trends in the implementation of kernel-level rootkits. Afterward, we proposed a set of features to quantify the malicious behavior in kernel drivers. These features are then evaluated through a set of experiments on 4420 malicious and legitimate drivers, obtaining an accuracy of 98.15% in distinguishing between these drivers.
Keywords: device drivers; invasive software; operating system kernels; program diagnostics; Windows operating system; dynamic analysis techniques; kernel-level code; kernel-level drivers; kernel-level rootkit detection; malicious driver detection; malicious kernel-level code; malware; obfuscation level; static analysis; Feature extraction; Hardware; Kernel; Malware; Market research; Malware; kernel driver; rootkit; static analysis (ID#: 15-4943)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6850033&isnumber=6867417

Mohaisen, Aziz; West, Andrew G.; Mankin, Allison; Alrawi, Omar, "Chatter: Classifying Malware Families Using System Event Ordering," Communications and Network Security (CNS), 2014 IEEE Conference on, pp. 283, 291, 29-31 Oct. 2014. doi: 10.1109/CNS.2014.6997496 Abstract: Using runtime execution artifacts to identify malware and its associated “family” is an established technique in the security domain. Many papers in the literature rely on explicit features derived from network, file system, or registry interaction. While effective, use of these fine-granularity data points makes these techniques computationally expensive. Moreover, the signatures and heuristics this analysis produces are often circumvented by subsequent malware authors. To this end we propose CHATTER, a system that is concerned only with the order in which high-level system events take place. Individual events are mapped onto an alphabet and execution traces are captured via terse concatenations of those letters. Then, leveraging an analyst labeled corpus of malware, n-gram document classification techniques are applied to produce a classifier predicting malware family. This paper describes that technique and its proof-of-concept evaluation. In its prototype form only network events are considered and three malware families are highlighted. We show the technique achieves roughly 80% accuracy in isolation and makes non-trivial performance improvements when integrated with a baseline classifier of non-ordered features (with an accuracy of roughly 95%).
Keywords: Accuracy; Decision trees; Feature extraction; Machine learning algorithms; Malware; Support vector machines (ID#: 15-4944)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6997496&isnumber=6997445
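
The event-ordering idea in the abstract above (map events to letters, concatenate them, then extract n-grams) can be sketched as follows. The event names, the letter assignments in `EVENT_ALPHABET`, and the function names are hypothetical illustrations rather than the alphabet used by CHATTER, and the classifier itself is omitted.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical mapping from high-level network events to letters.
EVENT_ALPHABET: Dict[str, str] = {
    "dns_query": "D",
    "tcp_connect": "T",
    "http_request": "H",
    "udp_send": "U",
}


def encode_trace(events: Iterable[str]) -> str:
    """Concatenate one letter per event, preserving order; unknown events become '?'."""
    return "".join(EVENT_ALPHABET.get(event, "?") for event in events)


def ngram_counts(trace: str, n: int) -> Counter:
    """Count every length-n substring of the encoded trace."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(trace[i:i + n] for i in range(len(trace) - n + 1))


events = ["dns_query", "tcp_connect", "http_request",
          "dns_query", "tcp_connect", "http_request"]
trace = encode_trace(events)
features = ngram_counts(trace, 2)
```

The resulting counts could be fed to any document classifier trained on analyst-labeled traces.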

Zongqu Zhao; Junfeng Wang; Jinrong Bai, "Malware Detection Method Based on the Control-Flow Construct Feature of Software," Information Security, IET, vol.8, no. 1, pp.18, 24, Jan. 2014. doi: 10.1049/iet-ifs.2012.0289 Abstract: The existing anti-virus methods extract signatures of software by manual analysis. It is inefficient when they deal with a large number of malware. Meanwhile, the limitation of unknown malware detection often is found in them too. By the research on software structure, it has been found that the control flow of software can be divided into many basic blocks by the interior cross-references, and a feature-selection approach based on this phenomenon is proposed. It can extract opcode sequences from the disassembled program, and translate them into features by vector space model. The algorithms of data mining are employed to find the classify rules from the software features, and then the rules can be applied to the malware detection. Experimental results illustrate that the proposed method can achieve the 97.0% malware detection accuracy and 3.2% false positive rate with the Random Forest classifier. Furthermore, as high as 94.5% overall accuracy can be achieved when only 5% experimental data are used as training data.
Keywords: data mining; invasive software; learning (artificial intelligence); pattern classification; anti-virus methods; control-flow construct feature; data mining; disassembled program; feature-selection approach; interior cross-references; malware detection method; opcode sequences; random forest classifier; software structure; vector space model (ID#: 15-4945)
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6687154&isnumber=6687150
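
The pipeline described in the abstract above (split the disassembly into basic blocks at cross-reference targets, then turn the opcode sequences into a vector) can be sketched roughly as below. The input format, the function names, and the use of raw counts as vector weights are assumptions made for illustration; the paper's actual feature selection and weighting are not given in the abstract.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

OpcodeSeq = Tuple[str, ...]


def split_basic_blocks(instructions: Sequence[Tuple[int, str]],
                       xref_targets: Set[int]) -> List[OpcodeSeq]:
    """Start a new block at every address that is a cross-reference target."""
    blocks: List[OpcodeSeq] = []
    current: List[str] = []
    for address, opcode in instructions:
        if address in xref_targets and current:
            blocks.append(tuple(current))
            current = []
        current.append(opcode)
    if current:
        blocks.append(tuple(current))
    return blocks


def to_vector(blocks: Sequence[OpcodeSeq],
              vocabulary: Sequence[OpcodeSeq]) -> List[int]:
    """Count how often each vocabulary opcode sequence occurs (term frequency)."""
    counts = Counter(blocks)
    return [counts[seq] for seq in vocabulary]


# Hypothetical disassembly as (address, opcode) pairs.
instructions = [(0, "push"), (1, "mov"), (2, "cmp"), (3, "jz"),
                (4, "mov"), (5, "ret"), (6, "mov"), (7, "ret")]
blocks = split_basic_blocks(instructions, xref_targets={4, 6})
vocabulary = [("mov", "ret"), ("push", "mov", "cmp", "jz"), ("xor",)]
vector = to_vector(blocks, vocabulary)
```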

Note:

Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to news@scienceofsecurity.net for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.