Biblio
Malware threats often go undetected immediately, because attackers can camouflage well within the system. The users realize this after the devices stop working and cause harm for them. One way to deceive malicious content detection, malware authors use packers. Malware analysis is an activity to gain knowledge about malware. Reverse engineering is a technique used to identify and deal with new viruses or to understand malware behavior. Therefore, this technique can be the right choice for conducting malware analysis, especially for malware with packers. The results of the analysis are used as a source for making creating indicator of compromise in the YARA rule format. YARA rule is used as a component for detecting malware using the indicators obtained in the analysis process.
Nowadays, Windows is an operating system that is very popular among people, especially users who have limited knowledge of computers. But unconsciously, the security threat to the windows operating system is very high. Security threats can be in the form of illegal exploitation of the system. The most common attack is using malware. To determine the characteristics of malware using dynamic analysis techniques and static analysis is very dependent on the availability of malware samples. Honeypot is the most effective malware collection technique. But honeypot cannot determine the type of file format contained in malware. File format information is needed for the purpose of handling malware analysis that is focused on windows-based malware. For this reason, we propose a framework that can collect malware information as well as identify malware PE file type formats. In this study, we collected malware samples using a modern honey network. Next, we performed a feature extraction to determine the PE file format. Then, we classify types of malware using VirusTotal scanning. As the results of this study, we managed to get 1.222 malware samples. Out of 1.222 malware samples, we successfully extracted 945 PE malware. This study can help researchers in other research fields, such as machine learning and deep learning, for malware detection.
Mobile and IoT operating systems–and their ensuing software updates–are usually distributed as binary files. Given that these binary files are commonly closed source, users or businesses who want to assess the security of the software need to rely on reverse engineering. Further, verifying the correct application of the latest software patches in a given binary is an open problem. The regular application of software patches is a central pillar for improving mobile and IoT device security. This requires developers, integrators, and vendors to propagate patches to all affected devices in a timely and coordinated fashion. In practice, vendors follow different and sometimes improper security update agendas for both mobile and IoT products. Moreover, previous studies revealed the existence of a hidden patch gap: several vendors falsely reported that they patched vulnerabilities. Therefore, techniques to verify whether vulnerabilities have been patched or not in a given binary are essential. Deep learning approaches have shown to be promising for static binary analyses with respect to inferring binary similarity as well as vulnerability detection. However, these approaches fail to capture the dynamic behavior of these systems, and, as a result, they may inundate the analysis with false positives when performing vulnerability discovery in the wild. In particular, they cannot capture the fine-grained characteristics necessary to distinguish whether a vulnerability has been patched or not. In this paper, we present PATCHECKO, a vulnerability and patch presence detection framework for executable binaries. PATCHECKO relies on a hybrid, cross-platform binary code similarity analysis that combines deep learning-based static binary analysis with dynamic binary analysis. PATCHECKO does not require access to the source code of the target binary nor that of vulnerable functions. We evaluate PATCHECKO on the most recent Google Pixel 2 smartphone and the Android Things IoT firmware images, within which 25 known CVE vulnerabilities have been previously reported and patched. Our deep learning model shows a vulnerability detection accuracy of over 93%. We further prune the candidates found by the deep learning stage–which includes false positives–via dynamic binary analysis. Consequently, PATCHECKO successfully identifies the correct matches among the candidate functions in the top 3 ranked outcomes 100% of the time. Furthermore, PATCHECKO's differential engine distinguishes between functions that are still vulnerable and those that are patched with an accuracy of 96%.
With the rapid development of the mobile Internet, Android has been the most popular mobile operating system. Due to the open nature of Android, c countless malicious applications are hidden in a large number of benign applications, which pose great threats to users. Most previous malware detection approaches mainly rely on features such as permissions, API calls, and opcode sequences. However, these approaches fail to capture structural semantics of applications. In this paper, we propose AMDroid that leverages function call graphs (FCGs) representing the behaviors of applications and applies graph kernels to automatically learn the structural semantics of applications from FCGs. We evaluate AMDroid on the Genome Project, and the experimental results show that AMDroid is effective to detect Android malware with 97.49% detection accuracy.
IoT malware detection using control flow graph (CFG)-based features and deep learning networks are widely explored. The main goal of this study is to investigate the robustness of such models against adversarial learning. We designed two approaches to craft adversarial IoT software: off-the-shelf methods and Graph Embedding and Augmentation (GEA) method. In the off-the-shelf adversarial learning attack methods, we examine eight different adversarial learning methods to force the model to misclassification. The GEA approach aims to preserve the functionality and practicality of the generated adversarial sample through a careful embedding of a benign sample to a malicious one. Intensive experiments are conducted to evaluate the performance of the proposed method, showing that off-the-shelf adversarial attack methods are able to achieve a misclassification rate of 100%. In addition, we observed that the GEA approach is able to misclassify all IoT malware samples as benign. The findings of this work highlight the essential need for more robust detection tools against adversarial learning, including features that are not easy to manipulate, unlike CFG-based features. The implications of the study are quite broad, since the approach challenged in this work is widely used for other applications using graphs.
Malware scanning of an app market is expected to be scalable and effective. However, existing approaches use either syntax-based features which can be evaded by transformation attacks or semantic-based features which are usually extracted by performing expensive program analysis. Therefor, in this paper, we propose a lightweight graph-based approach to perform Android malware detection. Instead of traditional heavyweight static analysis, we treat function call graphs of apps as social networks and perform social-network-based centrality analysis to represent the semantic features of the graphs. Our key insight is that centrality provides a succinct and fault-tolerant representation of graph semantics, especially for graphs with certain amount of inaccurate information (e.g., inaccurate call graphs). We implement a prototype system, MalScan, and evaluate it on datasets of 15,285 benign samples and 15,430 malicious samples. Experimental results show that MalScan is capable of detecting Android malware with up to 98% accuracy under one second which is more than 100 times faster than two state-of-the-art approaches, namely MaMaDroid and Drebin. We also demonstrate the feasibility of MalScan on market-wide malware scanning by performing a statistical study on over 3 million apps. Finally, in a corpus of dataset collected from Google-Play app market, MalScan is able to identify 18 zero-day malware including malware samples that can evade detection of existing tools.
The main challenge for malware researchers is the large amount of data and files that need to be evaluated for potential threats. Researchers analyze a large number of new malware daily and classify them in order to extract common features. Therefore, a system that can ensure and improve the efficiency and accuracy of the classification is of great significance for the study of malware characteristics. A high-performance, high-efficiency automatic classification system based on multi-feature selection fusion of machine learning is proposed in this paper. Its performance and efficiency, according to our experiments, have been greatly improved compared to single-featured systems.
This paper presents TrustSign, a novel, trusted automatic malware signature generation method based on high-level deep features transferred from a VGG-19 neural network model pre-trained on the ImageNet dataset. While traditional automatic malware signature generation techniques rely on static or dynamic analysis of the malware's executable, our method overcomes the limitations associated with these techniques by producing signatures based on the presence of the malicious process in the volatile memory. Signatures generated using TrustSign well represent the real malware behavior during runtime. By leveraging the cloud's virtualization technology, TrustSign analyzes the malicious process in a trusted manner, since the malware is unaware and cannot interfere with the inspection procedure. Additionally, by removing the dependency on the malware's executable, our method is capable of signing fileless malware. Thus, we focus our research on in-browser cryptojacking attacks, which current antivirus solutions have difficulty to detect. However, TrustSign is not limited to cryptojacking attacks, as our evaluation included various ransomware samples. TrustSign's signature generation process does not require feature engineering or any additional model training, and it is done in a completely unsupervised manner, obviating the need for a human expert. Therefore, our method has the advantage of dramatically reducing signature generation and distribution time. The results of our experimental evaluation demonstrate TrustSign's ability to generate signatures invariant to the process state over time. By using the signatures generated by TrustSign as input for various supervised classifiers, we achieved 99.5% classification accuracy.
Cryptojacking is the permissionless use of a target device to covertly mine cryptocurrencies. With cryptojacking attackers use malicious JavaScript codes to force web browsers into solving proof-of-work puzzles, thus making money by exploiting resources of the website visitors. To understand and counter such attacks, we systematically analyze the static, dynamic, and economic aspects of in-browser cryptojacking. For static analysis, we perform content-, currency-, and code-based categorization of cryptojacking samples to 1) measure their distribution across websites, 2) highlight their platform affinities, and 3) study their code complexities. We apply unsupervised learning to distinguish cryptojacking scripts from benign and other malicious JavaScript samples with 96.4% accuracy. For dynamic analysis, we analyze the effect of cryptojacking on critical system resources, such as CPU and battery usage. Additionally, we perform web browser fingerprinting to analyze the information exchange between the victim node and the dropzone cryptojacking server. We also build an analytical model to empirically evaluate the feasibility of cryptojacking as an alternative to online advertisement. Our results show a large negative profit and loss gap, indicating that the model is economically impractical. Finally, by leveraging insights from our analyses, we build countermeasures for in-browser cryptojacking that improve upon the existing remedies.
This paper presents an overview of the H2020 project VESSEDIA [9] aimed at verifying the security and safety of modern connected systems also called IoT. The originality relies in using Formal Methods inherited from high-criticality applications domains to analyze the source code at different levels of intensity, to gather possible faults and weaknesses. The analysis methods are mostly exhaustive an guarantee that, after analysis, the source code of the application is error-free. This paper is structured as follows: after an introductory section 1 giving some factual data, section 2 presents the aims and the problems addressed; section 3 describes the project's use-cases and section 4 describes the proposed approach for solving these problems and the results achieved until now; finally, section 5 discusses some remaining future work.
With the rapid growth of Linux-based IoT devices such as network cameras and routers, the security becomes a concern and many attacks utilize vulnerabilities to compromise the devices. It is crucial for researchers to find vulnerabilities in IoT systems before attackers. Fuzzing is an effective vulnerability discovery technique for traditional desktop programs, but could not be directly applied to Linux-based IoT programs due to the special execution environment requirement. In our paper, we propose an efficient greybox fuzzing scheme for Linux-based IoT programs which consist of two phases: binary static analysis and IoT program greybox fuzzing. The binary static analysis is to help generate useful inputs for efficient fuzzing. The IoT program greybox fuzzing is to reinforce the IoT firmware kernel greybox fuzzer to support IoT programs. We implement a prototype system and the evaluation results indicate that our system could automatically find vulnerabilities in real-world Linux-based IoT programs efficiently.
Ransomware is currently one of the most significant cyberthreats to both national infrastructure and the individual, often requiring severe treatment as an antidote. Triaging ran-somware based on its similarity with well-known ransomware samples is an imperative preliminary step in preventing a ransomware pandemic. Selecting the most appropriate triaging method can improve the precision of further static and dynamic analysis in addition to saving significant t ime a nd e ffort. Currently, the most popular and proven triaging methods are fuzzy hashing, import hashing and YARA rules, which can ascertain whether, or to what degree, two ransomware samples are similar to each other. However, the mechanisms of these three methods are quite different and their comparative assessment is difficult. Therefore, this paper presents an evaluation of these three methods for triaging the four most pertinent ransomware categories WannaCry, Locky, Cerber and CryptoWall. It evaluates their triaging performance and run-time system performance, highlighting the limitations of each method.