Biblio
The rapid growth of Android malware apps poses a great security threat to users thus it is very important and urgent to detect Android malware effectively. What's more, the increasing unknown malware and evasion technique also call for novel detection method. In this paper, we focus on API feature and develop a novel method to detect Android malware. First, we propose a novel selection method for API feature related with the malware class. However, such API also has a legitimate use in benign app thus causing FP problem (misclassify benign as malware). Second, we further explore structure relationships between these APIs and map to a matrix interpreted as the hand-refined API-based feature graph. Third, a CNN-based classifier is developed for the API-based feature graph classification. Evaluations of a real-world dataset containing 3,697 malware apps and 3,312 benign apps demonstrate that selected API feature is effective for Android malware classification, just top 20 APIs can achieve high F1 of 94.3% under Random Forest classifier. When the available API features are few, classification performance including FPR indicator can achieve effective improvement effectively by complementing our further work.
The rapid growth of Android malware has posed severe security threats to smartphone users. On the basis of the familial trait of Android malware observed by previous work, the familial analysis is a promising way to help analysts better focus on the commonalities of malware samples within the same families, thus reducing the analytical workload and accelerating malware analysis. The majority of existing approaches rely on supervised learning and face three main challenges, i.e., low accuracy, low efficiency, and the lack of labeled dataset. To address these challenges, we first construct a fine-grained behavior model by abstracting the program semantics into a set of subgraphs. Then, we propose SRA, a novel feature that depicts the similarity relationships between the Structural Roles of sensitive API call nodes in subgraphs. An SRA is obtained based on graph embedding techniques and represented as a vector, thus we can effectively reduce the high complexity of graph matching. After that, instead of training a classifier with labeled samples, we construct malware link network based on SRAs and apply community detection algorithms on it to group the unlabeled samples into groups. We implement these ideas in a system called GefDroid that performs Graph embedding based familial analysis of AnDroid malware using unsupervised learning. Moreover, we conduct extensive experiments to evaluate GefDroid on three datasets with ground truth. The results show that GefDroid can achieve high agreements (0.707-0.883 in term of NMI) between the clustering results and the ground truth. Furthermore, GefDroid requires only linear run-time overhead and takes around 8.6s to analyze a sample on average, which is considerably faster than the previous work.
Malware scanning of an app market is expected to be scalable and effective. However, existing approaches use either syntax-based features which can be evaded by transformation attacks or semantic-based features which are usually extracted by performing expensive program analysis. Therefor, in this paper, we propose a lightweight graph-based approach to perform Android malware detection. Instead of traditional heavyweight static analysis, we treat function call graphs of apps as social networks and perform social-network-based centrality analysis to represent the semantic features of the graphs. Our key insight is that centrality provides a succinct and fault-tolerant representation of graph semantics, especially for graphs with certain amount of inaccurate information (e.g., inaccurate call graphs). We implement a prototype system, MalScan, and evaluate it on datasets of 15,285 benign samples and 15,430 malicious samples. Experimental results show that MalScan is capable of detecting Android malware with up to 98% accuracy under one second which is more than 100 times faster than two state-of-the-art approaches, namely MaMaDroid and Drebin. We also demonstrate the feasibility of MalScan on market-wide malware scanning by performing a statistical study on over 3 million apps. Finally, in a corpus of dataset collected from Google-Play app market, MalScan is able to identify 18 zero-day malware including malware samples that can evade detection of existing tools.
Android malware family classification is an advanced task in Android malware analysis, detection and forensics. Existing methods and models have achieved a certain success for Android malware detection, but the accuracy and the efficiency are still not up to the expectation, especially in the context of multiple class classification with imbalanced training data. To address those challenges, we propose an Android malware family classification model by analyzing the code's specific semantic information based on sensitive opcode sequence. In this work, we construct a sensitive semantic feature-sensitive opcode sequence using opcodes, sensitive APIs, STRs and actions, and propose to analyze the code's specific semantic information, generate a semantic related vector for Android malware family classification based on this feature. Besides, aiming at the families with minority, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on Drebin dataset, and select the top 40 malware families for experiments. The experimental results show that the Total Accuracy and Average AUC (Area Under Curve, AUC) reach 99.50% and 98.86% with 45. 17s per Android malware, and even if the number of malware families increases, these results remain good.
The number of malicious Android apps has been and continues to increase rapidly. These malware can damage or alter other files or settings, install additional applications, obfuscate their behaviors, propagate quickly, and so on. To identify and handle such malware, a security analyst can significantly benefit from identifying the family to which a malicious app belongs rather than only detecting if an app is malicious. To address these challenges, we present a novel machine learning-based Android malware detection and family-identification approach, RevealDroid, that operates without the need to perform complex program analyses or extract large sets of features. RevealDroid's selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detection of malware and an accuracy of 95% in determination of their families. We further demonstrate RevealDroid's superiority against state-of-the-art approaches. [URL of original paper: https://dl.acm.org/citation.cfm?id=3162625]
Smartphones have evolved over the years from simple devices to communicate with each other to fully functional portable computers although with comparatively less computational power but inholding multiple applications within. With the smartphone revolution, the value of personal data has increased. As technological complexities increase, so do the vulnerabilities in the system. Smartphones are the latest target for attacks. Android being an open source platform and also the most widely used smartphone OS draws the attention of many malware writers to exploit the vulnerabilities of it. Attackers try to take advantage of these vulnerabilities and fool the user and misuse their data. Malwares have come a long way from simple worms to sophisticated DDOS using Botnets, the latest trends in computer malware tend to go in the distributed direction, to evade the multiple anti-virus apps developed to counter generic viruses and Trojans. However, the recent trend in android system is to have a combination of applications which acts as malware. The applications are benign individually but when grouped, these may result into a malicious activity. This paper proposes a new category of distributed malware in android system, how it can be used to evade the current security, and how it can be detected with the help of graph matching algorithm.
With growing popularity of Android, it's attack surface has also increased. Prevalence of third party android marketplaces gives attackers an opportunity to plant their malicious apps in the mobile eco-system. To evade signature based detection, attackers often transform their malware, for instance, by introducing code level changes. In this paper we propose a lightweight static Permission Flow Graph (PFG) based approach to detect malware even when they have been transformed (obfuscated). A number of techniques based on behavioral analysis have also been proposed in the past; how-ever our interest lies in leveraging the permission framework alone to detect malware variants and transformations without considering behavioral aspects of a malware. Our proposed approach constructs Permission Flow Graph (PFG) for an Android App. Transformations performed at code level, often result in changing control flow, however, most of the time, the permission flow remains invariant. As a consequences, PFGs of transformed malware and non-transformed malware remain structurally similar as shown in this paper using state-of-the-art graph similarity algorithm. Furthermore, we propose graph based similarity metrics at both edge level and vertex level in order to bring forth the structural similarity of the two PFGs being compared. We validate our proposed methodology through machine learning algorithms. Results prove that our approach is successfully able to group together Android malware and its variants (transformations) together in the same cluster. Further, we demonstrate that our proposed approach is able to detect transformed malware with a detection accuracy of 98.26%, thereby ensuring that malicious Apps can be detected even after transformations.
The state-of-the-art Android malware often encrypts or encodes malicious code snippets to evade malware detection. In this paper, such undetectable codes are called Mysterious Codes. To make such codes detectable, we design a system called Droidrevealer to automatically identify Mysterious Codes and then decode or decrypt them. The prototype of Droidrevealer is implemented and evaluated with 5,600 malwares. The results show that 257 samples contain the Mysterious Codes and 11,367 items are exposed. Furthermore, several sensitive behaviors hidden in the Mysterious Codes are disclosed by Droidrevealer.
Malware detection increasingly relies on machine learning techniques, which utilize multiple features to separate the malware from the benign apps. The effectiveness of these techniques primarily depends on the manual feature engineering process, based on human knowledge and intuition. However, given the adversaries' efforts to evade detection and the growing volume of publications on malware behaviors, the feature engineering process likely draws from a fraction of the relevant knowledge. We propose an end-to-end approach for automatic feature engineering. We describe techniques for mining documents written in natural language (e.g. scientific papers) and for representing and querying the knowledge about malware in a way that mirrors the human feature engineering process. Specifically, we first identify abstract behaviors that are associated with malware, and then we map these behaviors to concrete features that can be tested experimentally. We implement these ideas in a system called FeatureSmith, which generates a feature set for detecting Android malware. We train a classifier using these features on a large data set of benign and malicious apps. This classifier achieves a 92.5% true positive rate with only 1% false positives, which is comparable to the performance of a state-of-the-art Android malware detector that relies on manually engineered features. In addition, FeatureSmith is able to suggest informative features that are absent from the manually engineered set and to link the features generated to abstract concepts that describe malware behaviors.
The popularity and adoption of smart phones has greatly stimulated the spread of mobile malware, especially on the popular platforms such as Android. In light of their rapid growth, there is a pressing need to develop effective solutions. However, our defense capability is largely constrained by the limited understanding of these emerging mobile malware and the lack of timely access to related samples. In this paper, we focus on the Android platform and aim to systematize or characterize existing Android malware. Particularly, with more than one year effort, we have managed to collect more than 1,200 malware samples that cover the majority of existing Android malware families, ranging from their debut in August 2010 to recent ones in October 2011. In addition, we systematically characterize them from various aspects, including their installation methods, activation mechanisms as well as the nature of carried malicious payloads. The characterization and a subsequent evolution-based study of representative families reveal that they are evolving rapidly to circumvent the detection from existing mobile anti-virus software. Based on the evaluation with four representative mobile security software, our experiments show that the best case detects 79.6% of them while the worst case detects only 20.2% in our dataset. These results clearly call for the need to better develop next-generation anti-mobile-malware solutions.