Biblio
In Android malware detection, recent work has shown that using contextual information of sensitive API invocation in the modeling of applications is able to improve the classification accuracy. However, the improvement brought by this context-awareness varies depending on how this information is used in the modeling. In this paper, we perform a comprehensive study on the effectiveness of using the contextual information in prior state-of-the-art detection systems. We find that this information has been "over-used" such that a large amount of non-essential metadata built into the models weakens the generalizability and longevity of the model, thus finally affects the detection accuracy. On the other hand, we find that the entrypoint of API invocation has the strongest impact on the classification correctness, which can further improve the accuracy if being properly captured. Based on this finding, we design and implement a lightweight, circumstance-aware detection system, named "PIKADROID" that only uses the API invocation and its entrypoint in the modeling. For extracting the meaningful entrypoints, PIKADROID applies a set of static analysis techniques to extract and sanitize the reachable entrypoints of a sensitive API, then constructs a frequency model for classification decision. In the evaluation, we show that this slim model significantly improves the detection accuracy on a data set of 23,631 applications by achieving an f-score of 97.41%, while maintaining a false positive rating of 0.96%.