Biblio
Monkey, a random testing tool from Google, has been popularly used in industrial practices for automatic test input generation for Android due to its applicability to a variety of application settings, e.g., ease of use and compatibility with different Android platforms. Recently, Monkey has been under the spotlight of the research community: recent studies found out that none of the studied tools from the academia were actually better than Monkey when applied on a set of open source Android apps. Our recent efforts performed the first case study of applying Monkey on WeChat, a popular messenger app with over 800 million monthly active users, and revealed many limitations of Monkey along with developing our improved approach to alleviate some of these limitations. In this paper, we explore two optimization techniques to improve the effectiveness and efficiency of our previous approach. We also conduct manual categorization of not-covered activities and two automatic coverage-analysis techniques to provide insightful information about the not-covered code entities. Lastly, we present findings of our empirical studies of conducting automatic random testing on WeChat with the preceding techniques.
Given the ever increasing number of research tools to automatically generate inputs to test Android applications (or simply apps), researchers recently asked the question "Are we there yet?" (in terms of the practicality of the tools). By conducting an empirical study of the various tools, the researchers found that Monkey (the most widely used tool of this category in industrial settings) outperformed all of the research tools in the study. In this paper, we present two signi cant extensions of that study. First, we conduct the rst industrial case study of applying Monkey against WeChat, a popular messenger app with over 762 million monthly active users, and report the empirical ndings on Monkey's limitations in an industrial setting. Second, we develop a new approach to address major limitations of Monkey and accomplish substantial code-coverage improvements over Monkey. We conclude the paper with empirical insights for future enhancements to both Monkey and our approach.