SaTC: CORE: Small: Collaborative: Data-driven Approaches for Large-scale Security Analysis of Mobile Applications

Project Details

Lead PI

Performance Period

Aug 15, 2017 - Jul 31, 2020

Institution(s)

University of South Florida

Award Number


This project investigates how to apply big-data analysis techniques to mobile apps for the Android platform in order to accurately identify their security problems. A major challenge is the scale of the problem, with thousands of new apps entering the online app markets daily. Current technologies cannot keep pace with the threats, and malware is regularly found both in large-scale marketplaces such as the official Google Play market and in third-party markets. The project adopts a number of advanced machine learning and data mining techniques to tackle these challenges. The large number of apps in the markets allows an automated machine learning algorithm to better capture security-related patterns and trends in the data, so that it can predict with good accuracy which apps may have security problems. Those apps warrant the more in-depth and expensive analysis that usually requires significant human effort. This creates an effective triage process for dealing with the scale challenge, one that industry can use to scale the security vetting of mobile apps. Artifacts produced by the research are released as open source to benefit practitioners. New courses on mobile apps and their security are developed. Undergraduate students are involved in the research, and students from underrepresented groups, including female students, also participate. The materials developed from the research are used to enrich cybersecurity education opportunities through the PIs' multiple outreach platforms at their institutions, enabling a large student body to benefit from the project.
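The triage idea described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes that some model has already assigned each app a risk score, and that analysts can review only a fixed number of apps. The names `ScoredApp`, `triage`, and `budget`, the package names, and the scores are all hypothetical.

```python
from typing import List, NamedTuple


class ScoredApp(NamedTuple):
    package: str       # hypothetical app identifier
    risk_score: float  # hypothetical model output; higher means more suspicious


def triage(apps: List[ScoredApp], budget: int) -> List[ScoredApp]:
    """Return the `budget` highest-risk apps, most suspicious first."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(apps, key=lambda app: app.risk_score, reverse=True)
    return ranked[:budget]


# Made-up scores for illustration only.
scored = [
    ScoredApp("com.example.flashlight", 0.91),
    ScoredApp("com.example.notes", 0.07),
    ScoredApp("com.example.wallpaper", 0.64),
    ScoredApp("com.example.calculator", 0.12),
]

# Only the two most suspicious apps go on to expensive manual analysis.
for_manual_review = triage(scored, budget=2)
print([app.package for app in for_manual_review])
```

Ranking by score and cutting at a budget keeps the human effort constant no matter how many apps arrive, which is the point of a triage step.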

The project designs solutions to the unique challenges of applying machine learning to mobile app security analysis, most of which stem from the big-data nature of the problem. A key scientific challenge in mobile app security analysis is the difficulty of obtaining high-quality ground truth; often one must rely on imperfect data for both training and evaluation. The research experiments with a number of approaches for dealing with the noise caused by imperfect labels, including semi-supervised learning algorithms, which can learn from small amounts of labeled data, or even from positive data only, together with unlabeled data. The project also explores a novel approach that uses social media to acquire additional information that can improve the ground truth, the prediction accuracy, or both.
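To make the semi-supervised idea concrete, the sketch below shows self-training, one common way to learn from a few labeled examples plus unlabeled data. It is an illustration under stated assumptions, not the project's algorithm: the classifier is a simple nearest-centroid rule, the two features (for example, counts of requested permissions and of sensitive API calls) are hypothetical, and the `margin` threshold is an arbitrary choice. It covers only the small-labeled-set case, not learning from positive data alone.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    if not points:
        raise ValueError("need at least one labeled example per class")
    return tuple(sum(col) / len(points) for col in zip(*points))


def self_train(
    labeled: Dict[Vector, int],
    unlabeled: List[Vector],
    margin: float = 3.0,
    max_rounds: int = 10,
) -> Dict[Vector, int]:
    """Grow the labeled set by pseudo-labeling confident unlabeled points.

    Label 0 means benign and 1 means malicious (hypothetical encoding).
    A point is pseudo-labeled only when its distances to the two class
    centroids differ by at least `margin`.
    """
    labels = dict(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Fit: one centroid per class from everything labeled so far.
        c0 = centroid([x for x, y in labels.items() if y == 0])
        c1 = centroid([x for x, y in labels.items() if y == 1])
        remaining = []
        for x in pool:
            d0, d1 = math.dist(x, c0), math.dist(x, c1)
            if abs(d0 - d1) >= margin:
                labels[x] = 0 if d0 < d1 else 1  # confident: pseudo-label
            else:
                remaining.append(x)              # ambiguous: retry next round
        if len(remaining) == len(pool):
            break  # no progress, so stop early
        pool = remaining
    return labels


# Two labeled apps and four unlabeled ones (made-up feature vectors).
seed = {(0.0, 0.0): 0, (10.0, 10.0): 1}
pool = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (4.0, 4.0)]
result = self_train(seed, pool)
```

In this toy data the point `(4.0, 4.0)` is too ambiguous to label in the first round; it is labeled in the second round, after the pseudo-labeled neighbors have moved the class centroids. With noisy labels this step can also amplify mistakes, which is why the confidence threshold matters.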