Biblio
The purpose of the General Data Protection Regulation (GDPR) is to provide improved privacy protection. If an app controls personal data from users, it needs to be compliant with GDPR. However, GDPR lists general rules rather than exact step-by-step guidelines about how to develop an app that fulfills the requirements. Therefore, there may exist GDPR compliance violations in existing apps, which would pose severe privacy threats to app users. In this paper, we take mobile health applications (mHealth apps) as a peephole to examine the status quo of GDPR compliance in Android apps. We first propose an automated system, named HPDROID, to bridge the semantic gap between the general rules of GDPR and the app implementations by identifying the data practices declared in the app privacy policy and the data relevant behaviors in the app code. Then, based on HPDROID, we detect three kinds of GDPR compliance violations, including the incompleteness of privacy policy, the inconsistency of data collections, and the insecurity of data transmission. We perform an empirical evaluation of 796 mHealth apps. The results reveal that 189 (23.7%) of them do not provide complete privacy policies. Moreover, 59 apps collect sensitive data through different measures, but 46 (77.9%) of them contain at least one inconsistent collection behavior. Even worse, among the 59 apps, only 8 apps try to ensure the transmission security of collected data. However, all of them contain at least one encryption or SSL misuse. Our work exposes severe privacy issues to raise awareness of privacy protection for app users and developers.
Web communication has become an indispensable characteristic of mobile apps. However, it is not clear what data the apps transmit, to whom, and what consequences such transmissions have. We analyzed the web communications found in mobile apps from the perspective of security. We first manually studied 160 Android apps to identify the commonly-used communication libraries, and to understand how they are used in these apps. We then developed a tool to statically identify web API URLs used in the apps, and restore the JSON data schemas including the type and value of each parameter. We extracted 9714 distinct web API URLs that were used in 3 376 apps. We found that developers often use the java.net package for network communication, however, third-party libraries like OkHttp are also used in many apps. We discovered that insecure HTTP connections are seven times more prevalent in closed-source than in open-source apps, and that embedded SQL and JavaScript code is used in web communication in more than 500 different apps. This finding is devastating; it leaves billions of users and API service providers vulnerable to attack.
Mobile apps are widely adopted in daily life, and contain increasing security flaws. Many regulatory agencies and organizations have announced security guidelines for app development. However, most security guidelines involving technicality and compliance with this requirement is not easily feasible. Thus, we propose Mobile Apps Assessment and Analysis System (MAS), an automatic security validation system to improve guideline compliance. MAS combines static and dynamic analysis techniques, which can be used to verify whether android apps meet the security guideline requirements. We implemented MAS in practice and verified 143 real-world apps produced by the Taiwan government. Besides, we also validated 15,000 popular apps collected from Google Play Store produced in three countries. We found that most apps contain at least three security issues. Finally, we summarize the results and list the most common security flaws for consideration in further app development.
Traditional sensitive data disclosure analysis faces two challenges: to identify sensitive data that is not generated by specific API calls, and to report the potential disclosures when the disclosed data is recognized as sensitive only after the sink operations. We address these issues by developing BidText, a novel static technique to detect sensitive data disclosures. BidText formulates the problem as a type system, in which variables are typed with the text labels that they encounter (e.g., during key-value pair operations). The type system features a novel bi-directional propagation technique that propagates the variable label sets through forward and backward data-flow. A data disclosure is reported if a parameter at a sink point is typed with a sensitive text label. BidText is evaluated on 10,000 Android apps. It reports 4,406 apps that have sensitive data disclosures, with 4,263 apps having log based disclosures and 1,688 having disclosures due to other sinks such as HTTP requests. Existing techniques can only report 64.0% of what BidText reports. And manual inspection shows that the false positive rate for BidText is 10%.
The ability to identify mobile apps in network traffic has significant implications in many domains, including traffic management, malware detection, and maintaining user privacy. App identification methods in the literature typically use deep packet inspection (DPI) and analyze HTTP headers to extract app fingerprints. However, these methods cannot be used if HTTP traffic is encrypted. We investigate whether Android apps can be identified from their launch-time network traffic using only TCP/IP headers. We first capture network traffic of 86,109 app launches by repeatedly running 1,595 apps on 4 distinct Android devices. We then use supervised learning methods used previously in the web page identification literature, to identify the apps that generated the traffic. We find that: (i) popular Android apps can be identified with 88% accuracy, by using the packet sizes of the first 64 packets they generate, when the learning methods are trained and tested on the data collected from same device; (ii) when the data from an unseen device (but similar operating system/vendor) is used for testing, the apps can be identified with 67% accuracy; (iii) the app identification accuracy does not drop significantly even if the training data are stale by several days, and (iv) the accuracy does drop quite significantly if the operating system/vendor is very different. We discuss the implications of our findings as well as open issues.
The prevalence of smart devices has promoted the popularity of mobile applications (a.k.a. apps) in recent years. A number of interesting and important questions remain unanswered, such as why a user likes/dislikes an app, how an app becomes popular or eventually perishes, how a user selects apps to install and interacts with them, how frequently an app is used and how much trac it generates, etc. This paper presents an empirical analysis of app usage behaviors collected from millions of users of Wandoujia, a leading Android app marketplace in China. The dataset covers two types of user behaviors of using over 0.2 million Android apps, including (1) app management activities (i.e., installation, updating, and uninstallation) of over 0.8 million unique users and (2) app network trac from over 2 million unique users. We explore multiple aspects of such behavior data and present interesting patterns of app usage. The results provide many useful implications to the developers, users, and disseminators of mobile apps.
Although there has been much research on the leakage of sensitive data in Android applications, most of the existing research focus on how to detect the malware or adware that are intentionally collecting user privacy. There are not much research on analyzing the vulnerabilities of apps that may cause the leakage of privacy. In this paper, we present a vulnerability analyzing method which combines taint analysis and cryptography misuse detection. The four steps of this method are decompile, taint analysis, API call record, cryptography misuse analysis, all of which steps except taint analysis can be executed by the existing tools. We develop a prototype tool PW Exam to analysis how the passwords are handled and if the app is vulnerable to password leakage. Our experiment shows that a third of apps are vulnerable to leak the users' passwords.
As web-server spoofing is increasing, we investigate a novel technology termed ICmetrics, used to identify fraud for given software/hardware programs based on measurable quantities/features. ICmetrics technology is based on extracting features from digital systems' operation that may be integrated together to generate unique identifiers for each of the systems or create unique profiles that describe the systems' actual behavior. This paper looks at the properties of the several behaviors as a potential ICmetrics features to identify android apps, it presents several quality features which meet the ICmetrics requirements and can be used for encryption key generation. Finally, the paper identifies four android apps and verifies the use of ICmetrics by identifying a spoofed app as a different app altogether.