Detecting Sensitive Data Disclosure via Bi-directional Text Correlation Analysis

Title: Detecting Sensitive Data Disclosure via Bi-directional Text Correlation Analysis
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Huang, Jianjun; Zhang, Xiangyu; Tan, Lin
Conference Name: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4218-6
Keywords: Android apps, Bi-directional Text Correlation, composability, cyber physical systems, False Data Detection, Human Behavior, pubcrawl, Resiliency, Sensitive Data Disclosure
Abstract

Traditional sensitive data disclosure analysis faces two challenges: to identify sensitive data that is not generated by specific API calls, and to report the potential disclosures when the disclosed data is recognized as sensitive only after the sink operations. We address these issues by developing BidText, a novel static technique to detect sensitive data disclosures. BidText formulates the problem as a type system, in which variables are typed with the text labels that they encounter (e.g., during key-value pair operations). The type system features a novel bi-directional propagation technique that propagates the variable label sets through forward and backward data-flow. A data disclosure is reported if a parameter at a sink point is typed with a sensitive text label. BidText is evaluated on 10,000 Android apps. It reports 4,406 apps that have sensitive data disclosures, with 4,263 apps having log based disclosures and 1,688 having disclosures due to other sinks such as HTTP requests. Existing techniques can only report 64.0% of what BidText reports. And manual inspection shows that the false positive rate for BidText is 10%.
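The propagation idea in the abstract can be illustrated with a small toy model. The sketch below is not BidText's implementation: the statement encoding (`label`, `assign`, `sink`), the function names, and the keyword list are all hypothetical, and the analysis is deliberately simplified to a flow-insensitive fixpoint in which labels move along each assignment in both directions. The actual type system is presumably more selective than this.

```python
from collections import defaultdict

# Hypothetical list of sensitive keywords; not taken from the paper.
SENSITIVE_KEYWORDS = {"password", "ssn"}


def propagate_labels(statements):
    """Compute a text-label set per variable, flowing labels both ways."""
    labels = defaultdict(set)
    edges = []
    for stmt in statements:
        if stmt[0] == "label":
            # The variable co-occurs with a text constant,
            # e.g. as the value in a key-value pair operation.
            _, var, text = stmt
            labels[var].add(text)
        elif stmt[0] == "assign":
            # dst = src: record a data-flow edge from src to dst.
            _, dst, src = stmt
            edges.append((src, dst))

    # Iterate to a fixpoint. Sharing the merged set between both ends
    # of an edge models forward (src -> dst) and backward (dst -> src)
    # propagation at once.
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            merged = labels[src] | labels[dst]
            if merged != labels[src] or merged != labels[dst]:
                labels[src] = set(merged)
                labels[dst] = set(merged)
                changed = True
    return labels


def find_disclosures(statements):
    """Report each sink whose argument carries a sensitive text label."""
    labels = propagate_labels(statements)
    reports = []
    for stmt in statements:
        if stmt[0] == "sink":
            _, sink_name, var = stmt
            hits = sorted(
                text for text in labels[var]
                if any(key in text.lower() for key in SENSITIVE_KEYWORDS)
            )
            if hits:
                reports.append((sink_name, var, hits))
    return reports


# x is logged before its copy y is ever associated with "password".
program = [
    ("assign", "y", "x"),          # y = x
    ("sink", "Log.d", "x"),        # x reaches a log sink first
    ("label", "y", "password"),    # later: y stored under key "password"
    ("sink", "Log.d", "z"),        # unrelated variable, no labels
]

print(find_disclosures(program))
```

In this toy program the label `password` is attached to `y` only after `x` has been logged, and `x` itself is never labeled directly. A forward-only analysis over this encoding would miss the log call, whereas the backward step carries the label from `y` to `x`, so the sink on `x` is reported and the sink on `z` is not.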

URL: http://doi.acm.org/10.1145/2950290.2950348
DOI: 10.1145/2950290.2950348
Citation Key: huang_detecting_2016