Biblio
Because cloud storage services have been broadly used in enterprises for online sharing and collaboration, sensitive information in images or documents may be easily leaked outside the trust enterprise on-premises due to such cloud services. Existing solutions to this problem have not fully explored the tradeoffs among application performance, service scalability, and user data privacy. Therefore, we propose CloudDLP, a generic approach for enterprises to automatically sanitize sensitive data in images and documents in browser-based cloud storage. To the best of our knowledge, CloudDLP is the first system that automatically and transparently detects and sanitizes both sensitive images and textual documents without compromising user experience or application functionality on browser-based cloud storage. To prevent sensitive information escaping from on-premises, CloudDLP utilizes deep learning methods to detect sensitive information in both images and textual documents. We have evaluated the proposed method on a number of typical cloud applications. Our experimental results show that it can achieve transparent and automatic data sanitization on the cloud storage services with relatively low overheads, while preserving most application functionalities.
An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to experiences in consumption is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim at discovering in particular tweets semantic patterns in consumers' discussions on social media. Specifically, the purposes of this study are twofold: 1) finding similarity and dissimilarity between two sets of textual documents that include consumers' sentiment polarities, two forms of positive vs. negative opinions and 2) driving actual content from the textual data that has a semantic trend. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering methods are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter purpose. This is the first study which discover semantic properties of textual data in consumption context beyond sentiment analysis. In addition to major findings, we apply LDA (Latent Dirichlet Allocations) to the same data and drew latent topics that represent consumers' positive opinions and negative opinions on social media.