Biblio
Machine-Learning-as-a-Service has become increasingly popular, with Recommendation-as-a-Service as one of the representative examples. In such services, providing privacy protection for users is an important topic. In the privacy-preserving solutions proposed over the past decade, privacy and machine learning are often seen as two competing goals. Although improving cryptographic primitives (e.g., secure multi-party computation (SMC) or homomorphic encryption (HE)) and devising sophisticated secure protocols have achieved remarkable progress, combining them with state-of-the-art recommender systems often yields far-from-practical solutions. We tackle this problem from the direction of machine learning. We aim to design crypto-friendly recommendation algorithms, so as to obtain efficient solutions by directly using existing cryptographic tools. In particular, we propose an HE-friendly recommender system, referred to as CryptoRec, which (1) decouples user features from the latent feature space, avoiding training the recommendation model on encrypted data; and (2) relies only on addition and multiplication operations, making the model straightforwardly compatible with HE schemes. These properties turn recommendation computations into a simple matrix-multiplication operation. To further improve efficiency, we introduce a sparse-quantization-reuse method which reduces the recommendation-computation time by 9× (compared to using CryptoRec directly), without compromising accuracy. We demonstrate the efficiency and accuracy of CryptoRec on three real-world datasets. CryptoRec allows a server to estimate a user's preferences on thousands of items within a few seconds on a single PC, with the user's data homomorphically encrypted, while its prediction accuracy is still competitive with state-of-the-art recommender systems computing over clear data.
Our solution enables Recommendation-as-a-Service on large datasets at a nearly real-time (seconds) level.
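The claim that recommendation reduces to a matrix multiplication built only from additions and multiplications can be illustrated with a minimal plaintext sketch. This is not CryptoRec itself: the function name `predict_scores`, the integer inputs, and the tiny weight matrix are all hypothetical, and no encryption is performed. The point is only that the loop uses no operation other than `+` and `*`, which is what makes such a computation a candidate for evaluation under an HE scheme.

```python
def predict_scores(user_ratings, item_weights):
    """Score each item as sum_j user_ratings[j] * item_weights[j][i].

    Hypothetical illustration: the loop body uses only addition and
    multiplication, so the same structure could in principle be evaluated
    on homomorphically encrypted ratings. Here everything is plaintext.
    """
    n_items = len(item_weights[0])
    scores = [0] * n_items
    for j, rating in enumerate(user_ratings):
        for i in range(n_items):
            # One multiplication and one addition per (rating, weight) pair.
            scores[i] = scores[i] + rating * item_weights[j][i]
    return scores


# Assumed toy data: three rated items (rows), two candidate items (columns).
example_scores = predict_scores([5, 0, 3], [[1, 0], [2, 2], [0, 1]])
```

With these toy values, the first candidate item is scored from the first rating only and the second from the third rating only, since the remaining products are zero.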
Taint analysis has been widely applied in ex post facto security applications, such as attack provenance investigation, computer forensic analysis, and reverse engineering. Unfortunately, the high runtime overhead imposed by dynamic taint analysis makes it impractical in many scenarios. The key obstacle is the strict coupling of program execution and taint tracking logic code. To alleviate this performance bottleneck, recent work seeks to offload taint analysis from program execution and run it on a spare core or a different CPU. However, since the taint analysis has heavy data and control dependencies on the program execution, the massive data involved in recording and transformation overshadows the benefit of decoupling. In this paper, we propose a novel technique to allow very lightweight logging, resulting in much lower execution slowdown, while still permitting us to perform full-featured offline taint analysis. We develop StraightTaint, a hybrid taint analysis tool that completely decouples the program execution and taint analysis. StraightTaint relies on very lightweight logging of the execution information to reconstruct straight-line code, enabling an offline symbolic taint analysis without frequent data communication with the application. While StraightTaint does not log complete runtime or input values, it can still, for example, precisely identify the causal relationships between sources and sinks. Compared with traditional dynamic taint analysis tools, StraightTaint has much lower application runtime overhead.
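The idea of running taint analysis offline over reconstructed straight-line code can be sketched in a few lines. This is a simplified illustration, not StraightTaint's actual log format or its symbolic analysis: the trace representation (a list of `(destination, operands)` assignments) and the name `propagate_taint` are assumptions made for this example. Because the trace has no branches, a single forward pass is enough to decide which locations depend on a taint source.

```python
def propagate_taint(trace, sources):
    """Forward taint propagation over a straight-line trace.

    trace   -- list of (dest, operands) pairs, one per executed assignment
               (hypothetical format, chosen for illustration only)
    sources -- iterable of initially tainted location names
    Returns the set of locations that are tainted after the last step.
    """
    tainted = set(sources)
    for dest, operands in trace:
        if any(op in tainted for op in operands):
            tainted.add(dest)      # dest now depends on tainted data
        else:
            tainted.discard(dest)  # dest overwritten with untainted data
    return tainted


# Assumed toy trace: "input" flows through t1 and t2 into "sink";
# t1 is later overwritten with a constant and becomes clean again.
example_trace = [
    ("t1", ["input"]),
    ("t2", ["t1", "const"]),
    ("t1", ["const"]),
    ("sink", ["t2"]),
]
example_tainted = propagate_taint(example_trace, {"input"})
```

In this toy trace the sink ends up tainted while `t1` does not, which is the kind of source-to-sink causal relationship the abstract refers to.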
Data-intensive computing research and technology developments offer the potential for significant improvements in several security log management challenges. Approaches to address the complexity, timeliness, expense, diversity, and noise issues have been identified. These improvements are motivated by the increasingly important role of analytics. Machine learning and expert systems that incorporate attack patterns are providing greater detection insights. Finding actionable indicators requires the analysis to combine security event log data with other network data, such as access control lists, making the big-data problem even bigger. Automation of threat intelligence is recognized as incomplete, with limited adoption of standards. With limited progress in anomaly signature detection, movement towards using expert systems has been identified as the path forward. Techniques focus on matching behaviors of attackers to patterns of abnormal activity in the network. The need to stream, parse, and analyze large volumes of small, semi-structured data files can be feasibly addressed through a variety of techniques identified by researchers. This report highlights research in key areas, including protection of the data, performance of the systems, and network bandwidth utilization.