Biblio

Filters: Author is Wang, Shuai
2023-06-09
Zhang, Yue, Nan, Xiaoya, Zhou, Jialing, Wang, Shuai.  2022.  Design of Differential Privacy Protection Algorithms for Cyber-Physical Systems. 2022 International Conference on Intelligent Systems and Computational Intelligence (ICISCI). :29–34.
This paper designs a new privacy-preserving Laplace consensus algorithm to protect users' private data. The algorithm perturbs the state transition and information generation functions with exponentially decaying Laplace noise to resist attacks. Its mean square consistency and privacy protection performance are then studied. Finally, the theoretical results are verified by numerical simulations.
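A minimal sketch of the core mechanism, assuming a simple average-consensus update: each agent transmits a state perturbed by Laplace noise whose scale decays exponentially over iterations. The weight matrix, step size, and decay rate below are illustrative choices, not the paper's exact algorithm or parameters.

```python
import numpy as np

def private_consensus(x0, A, step=0.2, c=1.0, q=0.9, rounds=100, seed=0):
    """Average consensus where each agent shares only a perturbed state
    y_i = x_i + Lap(0, c * q**t), so the injected noise decays exponentially."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    A = np.asarray(A, dtype=float)
    for t in range(rounds):
        scale = c * (q ** t)                             # exponentially decaying noise scale
        y = x + rng.laplace(0.0, scale, size=x.shape)    # what agents actually transmit
        x = x + step * (A @ y - A.sum(axis=1) * y)       # x_i += step * sum_j a_ij * (y_j - y_i)
    return x

# 4 agents on a line graph; states end up near the average of the initial values
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(private_consensus([1.0, 3.0, 5.0, 7.0], A))
```

Because the noise scale shrinks geometrically, the accumulated perturbation stays bounded, which is what makes mean square consistency compatible with the privacy mechanism.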
2023-04-28
Li, Zongjie, Ma, Pingchuan, Wang, Huaijin, Wang, Shuai, Tang, Qiyi, Nie, Sen, Wu, Shi.  2022.  Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :2253–2265.
Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural program embeddings directly from program source code, learning from features such as tokens, abstract syntax trees, and control flow graphs. This paper takes a fresh look at how to improve program embeddings by leveraging compiler intermediate representation (IR). We first demonstrate simple yet highly effective methods for enhancing embedding quality by training embedding models on both source code and the LLVM IR generated by default optimization levels (e.g., -O2). We then introduce IRGEN, a framework based on genetic algorithms (GA), to identify (near-)optimal sequences of optimization flags that can significantly improve embedding quality. We use IRGEN to find optimal sequences of LLVM optimization flags by performing GA on source code datasets. We then extend a popular code embedding model, CodeCMR, with a new objective based on triplet loss to enable joint learning over source code and LLVM IR. We benchmark embedding quality on a representative downstream application, code clone detection. When CodeCMR was trained on source code and LLVM IR optimized with the flag sequences found by IRGEN, embedding quality improved significantly, outperforming the state-of-the-art model CodeBERT, which was trained only on source code. Our augmented CodeCMR also outperformed CodeCMR trained on source code and IR optimized with default optimization levels. We investigate the properties of optimization flags that increase embedding quality, demonstrate IRGEN's generalization in boosting other embedding models, and establish IRGEN's usefulness in settings with extremely limited training data. Our research and findings demonstrate that a straightforward addition to modern neural code embedding models can provide a highly effective enhancement.
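A compact sketch of the GA search over optimization-flag sequences that IRGEN performs. The flag pool lists a few real legacy LLVM `opt` passes; the `embedding_quality` fitness is a hypothetical stand-in for the expensive step of compiling the dataset with the candidate flags and measuring the resulting embedding quality, and the selection, crossover, and mutation operators are generic GA choices rather than IRGEN's exact configuration.

```python
import random

# Small pool of real legacy LLVM opt passes; the paper's search space is much larger.
FLAG_POOL = ["-mem2reg", "-sroa", "-gvn", "-instcombine",
             "-simplifycfg", "-licm", "-loop-unroll", "-dce"]

def embedding_quality(flags):
    # Hypothetical placeholder: deterministic pseudo-score per flag sequence.
    # In IRGEN this would compile the corpus with the flags, retrain/evaluate
    # the embedding model, and return the measured embedding quality.
    return random.Random(",".join(flags)).random()

def ga_search(pop_size=20, seq_len=6, generations=30, mut_rate=0.2):
    pop = [[random.choice(FLAG_POOL) for _ in range(seq_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=embedding_quality, reverse=True)
        parents = scored[: pop_size // 2]                 # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, seq_len)            # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut_rate:                # point mutation
                child[random.randrange(seq_len)] = random.choice(FLAG_POOL)
            children.append(child)
        pop = parents + children
    return max(pop, key=embedding_quality)

best_flags = ga_search()
print(best_flags)
```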
2022-02-24
Zhang, Maojun, Zhu, Guangxu, Wang, Shuai, Jiang, Jiamo, Zhong, Caijun, Cui, Shuguang.  2021.  Accelerating Federated Edge Learning via Optimized Probabilistic Device Scheduling. 2021 IEEE 22nd International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). :606–610.
The popular federated edge learning (FEEL) framework allows privacy-preserving collaborative model training via frequent exchanges of learning updates between edge devices and the server. Due to the constrained bandwidth, only a subset of devices can upload their updates in each communication round. This has led to an active research area in FEEL studying the optimal device scheduling policy for minimizing communication time. However, owing to the difficulty of quantifying the exact communication time, prior work in this area can only tackle the problem partially, by considering either the number of communication rounds or the per-round latency, whereas the total communication time is determined by both. To close this gap, we make the first attempt in this paper to formulate and solve the communication time minimization problem. We first derive a tight bound to approximate the communication time through a cross-disciplinary effort involving both learning theory for convergence analysis and communication theory for per-round latency analysis. Building on the analytical result, an optimized probabilistic scheduling policy is derived in closed form by solving the approximate communication time minimization problem. It is found that the optimized policy gradually shifts its priority from suppressing the remaining communication rounds to reducing per-round latency as the training process evolves. The effectiveness of the proposed scheme is demonstrated via a use case on collaborative 3D object detection in autonomous driving.
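A toy illustration of probabilistic device scheduling whose priority shifts over training, as the abstract describes: early rounds weight update informativeness (here approximated by gradient norm), later rounds weight low per-round latency. The linear mixing rule and the gradient-norm proxy are illustrative assumptions, not the paper's closed-form policy.

```python
import numpy as np

def schedule_probabilities(grad_norms, latencies, round_idx, total_rounds):
    """Early in training, favour devices with informative updates (large gradient
    norm, fewer rounds to converge); late in training, favour fast devices
    (low per-round latency). The mixing weight moves from 0 to 1 over rounds."""
    g = np.asarray(grad_norms, dtype=float)
    l = np.asarray(latencies, dtype=float)
    alpha = round_idx / max(total_rounds - 1, 1)        # 0 -> 1 as training evolves
    score = (1 - alpha) * (g / g.sum()) + alpha * ((1 / l) / (1 / l).sum())
    return score / score.sum()

# sample one device to upload its update in this communication round
rng = np.random.default_rng(0)
p = schedule_probabilities([3.0, 1.0, 0.5], [20.0, 5.0, 2.0],
                           round_idx=10, total_rounds=100)
chosen = rng.choice(len(p), p=p)
print(p, chosen)
```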
2019-01-16
Arrieta, Aitor, Wang, Shuai, Arruabarrena, Ainhoa, Markiegi, Urtzi, Sagardui, Goiuria, Etxeberria, Leire.  2018.  Multi-objective Black-box Test Case Selection for Cost-effectively Testing Simulation Models. Proceedings of the Genetic and Evolutionary Computation Conference. :1411–1418.
In many domains, engineers build simulation models (e.g., Simulink) before developing code to simulate the behavior of complex systems (e.g., Cyber-Physical Systems). Such models are typically expensive to simulate, which makes it impractical to execute the entire test suite. Furthermore, it is often difficult to measure white-box coverage of test cases when employing such models, and historical data related to failures might not be available. This paper proposes a cost-effective approach for test case selection that relies only on black-box data about the inputs and outputs of the system. The approach defines five effectiveness measures and one cost measure, derives 15 objective combinations from them, and integrates the combinations within the Non-Dominated Sorting Genetic Algorithm-II (NSGA-II). We empirically evaluated our approach with all 15 combinations on four case studies, employing mutation testing to assess fault-revealing capability. The results show that our approach improved on Random Search by 26% on average in terms of the Hypervolume quality indicator.
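A toy sketch of the black-box multi-objective formulation: each candidate test subset is scored by an effectiveness proxy and a cost, and only non-dominated candidates are kept. Random sampling plus Pareto filtering stands in for NSGA-II here, and the output-signature effectiveness measure is a simplification of the paper's five measures.

```python
import random

def evaluate(subset, exec_time, output_signature):
    """Two objective kinds in toy form: a black-box effectiveness proxy
    (distinct output signatures covered) and a cost (total simulation time)."""
    effectiveness = len({output_signature[t] for t in subset})
    cost = sum(exec_time[t] for t in subset)
    return effectiveness, cost

def dominates(a, b):
    # maximise effectiveness, minimise cost; strict on at least one objective
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(candidates, scores):
    return [candidates[i] for i, s in enumerate(scores)
            if not any(dominates(scores[j], s)
                       for j in range(len(scores)) if j != i)]

# toy data: 8 test cases with execution times and coarse output signatures
random.seed(1)
exec_time = [random.uniform(1, 10) for _ in range(8)]
output_signature = [random.choice("ABCD") for _ in range(8)]
candidates = [sorted(random.sample(range(8), k=random.randint(2, 5))) for _ in range(50)]
scores = [evaluate(c, exec_time, output_signature) for c in candidates]
print(pareto_front(candidates, scores))
```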
2018-01-23
Wang, Shuai, Wang, Wenhao, Bao, Qinkun, Wang, Pei, Wang, XiaoFeng, Wu, Dinghao.  2017.  Binary Code Retrofitting and Hardening Using SGX. Proceedings of the 2017 Workshop on Forming an Ecosystem Around Software Transformation. :43–49.

Trusted Execution Environment (TEE) technology is designed to deliver a safe execution environment for software systems. Intel Software Guard Extensions (SGX) provides isolated memory regions (i.e., SGX enclaves) to protect code and data from adversaries in the untrusted world. While existing research has proposed techniques to execute entire executable files inside enclave instances by providing rich sets of OS facilities, one notable limitation of these techniques is the unavoidably large Trusted Computing Base (TCB), which can potentially break the principle of least privilege. In this work, we describe techniques that provide practical and efficient protection of security-sensitive code components in legacy binary code. Our technique dissects input binaries into multiple components, which are then built into SGX enclave instances. We also leverage deliberately designed binary editing techniques to retrofit the input binary code and preserve the original program semantics. Our tentative evaluations on hardening AES encryption and decryption procedures demonstrate the practicability and efficiency of the proposed technique.
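A conceptual sketch of the dissection step, assuming the call graph has already been recovered from the binary: the enclave component is taken as the closure of functions reachable from the security-sensitive roots, and boundary-crossing calls are where a real retrofit would insert ECALL/OCALL stubs. This illustrates the partitioning idea only; it is not the paper's binary-editing tooling.

```python
from collections import deque

def partition_for_enclave(call_graph, sensitive_roots):
    """Put every function reachable from the sensitive roots into the enclave
    set; everything else stays untrusted. Calls crossing the boundary are the
    ECALL (untrusted -> enclave) and OCALL (enclave -> untrusted) sites."""
    enclave = set()
    queue = deque(sensitive_roots)
    while queue:
        fn = queue.popleft()
        if fn in enclave:
            continue
        enclave.add(fn)
        queue.extend(call_graph.get(fn, ()))

    ecalls = [(caller, callee)
              for caller, callees in call_graph.items() if caller not in enclave
              for callee in callees if callee in enclave]
    ocalls = [(caller, callee)
              for caller, callees in call_graph.items() if caller in enclave
              for callee in callees if callee not in enclave]
    return enclave, ecalls, ocalls

# hypothetical call graph: only aes_encrypt and its callees go into the enclave
call_graph = {
    "main": ["parse", "aes_encrypt"],
    "aes_encrypt": ["key_schedule", "log"],
}
enclave, ecalls, ocalls = partition_for_enclave(call_graph, ["aes_encrypt"])
print(enclave, ecalls, ocalls)   # main -> aes_encrypt becomes an ECALL site
```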

2017-09-05
Arrieta, Aitor, Wang, Shuai, Sagardui, Goiuria, Etxeberria, Leire.  2016.  Search-based Test Case Selection of Cyber-physical System Product Lines for Simulation-based Validation. Proceedings of the 20th International Systems and Software Product Line Conference. :297–306.

Cyber-Physical Systems (CPSs) are often tested at different test levels following “X-in-the-Loop” configurations: Model-, Software-, and Hardware-in-the-Loop (MiL, SiL, and HiL). While the MiL and SiL test levels aim at testing functional requirements at the system level, the HiL test level tests functional as well as non-functional requirements by performing a real-time simulation. Testing CPS product line configurations is costly: there are many variants to test, test cases are long, the physical layer has to be simulated, and co-simulation is often necessary. It is therefore extremely important to select the appropriate test cases that cover the objectives of each level within an allowable amount of time. We propose an efficient test case selection approach adapted to the “X-in-the-Loop” test levels. Search algorithms are employed to reduce the time required to test configurations of CPS product lines while achieving the test objectives of each level. We empirically evaluate three commonly used search algorithms, i.e., Genetic Algorithm (GA), Alternating Variable Method (AVM), and Greedy, with Random Search (RS) as a baseline, on two case studies, with the aim of integrating the best algorithm into our approach. Results suggest that, compared with RS, our approach can reduce the cost of testing CPS product line configurations by approximately 80% while improving overall test quality.
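To make the selection problem concrete, here is a simple greedy baseline under a per-level simulation-time budget: it picks test cases by effectiveness-per-second until the budget is spent. The test names, effectiveness scores, and budget are invented toy values, and the paper's GA and AVM searches explore the space more globally than this greedy rule.

```python
def greedy_select(test_cases, budget):
    """Greedily pick the test case with the best effectiveness-per-second
    ratio until the simulation-time budget of the X-in-the-Loop level runs out."""
    ranked = sorted(test_cases,
                    key=lambda t: t["effectiveness"] / t["time"], reverse=True)
    selected, spent = [], 0.0
    for t in ranked:
        if spent + t["time"] <= budget:
            selected.append(t["name"])
            spent += t["time"]
    return selected, spent

# toy suite: effectiveness is any black-box score (e.g., output diversity)
suite = [
    {"name": "tc1", "time": 30.0, "effectiveness": 0.9},
    {"name": "tc2", "time": 5.0,  "effectiveness": 0.4},
    {"name": "tc3", "time": 12.0, "effectiveness": 0.7},
]
print(greedy_select(suite, budget=20.0))   # -> (['tc2', 'tc3'], 17.0)
```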