Biblio

Found 12044 results

Filters: Keyword is Resiliency

2023-04-28

Zhu, Yuwen, Yu, Lei. 2022. A Modeling Method of Cyberspace Security Structure Based on Layer-Level Division. 2022 IEEE 5th International Conference on Computer and Communication Engineering Technology (CCET). :247–251.

As the cyberspace structure becomes more and more complex, the problems of dynamic network space topology, complex composition structure, large spanning space scale, and a high degree of self-organization are becoming more and more important. In this paper, we model the cyberspace elements and their dependencies by combining the knowledge of graph theory. Layer adopts a network space modeling method combining virtual and real, and level adopts a spatial iteration method. Combining the layer-level models into one, this paper proposes a fast modeling method for cyberspace security structure model with network connection relationship, hierarchical relationship, and vulnerability information as input. This method can not only clearly express the individual vulnerability constraints in the network space, but also clearly express the hierarchical relationship of the complex dependencies of network individuals. For independent network elements or independent network element groups, it has flexibility and can greatly reduce the computational complexity in later applications.

Gao, Hongbin, Wang, Shangxing, Zhang, Hongbin, Liu, Bin, Zhao, Dongmei, Liu, Zhen. 2022. Network Security Situation Assessment Method Based on Absorbing Markov Chain. 2022 International Conference on Networking and Network Applications (NaNA). :556–561.

This paper presents a new network security situation assessment method based on an absorbing Markov chain. This method differs from other network security situation assessment methods based on graph theory. It effectively addresses issues such as the poor objectivity of other methods, incomplete consideration of evaluation factors, and mismatch between evaluation results and the actual situation of the network. First, the method collects the security elements in the network. Then, using graph theory combined with an absorbing Markov chain, the threat values of vulnerable nodes are calculated and sorted. Finally, the maximum possible attack path is obtained by incorporating network asset information to determine the current network security status. The experimental results show that the method fully considers the vulnerability and threat node ranking and the specific case of system network assets, which makes the evaluation result close to the actual network situation.
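As a rough illustration of the absorbing-Markov-chain step described in this abstract, the sketch below ranks the nodes of a toy attack graph. The states, the transition probabilities, and the choice of "probability of reaching the goal" as the threat value are all assumptions made for this example; they are not taken from the paper.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Gauss-Jordan inverse using exact fractions."""
    n = len(m)
    a = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

# Hypothetical chain: states 0..2 are transient (hosts an attacker may occupy);
# the two absorbing states are "goal reached" and "attack abandoned".
Q = [  # transient -> transient probabilities (made up)
    [F(0), F(1, 2), F(1, 4)],
    [F(0), F(0), F(1, 2)],
    [F(0), F(0), F(0)],
]
R = [  # transient -> absorbing probabilities (made up)
    [F(0), F(1, 4)],
    [F(1, 4), F(1, 4)],
    [F(3, 4), F(1, 4)],
]

I_minus_Q = [[F(int(i == j)) - Q[i][j] for j in range(3)] for i in range(3)]
N = inverse(I_minus_Q)                      # fundamental matrix: expected visits
B = matmul(N, R)                            # absorption probabilities
expected_steps = [sum(row) for row in N]    # expected steps before absorption

# Assumed threat value: probability of eventually reaching the goal state.
ranking = sorted(range(3), key=lambda i: B[i][0], reverse=True)
print(ranking, [str(B[i][0]) for i in ranking])
```

The fundamental matrix N = (I - Q)^-1 is the standard tool here: row i of N R gives the probability of ending in each absorbing state when starting from transient state i.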

Hu, Yuanyuan, Cao, Xiaolong, Li, Guoqing. 2022. The Design and Realization of Information Security Technology and Computer Quality System Structure. 2022 International Conference on Artificial Intelligence in Everything (AIE). :460–464.

With the development of computer technology and information security technology, computer networks will increasingly become an important means of information exchange, permeating all areas of social life. Therefore, recognizing the vulnerabilities and potential threats of computer networks as well as the various security problems that exist in reality, designing and researching computer quality architecture, and ensuring the security of network information are issues that need to be resolved urgently. The purpose of this article is to study the design and realization of information security technology and computer quality system structure. The article first summarizes the basic theory of information security technology and then extends to its core technologies. It then considers the current status of computer quality system structure, analyzes the existing problems and deficiencies, and on this basis uses information security technology to design and research the computer quality system structure. The article systematically expounds the function module data, interconnection structure, and routing selection of the computer quality system structure, and uses the comparative method, the observation method, and other research methods to design and research the information security technology and computer quality system structure. Experimental research shows that when the load of the computer quality system structure studied is 0 or 100, the data loss rate for different lengths is 0 and the correct rate is 100, which shows extremely high feasibility.

Zhang, Zongyu, Zhou, Chengwei, Yan, Chenggang, Shi, Zhiguo. 2022. Deterministic Ziv-Zakai Bound for Compressive Time Delay Estimation. 2022 IEEE Radar Conference (RadarConf22). :1–5.

Compressive radar receiver has attracted a lot of research interest due to its capability to keep balance between sub-Nyquist sampling and high resolution. In evaluating the performance of compressive time delay estimator, Cramer-Rao bound (CRB) has been commonly utilized for lower bounding the mean square error (MSE). However, behaving as a local bound, CRB is not tight in the a priori performance region. In this paper, we introduce the Ziv-Zakai bound (ZZB) methodology into compressive sensing framework, and derive a deterministic ZZB for compressive time delay estimators as a function of the compressive sensing kernel. By effectively incorporating the a priori information of the unknown time delay, the derived ZZB performs much tighter than CRB especially in the a priori performance region. Simulation results demonstrate that the derived ZZB outperforms the Bayesian CRB over a wide range of signal-to-noise ratio, where different types of a priori distribution of time delay are considered.

Nicholls, D., Robinson, A., Wells, J., Moshtaghpour, A., Bahri, M., Kirkland, A., Browning, N. 2022. Compressive Scanning Transmission Electron Microscopy. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :1586–1590.

Scanning Transmission Electron Microscopy (STEM) offers high-resolution images that are used to quantify the nanoscale atomic structure and composition of materials and biological specimens. In many cases, however, the resolution is limited by the electron beam damage, since in traditional STEM, a focused electron beam scans every location of the sample in a raster fashion. In this paper, we propose a scanning method based on the theory of Compressive Sensing (CS) and subsampling the electron probe locations using a line hop sampling scheme that significantly reduces the electron beam damage. We experimentally validate the feasibility of the proposed method by acquiring real CS-STEM data, and recovering images using a Bayesian dictionary learning approach. We support the proposed method by applying a series of masks to fully-sampled STEM data to simulate the expectation of real CS-STEM. Finally, we perform the real data experimental series using a constrained-dose budget to limit the impact of electron dose upon the results, by ensuring that the total electron count remains constant for each image.

ISSN: 2379-190X

Huang, Wenwei, Cao, Chunhong, Hong, Sixia, Gao, Xieping. 2022. ISTA-based Adaptive Sparse Sampling Network for Compressive Sensing MRI Reconstruction. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). :999–1004.

The compressed sensing (CS) method can reconstruct images from a small amount of under-sampled data, which makes it an effective method for fast magnetic resonance imaging (MRI). Because traditional optimization-based models for MRI suffer from non-adaptive sampling and shallow representation ability, they are unable to characterize the rich patterns in MRI data. In this paper, we propose a CS MRI method based on the iterative shrinkage threshold algorithm (ISTA) and adaptive sparse sampling, called DSLS-ISTA-Net. Corresponding to the sampling and reconstruction stages of the CS method, the network framework includes two parts: the sampling sub-network and the improved ISTA reconstruction sub-network, which are coordinated with each other through end-to-end training in an unsupervised way. The sampling sub-network and the ISTA reconstruction sub-network are responsible for the implementation of adaptive sparse sampling and deep sparse representation, respectively. In the testing phase, we investigate different modules and parameters in the network structure, and perform extensive experiments on MR images at different sampling rates to obtain the optimal network. By combining the advantages of the model-based and deep learning-based methods, and by taking both adaptive sampling and deep sparse representation into account, the proposed networks significantly improve the reconstruction performance compared to state-of-the-art CS-MRI approaches.
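For readers unfamiliar with ISTA, the following is a minimal sketch of the classical (non-learned) algorithm that the abstract builds on. It is not DSLS-ISTA-Net: there is no network, no learned sampling, and the matrix `A`, the signal `x_true`, and the weight `lam` are hypothetical values chosen only for illustration.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, n_iter=200):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative shrinkage."""
    m, n = len(A), len(A[0])
    # Step size 1/L: the squared Frobenius norm is a safe upper bound
    # on the largest eigenvalue of A^T A.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the data term, then shrinkage for the L1 term.
        x = [soft_threshold(x[j] - grad[j] / L, lam / L) for j in range(n)]
    return x

# Hypothetical under-sampled problem: 3 measurements of a 5-entry sparse vector.
A = [[1.0, 0.5, 0.0, 0.2, 0.1],
     [0.0, 1.0, 0.5, 0.0, 0.3],
     [0.3, 0.0, 1.0, 0.5, 0.0]]
x_true = [0.0, 2.0, 0.0, 0.0, 0.0]
y = [sum(a * b for a, b in zip(row, x_true)) for row in A]
x_hat = ista(A, y, lam=0.05, n_iter=2000)
print([round(v, 3) for v in x_hat])
```

Each iteration alternates a gradient step on the data-fidelity term with a soft-thresholding step that promotes sparsity; unrolled networks in the ISTA-Net family replace fixed quantities such as the step size and threshold with learned ones.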

Pham, Quang Duc, Hayasaki, Yoshio. 2022. Time of flight three-dimensional imaging camera using compressive sampling technique with sparse frequency intensity modulation light source. 2022 IEEE CPMT Symposium Japan (ICSJ). :168–171.

This research proposes a camera for three-dimensional (3D) image acquisition, built from a megahertz-range intensity-modulated active light source and a kilo-frame-rate fast camera, and based on the compressive sensing (CS) technique.

ISSN: 2475-8418

Barac, Petar, Bajor, Matthew, Kinget, Peter R. 2022. Compressive-Sampling Spectrum Scanning with a Beamforming Receiver for Rapid, Directional, Wideband Signal Detection. 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring). :1–5.

Communication systems across a variety of applications are increasingly using the angular domain to improve spectrum management. They require new sensing architectures to perform energy-efficient measurements of the electromagnetic environment that can be deployed in a variety of use cases. This paper presents the Directional Spectrum Sensor (DSS), a compressive sampling (CS) based analog-to-information converter (CS-AIC) that performs spectrum scanning in a focused beam. The DSS offers increased spectrum sensing sensitivity and interferer tolerance compared to omnidirectional sensors. The DSS implementation uses a multi-antenna beamforming architecture with local oscillators that are modulated with pseudo random waveforms to obtain CS measurements. The overall operation, limitations, and the influence of wideband angular effects on the spectrum scanning performance are discussed. Measurements on an experimental prototype are presented and highlight improvements over single antenna, omnidirectional sensing systems.

ISSN: 2577-2465

Liu, Cen, Luo, Laiwei, Wang, Jun, Zhang, Chao, Pan, Changyong. 2022. A New Digital Predistortion Based On B spline Function With Compressive Sampling Pruning. 2022 International Wireless Communications and Mobile Computing (IWCMC). :1200–1205.

A power amplifier (PA) is an inherently nonlinear device that is widely used in communication systems. Because of the nonlinearity of the PA, it is hard for the communication system to work well. Digital predistortion (DPD) is a way to solve this problem. Most DPD solutions use a Volterra function to fit the PA. However, for wideband signals the performance of the Volterra function degrades. In this paper, we replace the Volterra function with a B-spline function, which performs better when fitting the PA with wideband signals. A further benefit is that the orthogonality of the coding matrix A can be improved, enhancing the stability of computation. Additionally, we use compressive sampling to reduce the complexity of the function model.

ISSN: 2376-6506

Lotfollahi, Mahsa, Tran, Nguyen, Gajjela, Chalapathi, Berisha, Sebastian, Han, Zhu, Mayerich, David, Reddy, Rohith. 2022. Adaptive Compressive Sampling for Mid-Infrared Spectroscopic Imaging. 2022 IEEE International Conference on Image Processing (ICIP). :2336–2340.

Mid-infrared spectroscopic imaging (MIRSI) is an emerging class of label-free, biochemically quantitative technologies targeting digital histopathology. Conventional histopathology relies on chemical stains that alter tissue color. This approach is qualitative, often making histopathologic examination subjective and difficult to quantify. MIRSI addresses these challenges through quantitative and repeatable imaging that leverages native molecular contrast. Fourier transform infrared (FTIR) imaging, the best-known MIRSI technology, has two challenges that have hindered its widespread adoption: data collection speed and spatial resolution. Recent technological breakthroughs, such as photothermal MIRSI, provide an order of magnitude improvement in spatial resolution. However, this comes at the cost of acquisition speed, which is impractical for clinical tissue samples. This paper introduces an adaptive compressive sampling technique to reduce hyperspectral data acquisition time by an order of magnitude by leveraging spectral and spatial sparsity. This method identifies the most informative spatial and spectral features, integrates a fast tensor completion algorithm to reconstruct megapixel-scale images, and demonstrates speed advantages over FTIR imaging while providing spatial resolutions comparable to new photothermal approaches.

ISSN: 2381-8549

Nema, Tesu, Parsai, M. P. 2022. Reconstruction of Incomplete Image by Radial Sampling. 2022 International Conference on Computer Communication and Informatics (ICCCI). :1–4.

In conventional sampling, signals are sampled at the Nyquist rate, whereas in compressive sensing signals are sampled below the Nyquist rate by taking random projections of the signal and reconstructing it from very few measurements. However, this is not efficient when recovering an image from compressive measurements with the help of a multi-resolution grid, where the image has a certain region of interest (RoI) that is more important than the rest. Conventional Cartesian sampling cannot give good results in motion image sensing recovery and is limited to stationary image sensing. The proposed work gives improved results by using radial sampling (a type of compressive sensing). This paper discusses the radial sampling approach along with the application of Sparse Fourier Transform algorithms, which help reduce acquisition cost and input/output overhead.

ISSN: 2329-7190

Mahind, Umesh, Karia, Deepak. 2022. Development and Analysis of Sparse Spasmodic Sampling Techniques. 2022 International Conference on Edge Computing and Applications (ICECAA). :818–823.

Compressive Sensing (CS) has a wide range of applications in various domains. The sampling of sparse signals, whether periodic or aperiodic in nature, has still received little attention. This paper proposes novel Sparse Spasmodic Sampling (SSS) techniques for different sparse signals in the original domain. The SSS techniques are proposed to overcome the drawbacks of existing CS sampling techniques; they can sample any sparse signal efficiently and also find the locations of non-zero components in signals. First, Sparse Spasmodic Sampling model-1 (SSS-1), which samples random points and also includes non-zero components, is proposed. Another sampling technique, Sparse Spasmodic Sampling model-2 (SSS-2), has the same working principle as model-1 with some advancements in design. Unlike SSS-1, it samples equidistant points. It is demonstrated that, using either sampling technique, the signal can be reconstructed by a reconstruction algorithm from a smaller number of measurements. Simulation results are provided to demonstrate the effectiveness of the proposed sampling techniques.

'Ammar, Muhammad Amirul, Purnamasari, Rita, Budiman, Gelar. 2022. Compressive Sampling on Weather Radar Application via Discrete Cosine Transform (DCT). 2022 IEEE Symposium on Future Telecommunication Technologies (SOFTT). :83–89.

A weather radar is expected to provide valid information about weather conditions in real time. To obtain these results, a weather radar takes many data samples, so a large amount of data is obtained. Therefore, the weather radar equipment must provide large-capacity bandwidth for transmission and large-capacity storage media. The burden of data volume can be reduced by performing compression at the time of data acquisition. Compressive Sampling (CS) is a new data acquisition method that allows the sampling and compression processes to be carried out simultaneously to speed up computing time, reduce bandwidth when passed over transmission media, and save storage media. There are three stages in the CS method: sparsity transformation using the Discrete Cosine Transform (DCT) algorithm, sampling using a measurement matrix, and reconstruction using the Orthogonal Matching Pursuit (OMP) algorithm. The sparsity transformation aims to convert the representation of the radar signal into a sparse form. Sampling is used to extract important information from the radar signal, and reconstruction is used to get the radar signal back. The data used in this study is the real data of the IDRA beat signal. Based on the CS simulation that has been done, the best PSNR and RMSE values are obtained when using a CR value of two times, while the shortest computation time is obtained when using a CR value of 32 times. CS simulation in a sector via DCT using a CR value of two times produces a PSNR value of 20.838 dB and an RMSE value of 0.091. CS simulation in a sector via DCT using a CR value of 32 times requires a computation time of 10.574 seconds.
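The three stages named in this abstract (DCT sparsity transform, measurement matrix, OMP reconstruction) can be sketched on synthetic data as follows. Everything concrete here is an assumption for illustration: the signal is synthetic rather than IDRA radar data, the measurement matrix is random Gaussian, the sizes are arbitrary, and CR is assumed to mean the ratio n / m of signal length to number of measurements.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n) * math.cos(math.pi * (i + 0.5) * k / n)
             for i in range(n)] for k in range(n)]

def solve(M, b):
    """Solve the square system M z = b by Gaussian elimination with pivoting."""
    n = len(M)
    a = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (a[r][n] - sum(a[r][j] * z[j] for j in range(r + 1, n))) / a[r][r]
    return z

def omp(Theta, y, k):
    """Greedy k-sparse solution of Theta c = y (Orthogonal Matching Pursuit)."""
    m, n = len(Theta), len(Theta[0])
    cols = [[Theta[i][j] for i in range(m)] for j in range(n)]
    support, residual, coef = [], y[:], []
    for _ in range(k):
        # Pick the unused column most correlated with the current residual.
        scores = [abs(sum(c * r for c, r in zip(col, residual))) for col in cols]
        for j in support:
            scores[j] = -1.0
        support.append(max(range(n), key=scores.__getitem__))
        # Least squares on the chosen columns via the normal equations.
        S = [cols[j] for j in support]
        G = [[sum(u * v for u, v in zip(a, b)) for b in S] for a in S]
        rhs = [sum(u * v for u, v in zip(a, y)) for a in S]
        coef = solve(G, rhs)
        residual = [y[i] - sum(cf * col[i] for cf, col in zip(coef, S)) for i in range(m)]
    c_hat = [0.0] * n
    for j, cf in zip(support, coef):
        c_hat[j] = cf
    return c_hat

random.seed(0)
n, m, k = 32, 16, 2                      # hypothetical sizes, n / m = 2
Psi = dct_matrix(n)
c_true = [0.0] * n
c_true[3], c_true[10] = 1.0, -0.5        # signal is 2-sparse in the DCT domain
signal = [sum(Psi[q][i] * c_true[q] for q in range(n)) for i in range(n)]
Phi = [[random.gauss(0, 1) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
y = [sum(Phi[r][i] * signal[i] for i in range(n)) for r in range(m)]
# Sensing matrix in the sparse domain: Theta = Phi * Psi^T.
Theta = [[sum(Phi[r][i] * Psi[q][i] for i in range(n)) for q in range(n)] for r in range(m)]
c_hat = omp(Theta, y, k)
signal_hat = [sum(Psi[q][i] * c_hat[q] for q in range(n)) for i in range(n)]
rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, signal_hat)) / n)
print("RMSE:", rmse)
```

Raising n / m shrinks the system OMP has to work with, which is consistent with the shorter computation time the abstract reports at a CR of 32, while fewer measurements generally make accurate recovery harder.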

López, Hiram H., Matthews, Gretchen L., Valvo, Daniel. 2022. Secure MatDot codes: a secure, distributed matrix multiplication scheme. 2022 IEEE Information Theory Workshop (ITW). :149–154.

This paper presents secure MatDot codes, a family of evaluation codes that support secure distributed matrix multiplication via a careful selection of evaluation points that exploit the properties of the dual code. We show that the secure MatDot codes provide security against the user by using locally recoverable codes. These new codes complement the recently studied discrete Fourier transform codes for distributed matrix multiplication schemes that also provide security against the user. There are scenarios where the associated costs are the same for both families and instances where the secure MatDot codes offer a lower cost. In addition, the secure MatDot code provides an alternative way to handle the matrix multiplication by identifying the fastest servers in advance. In this way, it can determine a product using fewer servers, specified in advance, than the MatDot codes which achieve the optimal recovery threshold for distributed matrix multiplication schemes.
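To make the baseline concrete, here is a small sketch of the plain (non-secure) MatDot construction that the abstract compares against. It omits the security mechanism and the dual-code point selection introduced in the paper; the matrices, evaluation points, and the choice of which workers respond first are hypothetical, and each block is assumed to be a single column of A or row of B.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matdot_encode(A, B, points):
    """Give each worker one evaluation of pA(x) and pB(x)."""
    m = len(B)  # A is split into m columns, B into m rows
    shares = []
    for x in points:
        # pA(x) = sum_i A_i x^i, where A_i is column i of A
        pA = [[sum(row[i] * x ** i for i in range(m))] for row in A]
        # pB(x) = sum_j B_j x^(m-1-j), where B_j is row j of B
        pB = [[sum(B[j][c] * x ** (m - 1 - j) for j in range(m)) for c in range(len(B[0]))]]
        shares.append((pA, pB))
    return shares

def coefficient_weights(points, degree):
    """Weights w_k with [x^degree] p = sum_k w_k p(x_k), via Lagrange bases."""
    weights = []
    for k, xk in enumerate(points):
        poly = [Fraction(1)]  # coefficients, lowest degree first
        for j, xj in enumerate(points):
            if j == k:
                continue
            denom = xk - xj
            new = [Fraction(0)] * (len(poly) + 1)
            for d, c in enumerate(poly):  # multiply by (x - xj) / denom
                new[d] += -xj * c / denom
                new[d + 1] += c / denom
            poly = new
        weights.append(poly[degree])
    return weights

def matdot_decode(points, products, m):
    """Recover AB as the x^(m-1) coefficient of pA(x) pB(x)."""
    w = coefficient_weights(points, m - 1)
    rows, cols = len(products[0]), len(products[0][0])
    return [[sum(wk * P[r][c] for wk, P in zip(w, products)) for c in range(cols)]
            for r in range(rows)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
m = 2
points = [1, 2, 3, 4]                               # four hypothetical workers
shares = matdot_encode(A, B, points)
products = [matmul(pA, pB) for pA, pB in shares]    # computed by the workers
fastest = [0, 2, 3]                                 # any 2m - 1 = 3 responses suffice
decoded = matdot_decode([points[i] for i in fastest],
                        [products[i] for i in fastest], m)
print(decoded == matmul(A, B))
```

The product polynomial has degree 2m - 2, so 2m - 1 evaluations determine it; this is the recovery threshold the abstract refers to for MatDot codes.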

Zhang, Xin, Sun, Hongyu, He, Zhipeng, Gu, MianXue, Feng, Jingyu, Zhang, Yuqing. 2022. VDBWGDL: Vulnerability Detection Based On Weight Graph And Deep Learning. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). :186–190.

Vulnerability detection has always been an essential part of maintaining information security, and the existing work can significantly improve the performance of vulnerability detection. However, due to the differences in representation forms and deep learning models, various methods still have some limitations. In order to overcome this defect, we propose a vulnerability detection method, VDBWGDL, based on weight graphs and deep learning. Firstly, it accurately locates vulnerability-sensitive keywords and generates variant codes that satisfy vulnerability trigger logic and programmer programming style through code variant methods. Then, the control flow graph is sliced for vulnerable code keywords and program critical statements. The code block is converted into a vector containing rich semantic information and input into the weight map through the deep learning model. According to specific rules, different weights are set for each node. Finally, the similarity is obtained through the similarity comparison algorithm, and the suspected vulnerability is output according to different thresholds. VDBWGDL improves the accuracy and F1 value by 3.98% and 4.85% compared with four state-of-the-art models. The experimental results prove the effectiveness of VDBWGDL.

ISSN: 2325-6664

Wang, Yiwen, Liang, Jifan, Ma, Xiao. 2022. Local Constraint-Based Ordered Statistics Decoding for Short Block Codes. 2022 IEEE Information Theory Workshop (ITW). :107–112.

In this paper, we propose a new ordered statistics decoding (OSD) for linear block codes, which is referred to as local constraint-based OSD (LC-OSD). Distinguished from the conventional OSD, which chooses the most reliable basis (MRB) for re-encoding, the LC-OSD chooses an extended MRB on which local constraints are naturally imposed. A list of candidate codewords is then generated by performing a serial list Viterbi algorithm (SLVA) over the trellis specified with the local constraints. To terminate the SLVA early for complexity reduction, we present a simple criterion which monitors the ratio of the bound on the likelihood of the unexplored candidate codewords to the sum of the hard-decision vector’s likelihood and the up-to-date optimal candidate’s likelihood. Simulation results show that the LC-OSD can require far fewer test patterns than the conventional OSD while causing negligible performance loss. Comparisons with other complexity-reduced OSDs are also conducted, showing the advantages of the LC-OSD in terms of complexity.

Jiang, Zhenghong. 2022. Source Code Vulnerability Mining Method based on Graph Neural Network. 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). :1177–1180.

Vulnerability discovery is an important field of computer security research and development today. Because most current vulnerability discovery methods require large-scale manual auditing, and the code parsing process is cumbersome and time-consuming, the effectiveness of vulnerability discovery is reduced. Therefore, given the uncertainty of vulnerability discovery itself, the most basic tool design principle is that tools assist security analysts and cannot completely replace them. The purpose of this paper is to study the source code vulnerability discovery method based on graph neural network. This paper analyzes the three processes of data preparation, source code vulnerability mining, and security assurance of the source code vulnerability mining method, and also analyzes the suspiciousness and particularity of the experimental results. The empirical analysis results show that traditional source code vulnerability mining methods become more concise and convenient after using graph neural network technology. We also conducted a survey and found that more than 82% of people felt that design efficiency became higher when the source code vulnerability mining method used graph neural networks.

Aladi, Ahmed, Alsusa, Emad. 2022. A Secure Turbo Codes Design on Physical Layer Security Based on Interleaving and Puncturing. 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall). :1–7.

Nowadays, improving the reliability and security of the transmitted data has gained more attention with the increase in emerging power-limited and lightweight communication devices. Also, the transmission needs to meet specific latency requirements. Combining data encryption and encoding in one physical layer block has been exploited to study the effect on security and latency over traditional sequential data transmission. Some of the current works target secure error-correcting codes that may be candidates for post-quantum computing. However, modifying the popularly used channel coding techniques to guarantee secrecy and maintain the same error performance and complexity at the decoder is challenging, since the structure of the channel coding blocks is altered, which results in less optimal decoding performance. Also, the redundant nature of the error-correcting codes complicates the encryption method. In this paper, we briefly review the proposed security schemes on Turbo codes. Then, we propose a secure turbo code design and compare it with the relevant security schemes in the literature. We show that the proposed method is more secure without adding complexity.

ISSN: 2577-2465

Zhu, Tingting, Liang, Jifan, Ma, Xiao. 2022. Ternary Convolutional LDGM Codes with Applications to Gaussian Source Compression. 2022 IEEE International Symposium on Information Theory (ISIT). :73–78.

We present a ternary source coding scheme in this paper, which is a special class of low density generator matrix (LDGM) codes. We prove that a ternary linear block LDGM code, whose generator matrix is randomly generated with each element independent and identically distributed, is universal for source coding in terms of the symbol-error rate (SER). To circumvent the high-complexity maximum likelihood decoding, we introduce a special class of convolutional LDGM codes, called block Markov superposition transmission of repetition (BMST-R) codes, which are iteratively decodable by a sliding window algorithm. Then the presented BMST-R codes are applied to construct a tandem scheme for Gaussian source compression, where a dead-zone quantizer is introduced before the ternary source coding. The main advantages of this scheme are its universality and flexibility. The dead-zone quantizer can choose a proper quantization level according to the distortion requirement, while the LDGM codes can adapt the code rate to approach the entropy of the quantized sequence. Numerical results show that the proposed scheme performs well for ternary sources over a wide range of code rates and that the distortion introduced by quantization dominates provided that the code rate is slightly greater than the discrete entropy.

ISSN: 2157-8117
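The dead-zone quantization step in the Zhu, Liang, and Ma abstract above can be illustrated with a short sketch that maps Gaussian samples to ternary symbols and reports the empirical entropy the subsequent source code would need to approach. The thresholds, sample count, and seed are assumptions for illustration, and the BMST-R coding stage itself is not shown.

```python
import math
import random
from collections import Counter

def dead_zone_quantize(samples, threshold):
    """Map each sample to -1, 0, or +1; values within the dead zone become 0."""
    return [0 if abs(x) <= threshold else (1 if x > 0 else -1) for x in samples]

def empirical_entropy(symbols):
    """Entropy in bits per symbol of the observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(10000)]  # synthetic Gaussian source
for threshold in (0.25, 0.5, 1.0):                        # hypothetical dead-zone widths
    symbols = dead_zone_quantize(samples, threshold)
    print(threshold, round(empirical_entropy(symbols), 3))
```

Widening the dead zone changes how often the symbol 0 occurs and therefore the entropy of the ternary sequence, which can never exceed log2(3) bits per symbol; this is the quantity the abstract says the code rate adapts to.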

Yang, Hongna, Zhang, Yiwei. 2022. On an extremal problem of regular graphs related to fractional repetition codes. 2022 IEEE International Symposium on Information Theory (ISIT). :1566–1571.

Fractional repetition (FR) codes are a special family of regenerating codes with the repair-by-transfer property. The constructions of FR codes are naturally related to combinatorial designs, graphs, and hypergraphs. Given the file size of an FR code, it is desirable to determine the minimum number of storage nodes needed. The problem is related to an extremal graph theory problem, which asks for the minimum number of vertices of an α-regular graph such that any subgraph with k vertices has at most δ edges. In this paper, we present a class of regular graphs for this problem to give the bounds for the minimum number of storage nodes for the FR codes.

ISSN: 2157-8117
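The extremal condition in the Yang and Zhang abstract above (an α-regular graph in which any k vertices induce at most δ edges) can be checked by brute force on small graphs. The sketch below does this for the Petersen graph, which is chosen here only as a familiar 3-regular example and is not taken from the paper; the exhaustive search is only practical for small vertex counts.

```python
from itertools import combinations

def is_regular(edges, n, alpha):
    """True if every one of the n vertices has degree alpha."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return all(d == alpha for d in deg)

def max_edges_in_k_subset(edges, n, k):
    """Largest number of edges induced by any k of the n vertices."""
    best = 0
    for subset in combinations(range(n), k):
        s = set(subset)
        best = max(best, sum(1 for u, v in edges if u in s and v in s))
    return best

# Petersen graph: outer 5-cycle, inner pentagram, and five spokes.
petersen = ([(i, (i + 1) % 5) for i in range(5)]
            + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
            + [(i, i + 5) for i in range(5)])

print(is_regular(petersen, 10, 3))
for k in (3, 4, 5):
    print(k, max_edges_in_k_subset(petersen, 10, k))
```

For a given α, k, and δ, a graph satisfies the condition when `max_edges_in_k_subset` returns at most δ; the extremal question in the abstract asks for the smallest vertex count at which such a graph exists.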

Tang, Shibo, Wang, Xingxin, Gao, Yifei, Hu, Wei. 2022. Accelerating SoC Security Verification and Vulnerability Detection Through Symbolic Execution. 2022 19th International SoC Design Conference (ISOCC). :207–208.

Model checking is one of the most commonly used techniques in formal verification. However, the exponential scale of the state space renders exhaustive state enumeration inefficient even for a moderate System on Chip (SoC) design. In this paper, we propose a method that leverages symbolic execution to accelerate state space search and pinpoint security vulnerabilities. We automatically convert the hardware design to functionally equivalent C++ code and utilize the KLEE symbolic execution engine to perform state exploration through heuristic search. To reduce the search space, we symbolically represent essential input signals while making non-critical inputs concrete. Experimental results demonstrate that our method can precisely identify security vulnerabilities at significantly lower computation cost.

Abraham, Jacob, Ehret, Alan, Kinsy, Michel A. 2022. A Compiler for Transparent Namespace-Based Access Control for the Zeno Architecture. 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED). :1–10.

With memory safety and security issues continuing to plague modern systems, security is rapidly becoming a first class priority in new architectures and competes directly with performance and power efficiency. The capability-based architecture model provides a promising solution to many memory vulnerabilities by replacing plain addresses with capabilities, i.e., addresses and related metadata. A key advantage of the capability model is compatibility with existing code bases. Capabilities can be implemented transparently to a programmer, i.e., without source code changes. Capabilities leverage semantics in source code to describe access permissions but require customized compilers to translate the semantics to their binary equivalent. In this work, we introduce a complete capability-aware compiler toolchain for such secure architectures. We illustrate the compiler construction with a RISC-V capability-based architecture, called Zeno. As a security-focused, large-scale, global shared memory architecture, Zeno implements a Namespace-based capability model for accesses. Namespace IDs (NSID) are encoded with an extended addressing model to associate them with access permission metadata elsewhere in the system. The NSID extended addressing model requires custom compiler support to fully leverage the protections offered by Namespaces. The Zeno compiler produces code transparently to the programmer that is aware of Namespaces and maintains their integrity. The Zeno assembler enables custom Zeno instructions which support secure memory operations. Our results show that our custom toolchain moderately increases the binary size compared to non-Zeno compilation. We find the minimal overhead incurred by the additional NSID management instructions to be an acceptable trade-off for the memory safety and security offered by Zeno Namespaces.

Moses, William S., Narayanan, Sri Hari Krishna, Paehler, Ludger, Churavy, Valentin, Schanen, Michel, Hückelheim, Jan, Doerfert, Johannes, Hovland, Paul. 2022. Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. :1–18.

Derivatives are key to numerous science, engineering, and machine learning applications. While existing tools generate derivatives of programs in a single language, modern parallel applications combine a set of frameworks and languages to leverage available performance and function in an evolving hardware landscape. We propose a scheme for differentiating arbitrary DAG-based parallelism that preserves scalability and efficiency, implemented into the LLVM-based Enzyme automatic differentiation framework. By integrating with a full-fledged compiler backend, Enzyme can differentiate numerous parallel frameworks and directly control code generation. Combined with its ability to differentiate any LLVM-based language, this flexibility permits Enzyme to leverage the compiler tool chain for parallel and differentiation-specific optimizations. We differentiate nine distinct versions of the LULESH and miniBUDE applications, written in different programming languages (C++, Julia) and parallel frameworks (OpenMP, MPI, RAJA, Julia tasks, MPI.jl), demonstrating similar scalability to the original program. On benchmarks with 64 threads or nodes, we find a differentiation overhead of 3.4–6.8× on C++ and 5.4–12.5× on Julia.

Li, Zongjie, Ma, Pingchuan, Wang, Huaijin, Wang, Shuai, Tang, Qiyi, Nie, Sen, Wu, Shi. 2022. Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :2253–2265.

Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural program embeddings directly from the program source codes, by learning from features such as tokens, abstract syntax trees, and control flow graphs. This paper takes a fresh look at how to improve program embeddings by leveraging compiler intermediate representation (IR). We first demonstrate simple yet highly effective methods for enhancing embedding quality by training embedding models alongside source code and LLVM IR generated by default optimization levels (e.g., -O2). We then introduce IRGEN, a framework based on genetic algorithms (GA), to identify (near-)optimal sequences of optimization flags that can significantly improve embedding quality. We use IRGEN to find optimal sequences of LLVM optimization flags by performing GA on source code datasets. We then extend a popular code embedding model, CodeCMR, by adding a new objective based on triplet loss to enable a joint learning over source code and LLVM IR. We benchmark the quality of embedding using a representative downstream application, code clone detection. When CodeCMR was trained with source code and LLVM IRs optimized by findings of IRGEN, the embedding quality was significantly improved, outperforming the state-of-the-art model, CodeBERT, which was trained only with source code. Our augmented CodeCMR also outperformed CodeCMR trained over source code and IR optimized with default optimization levels. We investigate the properties of optimization flags that increase embedding quality, demonstrate IRGEN's generalization in boosting other embedding models, and establish IRGEN's use in settings with extremely limited training data. Our research and findings demonstrate that a straightforward addition to modern neural code embedding models can provide a highly effective enhancement.

Chen, Ligeng, He, Zhongling, Wu, Hao, Xu, Fengyuan, Qian, Yi, Mao, Bing. 2022. DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). :112–122.

Binary analysis is pervasively utilized to assess software security and test vulnerabilities without accessing source code. The validity of the analysis is heavily influenced by the ability to infer information related to the code compilation. Among the compilation information, compiler type and optimization level, as the key factors determining what binaries look like, are still difficult to infer efficiently with existing tools. In this paper, we conduct a thorough empirical study of the binary's appearance under various compilation settings and propose a lightweight binary analysis tool based on the simplest machine learning method, called DIComP, to infer the compiler and optimization level via the most relevant features according to the observation. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. It is also efficient, inferring thousands of binaries at a millisecond level with our lightweight machine learning model (1MB).