Biblio | CPS-VO

Yang, Jing, Yang, Yibiao, Sun, Maolin, Wen, Ming, Zhou, Yuming, Jin, Hai. 2022. Isolating Compiler Optimization Faults via Differentiating Finer-grained Options. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). :481–491.

Code optimization is an essential feature for compilers and almost all software products are released by compiler optimizations. Consequently, bugs in code optimization will inevitably cast significant impact on the correctness of software systems. Locating optimization bugs in compilers is challenging as compilers typically support a large amount of optimization configurations. Although prior studies have proposed to locate compiler bugs via generating witness test programs, they are still time-consuming and not effective enough. To address such limitations, we propose an automatic bug localization approach, ODFL, for locating compiler optimization bugs via differentiating finer-grained options in this study. Specifically, we first disable the fine-grained options that are enabled by default under the bug-triggering optimization levels independently to obtain bug-free and bug-related fine-grained options. We then configure several effective passing and failing optimization sequences based on such fine-grained options to obtain multiple failing and passing compiler coverage. Finally, such generated coverage information can be utilized via Spectrum-Based Fault Localization formulae to rank the suspicious compiler files. We run ODFL on 60 buggy GCC compilers from an existing benchmark. The experimental results show that ODFL significantly outperforms the state-of-the-art compiler bug isolation approach RecBi in terms of all the evaluated metrics, demonstrating the effectiveness of ODFL. In addition, ODFL is much more efficient than RecBi as it can save more than 88% of the time for locating bugs on average.

ISSN: 1534-5351

Wu, Yueming, Zou, Deqing, Dou, Shihan, Yang, Wei, Xu, Duo, Jin, Hai. 2022. VulCNN: An Image-inspired Scalable Vulnerability Detection System. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :2365—2376.

Since deep learning (DL) can automatically learn features from source code, it has been widely used to detect source code vulnerability. To achieve scalable vulnerability scanning, some prior studies intend to process the source code directly by treating them as text. To achieve accurate vulnerability detection, other approaches consider distilling the program semantics into graph representations and using them to detect vulnerability. In practice, text-based techniques are scalable but not accurate due to the lack of program semantics. Graph-based methods are accurate but not scalable since graph analysis is typically time-consuming. In this paper, we aim to achieve both scalability and accuracy on scanning large-scale source code vulnerabilities. Inspired by existing DL-based image classification which has the ability to analyze millions of images accurately, we prefer to use these techniques to accomplish our purpose. Specifically, we propose a novel idea that can efficiently convert the source code of a function into an image while preserving the program details. We implement Vul-CNN and evaluate it on a dataset of 13,687 vulnerable functions and 26,970 non-vulnerable functions. Experimental results report that VulCNN can achieve better accuracy than eight state-of-the-art vul-nerability detectors (i.e., Checkmarx, FlawFinder, RATS, TokenCNN, VulDeePecker, SySeVR, VulDeeLocator, and Devign). As for scalability, VulCNN is about four times faster than VulDeePecker and SySeVR, about 15 times faster than VulDeeLocator, and about six times faster than Devign. Furthermore, we conduct a case study on more than 25 million lines of code and the result indicates that VulCNN can detect large-scale vulnerability. Through the scanning reports, we finally discover 73 vulnerabilities that are not reported in NVD.

Ding, Xiaofeng, Zhang, Xiaodong, Bao, Zhifeng, Jin, Hai. 2018. Privacy-Preserving Triangle Counting in Large Graphs. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. :1283–1292.

Triangle count is a critical parameter in mining relationships among people in social networks. However, directly publishing the findings obtained from triangle counts may bring potential privacy concern, which raises great challenges and opportunities for privacy-preserving triangle counting. In this paper, we choose to use differential privacy to protect triangle counting for large scale graphs. To reduce the large sensitivity caused in large graphs, we propose a novel graph projection method that can be used to obtain an upper bound for sensitivity in different distributions. In particular, we publish the triangle counts satisfying the node-differential privacy with two kinds of histograms: the triangle count distribution and the cumulative distribution. Moreover, we extend the research on privacy preserving triangle counting to one of its applications, the local clustering coefficient. Experimental results show that the cumulative distribution can fit the real statistical information better, and our proposed mechanism has achieved better accuracy for triangle counts while maintaining the requirement of differential privacy.

Liu, Changming, Zou, Deqing, Luo, Peng, Zhu, Bin B., Jin, Hai. 2018. A Heuristic Framework to Detect Concurrency Vulnerabilities. Proceedings of the 34th Annual Computer Security Applications Conference. :529-541.

With a growing demand of concurrent software to exploit multi-core hardware capability, concurrency vulnerabilities have become an inevitable threat to the security of today's IT industry. Existing concurrent program detection schemes focus mainly on detecting concurrency errors such as data races, atomicity violation, etc., with little attention paid to detect concurrency vulnerabilities that may be exploited to infringe security. In this paper, we propose a heuristic framework that combines both static analysis and fuzz testing to detect targeted concurrency vulnerabilities such as concurrency buffer overflow, double free, and use-after-free. The static analysis locates sensitive concurrent operations in a concurrent program, categorizes each finding into a potential type of concurrency vulnerability, and determines the execution order of the sensitive operations in each finding that would trigger the suspected concurrency vulnerability. The results are then plugged into the fuzzer with the execution order fixed by the static analysis in order to trigger the suspected concurrency vulnerabilities. In order to introduce more variance which increases possibility that the concurrency errors can be triggered, we also propose manipulation of thread scheduling priority to enable a fuzzer such as AFL to effectively explore thread interleavings in testing a concurrent program. To the best of our knowledge, this is the first fuzzer that is capable of effectively exploring concurrency errors. In evaluating the proposed heuristic framework with a benchmark suit of six real-world concurrent C programs, the framework detected two concurrency vulnerabilities for the proposed concurrency vulnerability detection, both being confirmed to be true positives, and produced three new crashes for the proposed interleaving exploring fuzzer that existing fuzzers could not produce. These results demonstrate the power and effectiveness of the proposed heuristic framework in detecting concurrency errors and vulnerabilities.

Xu, Peng, Li, Jingnan, Wang, Wei, Jin, Hai. 2016. Anonymous Identity-Based Broadcast Encryption with Constant Decryption Complexity and Strong Security. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :223–233.

Anonymous Identity-Based Broadcast Encryption (AIBBE) allows a sender to broadcast a ciphertext to multi-receivers, and keeps receivers' anonymity. The existing AIBBE schemes fail to achieve efficient decryption or strong security, like the constant decryption complexity, the security under the adaptive attack, or the security in the standard model. Hence, we propose two new AIBBE schemes to overcome the drawbacks of previous schemes in the state-of-art. The biggest contribution in our work is the proposed AIBBE scheme with constant decryption complexity and the provable security under the adaptive attack in the standard model. This scheme should be the first one to obtain advantages in all above mentioned aspects, and has sufficient contribution in theory due to its strong security. We also propose another AIBBE scheme in the Random Oracle (RO) model, which is of sufficient interest in practice due to our experiment.

Xu, Peng, Xu, Jun, Wang, Wei, Jin, Hai, Susilo, Willy, Zou, Deqing. 2016. Generally Hybrid Proxy Re-Encryption: A Secure Data Sharing Among Cryptographic Clouds. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :913–918.

Proxy Re-Encryption (PRE) is a favorable primitive to realize a cryptographic cloud with secure and flexible data sharing mechanism. A number of PRE schemes with versatile capabilities have been proposed for different applications. The secure data sharing can be internally achieved in each PRE scheme. But no previous work can guarantee the secure data sharing among different PRE schemes in a general manner. Moreover, it is challenging to solve this problem due to huge differences among the existing PRE schemes in their algebraic systems and public-key types. To solve this problem more generally, this paper uniforms the definitions of the existing PRE and Public Key Encryption (PKE) schemes, and further uniforms their security definitions. Then taking any uniformly defined PRE scheme and any uniformly defined PKE scheme as two building blocks, this paper constructs a Generally Hybrid Proxy Re-Encryption (GHPRE) scheme with the idea of temporary public and private keys to achieve secure data sharing between these two underlying schemes. Since PKE is a more general definition than PRE, the proposed GHPRE scheme also is workable between any two PRE schemes. Moreover, the proposed GHPRE scheme can be transparently deployed even if the underlying PRE schemes are implementing.

Xu, Peng, Li, Jingnan, Wang, Wei, Jin, Hai. 2016. Anonymous Identity-Based Broadcast Encryption with Constant Decryption Complexity and Strong Security. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :223–233.

Anonymous Identity-Based Broadcast Encryption (AIBBE) allows a sender to broadcast a ciphertext to multi-receivers, and keeps receivers' anonymity. The existing AIBBE schemes fail to achieve efficient decryption or strong security, like the constant decryption complexity, the security under the adaptive attack, or the security in the standard model. Hence, we propose two new AIBBE schemes to overcome the drawbacks of previous schemes in the state-of-art. The biggest contribution in our work is the proposed AIBBE scheme with constant decryption complexity and the provable security under the adaptive attack in the standard model. This scheme should be the first one to obtain advantages in all above mentioned aspects, and has sufficient contribution in theory due to its strong security. We also propose another AIBBE scheme in the Random Oracle (RO) model, which is of sufficient interest in practice due to our experiment.