Visible to the public Biblio

Filters: Author is Zhang, Hongyu  [Clear All Filters]
2020-03-09
Li, Chi, Zhou, Min, Gu, Zuxing, Gu, Ming, Zhang, Hongyu.  2019.  Ares: Inferring Error Specifications through Static Analysis. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). :1174–1177.
Misuse of APIs happens frequently due to misunderstanding of API semantics and lack of documentation. An important category of API-related defects is the error handling defects, which may result in security and reliability flaws. These defects can be detected with the help of static program analysis, provided that error specifications are known. The error specification of an API function indicates how the function can fail. Writing error specifications manually is time-consuming and tedious. Therefore, automatic inferring the error specification from API usage code is preferred. In this paper, we present Ares, a tool for automatic inferring error specifications for C code through static analysis. We employ multiple heuristics to identify error handling blocks and infer error specifications by analyzing the corresponding condition logic. Ares is evaluated on 19 real world projects, and the results reveal that Ares outperforms the state-of-the-art tool APEx by 37% in precision. Ares can also identify more error specifications than APEx. Moreover, the specifications inferred from Ares help find dozens of API-related bugs in well-known projects such as OpenSSL, among them 10 bugs are confirmed by developers. Video: https://youtu.be/nf1QnFAmu8Q. Repository: https://github.com/lc3412/Ares.
2017-08-22
Wu, Rongxin, Xiao, Xiao, Cheung, Shing-Chi, Zhang, Hongyu, Zhang, Charles.  2016.  Casper: An Efficient Approach to Call Trace Collection. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. :678–690.

Call traces, i.e., sequences of function calls and returns, are fundamental to a wide range of program analyses such as bug reproduction, fault diagnosis, performance analysis, and many others. The conventional approach to collect call traces that instruments each function call and return site incurs large space and time overhead. Our approach aims at reducing the recording overheads by instrumenting only a small amount of call sites while keeping the capability of recovering the full trace. We propose a call trace model and a logged call trace model based on an LL(1) grammar, which enables us to define the criteria of a feasible solution to call trace collection. Based on the two models, we prove that to collect call traces with minimal instrumentation is an NP-hard problem. We then propose an efficient approach to obtaining a suboptimal solution. We implemented our approach as a tool Casper and evaluated it using the DaCapo benchmark suite. The experiment results show that our approach causes significantly lower runtime (and space) overhead than two state-of-the-arts approaches.

2017-05-18
Gu, Xiaodong, Zhang, Hongyu, Zhang, Dongmei, Kim, Sunghun.  2016.  Deep API Learning. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :631–642.

Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags-of-words and lack a deep understanding of the semantics of the query. We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.