Biblio

Found 218 results

Filters: Keyword is static analysis  [Clear All Filters]
2021-06-24
Saletta, Martina, Ferretti, Claudio.  2020.  A Neural Embedding for Source Code: Security Analysis and CWE Lists. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :523—530.
In this paper, we design a technique for mapping the source code into a vector space and we show its application in the recognition of security weaknesses. By applying ideas commonly used in Natural Language Processing, we train a model for producing an embedding of programs starting from their Abstract Syntax Trees. We then show how such embedding is able to infer clusters roughly separating different classes of software weaknesses. Even if the training of the embedding is unsupervised and made on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some features distinguishing them in code. Finally, we discuss how our model performs over the different types of vulnerabilities categorized by the CWE initiative.
2021-09-21
Lee, Yen-Ting, Ban, Tao, Wan, Tzu-Ling, Cheng, Shin-Ming, Isawa, Ryoichi, Takahashi, Takeshi, Inoue, Daisuke.  2020.  Cross Platform IoT-Malware Family Classification Based on Printable Strings. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :775–784.
In this era of rapid network development, Internet of Things (IoT) security considerations receive a lot of attention from both the research and commercial sectors. With limited computation resource, unfriendly interface, and poor software implementation, legacy IoT devices are vulnerable to many infamous mal ware attacks. Moreover, the heterogeneity of IoT platforms and the diversity of IoT malware make the detection and classification of IoT malware even more challenging. In this paper, we propose to use printable strings as an easy-to-get but effective cross-platform feature to identify IoT malware on different IoT platforms. The discriminating capability of these strings are verified using a set of machine learning algorithms on malware family classification across different platforms. The proposed scheme shows a 99% accuracy on a large scale IoT malware dataset consisted of 120K executable fils in executable and linkable format when the training and test are done on the same platform. Meanwhile, it also achieves a 96% accuracy when training is carried out on a few popular IoT platforms but test is done on different platforms. Efficient malware prevention and mitigation solutions can be enabled based on the proposed method to prevent and mitigate IoT malware damages across different platforms.
Sartoli, Sara, Wei, Yong, Hampton, Shane.  2020.  Malware Classification Using Recurrence Plots and Deep Neural Network. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). :901–906.
In this paper, we introduce a method for visualizing and classifying malware binaries. A malware binary consists of a series of data points of compiled machine codes that represent programming components. The occurrence and recurrence behavior of these components is determined by the common tasks malware samples in a particular family carry out. Thus, we view a malware binary as a series of emissions generated by an underlying stochastic process and use recurrence plots to transform malware binaries into two-dimensional texture images. We observe that recurrence plot-based malware images have significant visual similarities within the same family and are different from samples in other families. We apply deep CNN classifiers to classify malware samples. The proposed approach does not require creating malware signature or manual feature engineering. Our preliminary experimental results show that the proposed malware representation leads to a higher and more stable accuracy in comparison to directly transforming malware binaries to gray-scale images.
2021-10-04
Lu, Shuaibing, Kuang, Xiaohui, Nie, Yuanping, Lin, Zhechao.  2020.  A Hybrid Interface Recovery Method for Android Kernels Fuzzing. 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). :335–346.
Android kernel fuzzing is a research area of interest specifically for detecting kernel vulnerabilities which may allow attackers to obtain the root privilege. The number of Android mobile phones is increasing rapidly with the explosive growth of Android kernel drivers. Interface aware fuzzing is an effective technique to test the security of kernel driver. Existing researches rely on static analysis with kernel source code. However, in fact, there exist millions of Android mobile phones without public accessible source code. In this paper, we propose a hybrid interface recovery method for fuzzing kernels which can recover kernel driver interface no matter the source code is available or not. In white box condition, we employ a dynamic interface recover method that can automatically and completely identify the interface knowledge. In black box condition, we use reverse engineering to extract the key interface information and use similarity computation to infer argument types. We evaluate our hybrid algorithm on on 12 Android smartphones from 9 vendors. Empirical experimental results show that our method can effectively recover interface argument lists and find Android kernel bugs. In total, 31 vulnerabilities are reported in white and black box conditions. The vulnerabilities were responsibly disclosed to affected vendors and 9 of the reported vulnerabilities have been already assigned CVEs.
2020-12-17
Sun, P., Garcia, L., Salles-Loustau, G., Zonouz, S..  2020.  Hybrid Firmware Analysis for Known Mobile and IoT Security Vulnerabilities. 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :373—384.

Mobile and IoT operating systems–and their ensuing software updates–are usually distributed as binary files. Given that these binary files are commonly closed source, users or businesses who want to assess the security of the software need to rely on reverse engineering. Further, verifying the correct application of the latest software patches in a given binary is an open problem. The regular application of software patches is a central pillar for improving mobile and IoT device security. This requires developers, integrators, and vendors to propagate patches to all affected devices in a timely and coordinated fashion. In practice, vendors follow different and sometimes improper security update agendas for both mobile and IoT products. Moreover, previous studies revealed the existence of a hidden patch gap: several vendors falsely reported that they patched vulnerabilities. Therefore, techniques to verify whether vulnerabilities have been patched or not in a given binary are essential. Deep learning approaches have shown to be promising for static binary analyses with respect to inferring binary similarity as well as vulnerability detection. However, these approaches fail to capture the dynamic behavior of these systems, and, as a result, they may inundate the analysis with false positives when performing vulnerability discovery in the wild. In particular, they cannot capture the fine-grained characteristics necessary to distinguish whether a vulnerability has been patched or not. In this paper, we present PATCHECKO, a vulnerability and patch presence detection framework for executable binaries. PATCHECKO relies on a hybrid, cross-platform binary code similarity analysis that combines deep learning-based static binary analysis with dynamic binary analysis. PATCHECKO does not require access to the source code of the target binary nor that of vulnerable functions. We evaluate PATCHECKO on the most recent Google Pixel 2 smartphone and the Android Things IoT firmware images, within which 25 known CVE vulnerabilities have been previously reported and patched. Our deep learning model shows a vulnerability detection accuracy of over 93%. We further prune the candidates found by the deep learning stage–which includes false positives–via dynamic binary analysis. Consequently, PATCHECKO successfully identifies the correct matches among the candidate functions in the top 3 ranked outcomes 100% of the time. Furthermore, PATCHECKO's differential engine distinguishes between functions that are still vulnerable and those that are patched with an accuracy of 96%.

2022-08-12
Andes, Neil, Wei, Mingkui.  2020.  District Ransomware: Static and Dynamic Analysis. 2020 8th International Symposium on Digital Forensics and Security (ISDFS). :1–6.
Ransomware is one of the fastest growing threats to internet security. New Ransomware attacks happen around the globe, on a weekly basis. These attacks happen to individual users and groups, from almost any type of business. Many of these attacks involve Ransomware as a service, where one attacker creates a template Malware, which can be purchased and modified by other attackers to perform specific actions. The District Ransomware was a less well-known strain. This work focuses on statically and dynamically analyzing the District Ransomware and presenting the results.
Medeiros, Ibéria, Neves, Nuno.  2020.  Impact of Coding Styles on Behaviours of Static Analysis Tools for Web Applications. 2020 50th Annual IEEE-IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S). :55–56.

Web applications have become an essential resource to access the services of diverse subjects (e.g., financial, healthcare) available on the Internet. Despite the efforts that have been made on its security, namely on the investigation of better techniques to detect vulnerabilities on its source code, the number of vulnerabilities exploited has not decreased. Static analysis tools (SATs) are often used to test the security of applications since their outcomes can help developers in the correction of the bugs they found. The conducted investigation made over SATs stated they often generate errors (false positives (FP) and false negatives (FN)), whose cause is recurrently associated with very diverse coding styles, i.e., similar functionality is implemented in distinct manners, and programming practices that create ambiguity, such as the reuse and share of variables. Based on a common practice of using multiple forms in a same webpage and its processing in a single file, we defined a use case for user login and register with six coding styles scenarios for processing their data, and evaluated the behaviour of three SATs (phpSAFE, RIPS and WAP) with them to verify and understand why SATs produce FP and FN.

2022-10-16
Shekarisaz, Mohsen, Talebian, Fatemeh, Jabariani, Marjan, Mehri, Farzad, Faghih, Fathiyeh, Kargahi, Mehdi.  2020.  Program Energy-Hotspot Detection and Removal: A Static Analysis Approach. 2020 CSI/CPSSI International Symposium on Real-Time and Embedded Systems and Technologies (RTEST). :1–8.
The major energy-hungry components in today's battery-operated embedded devices are mostly peripheral modules like LTE, WiFi, GPS, etc. Inefficient use of these modules causes energy hotspots, namely segments of the embedded software in which the module wastes energy. We study two such hotspots in the current paper, and provide the corresponding detection and removal algorithms based on static analysis techniques. The program code hotspots occur due to unnecessary releasing and re-acquiring of a module (which puts the module in power saving mode for a while) and misplaced acquiring of the module (which makes the module or processor to waste energy in idle mode). The detections are performed according to some relation between extreme (worst-case/best-case) execution times of some program segments and time/energy specifications of the module. The experimental results on our benchmarks show about 28 percent of energy reduction after the hotspot removals.
2021-03-15
Staicu, C.-A., Torp, M. T., Schäfer, M., Møller, A., Pradel, M..  2020.  Extracting Taint Specifications for JavaScript Libraries. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :198—209.

Modern JavaScript applications extensively depend on third-party libraries. Especially for the Node.js platform, vulnerabilities can have severe consequences to the security of applications, resulting in, e.g., cross-site scripting and command injection attacks. Existing static analysis tools that have been developed to automatically detect such issues are either too coarse-grained, looking only at package dependency structure while ignoring dataflow, or rely on manually written taint specifications for the most popular libraries to ensure analysis scalability. In this work, we propose a technique for automatically extracting taint specifications for JavaScript libraries, based on a dynamic analysis that leverages the existing test suites of the libraries and their available clients in the npm repository. Due to the dynamic nature of JavaScript, mapping observations from dynamic analysis to taint specifications that fit into a static analysis is non-trivial. Our main insight is that this challenge can be addressed by a combination of an access path mechanism that identifies entry and exit points, and the use of membranes around the libraries of interest. We show that our approach is effective at inferring useful taint specifications at scale. Our prototype tool automatically extracts 146 additional taint sinks and 7 840 propagation summaries spanning 1 393 npm modules. By integrating the extracted specifications into a commercial, state-of-the-art static analysis, 136 new alerts are produced, many of which correspond to likely security vulnerabilities. Moreover, many important specifications that were originally manually written are among the ones that our tool can now extract automatically.

2022-08-12
Al Khayer, Aala, Almomani, Iman, Elkawlak, Khaled.  2020.  ASAF: Android Static Analysis Framework. 2020 First International Conference of Smart Systems and Emerging Technologies (SMARTTECH). :197–202.
Android Operating System becomes a major target for malicious attacks. Static analysis approach is widely used to detect malicious applications. Most of existing studies on static analysis frameworks are limited to certain features. This paper presents an Android Static Analysis Framework (ASAF) which models the overall static analysis phases and approaches for Android applications. ASAF can be implemented for different purposes including Android malicious apps detection. The proposed framework utilizes a parsing tool, Android Static Parse (ASParse) which is also introduced in this paper. Through the extendibility of the ASParse tool, future research studies can easily extend the parsed features and the parsed files to perform parsing based on their specific requirements and goals. Moreover, a case study is conducted to illustrate the implementation of the proposed ASAF.
2021-03-09
Akram, B., Ogi, D..  2020.  The Making of Indicator of Compromise using Malware Reverse Engineering Techniques. 2020 International Conference on ICT for Smart Society (ICISS). CFP2013V-ART:1—6.

Malware threats often go undetected immediately, because attackers can camouflage well within the system. The users realize this after the devices stop working and cause harm for them. One way to deceive malicious content detection, malware authors use packers. Malware analysis is an activity to gain knowledge about malware. Reverse engineering is a technique used to identify and deal with new viruses or to understand malware behavior. Therefore, this technique can be the right choice for conducting malware analysis, especially for malware with packers. The results of the analysis are used as a source for making creating indicator of compromise in the YARA rule format. YARA rule is used as a component for detecting malware using the indicators obtained in the analysis process.

2021-11-29
Andarzian, Seyed Behnam, Ladani, Behrouz Tork.  2020.  Compositional Taint Analysis of Native Codes for Security Vetting of Android Applications. 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE). :567–572.
Security vetting of Android applications is one of the crucial aspects of the Android ecosystem. Regarding the state of the art tools for this goal, most of them doesn't consider analyzing native codes and only analyze the Java code. However, Android concedes its developers to implement a part or all of their applications using C or C++ code. Thus, applying conservative manners for analyzing Android applications while ignoring native codes would lead to less precision in results. Few works have tried to analyze Android native codes, but only JN-SAF has applied taint analysis using static techniques such as symbolic execution. However, symbolic execution has some problems when is used in large programs. One of these problems is the exponential growth of program paths that would raise the path explosion issue. In this work, we have tried to alleviate this issue by introducing our new tool named CTAN. CTAN applies new symbolic execution methods to angr in a particular way that it can make JN-SAF more efficient and faster. We have introduced compositional taint analysis in CTAN by combining satisfiability modulo theories with symbolic execution. Our experiments show that CTAN is 26 percent faster than its previous work JN-SAF and it also leads to more precision by detecting more data-leakage in large Android native codes.
2022-10-16
Adamenko, Yu.V., Medvedev, A.A., Karpunin, D.A..  2020.  Development of a System for Static Analysis of C ++ Language Code. 2020 International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon). :1–5.
The main goal of the system is to make it easier to standardize the style of program code written in C++. Based on the results of the review of existing static analyzers, in addition to the main requirements, requirements for the structure of stylistic rules were identified. Based on the results obtained, a system for static analysis of the C++ language has been developed, consisting of a set of modules. The system is implemented using the Python 3.7 programming language. HTML and CSS markup languages were used to generate html reports. To ensure that rules can be stored in the database, the MongoDB database management system and the pymongo driver module were used.
2021-09-21
Ramadhan, Beno, Purwanto, Yudha, Ruriawan, Muhammad Faris.  2020.  Forensic Malware Identification Using Naive Bayes Method. 2020 International Conference on Information Technology Systems and Innovation (ICITSI). :1–7.
Malware is a kind of software that, if installed on a malware victim's device, might carry malicious actions. The malicious actions might be data theft, system failure, or denial of service. Malware analysis is a process to identify whether a piece of software is a malware or not. However, with the advancement of malware technologies, there are several evasion techniques that could be implemented by malware developers to prevent analysis, such as polymorphic and oligomorphic. Therefore, this research proposes an automatic malware detection system. In the system, the malware characteristics data were obtained through both static and dynamic analysis processes. Data from the analysis process were classified using Naive Bayes algorithm to identify whether the software is a malware or not. The process of identifying malware and benign files using the Naive Bayes machine learning method has an accuracy value of 93 percent for the detection process using static characteristics and 85 percent for detection through dynamic characteristics.
2021-03-04
Matin, I. Muhamad Malik, Rahardjo, B..  2020.  A Framework for Collecting and Analysis PE Malware Using Modern Honey Network (MHN). 2020 8th International Conference on Cyber and IT Service Management (CITSM). :1—5.

Nowadays, Windows is an operating system that is very popular among people, especially users who have limited knowledge of computers. But unconsciously, the security threat to the windows operating system is very high. Security threats can be in the form of illegal exploitation of the system. The most common attack is using malware. To determine the characteristics of malware using dynamic analysis techniques and static analysis is very dependent on the availability of malware samples. Honeypot is the most effective malware collection technique. But honeypot cannot determine the type of file format contained in malware. File format information is needed for the purpose of handling malware analysis that is focused on windows-based malware. For this reason, we propose a framework that can collect malware information as well as identify malware PE file type formats. In this study, we collected malware samples using a modern honey network. Next, we performed a feature extraction to determine the PE file format. Then, we classify types of malware using VirusTotal scanning. As the results of this study, we managed to get 1.222 malware samples. Out of 1.222 malware samples, we successfully extracted 945 PE malware. This study can help researchers in other research fields, such as machine learning and deep learning, for malware detection.

2021-03-15
Danilova, A., Naiakshina, A., Smith, M..  2020.  One Size Does Not Fit All: A Grounded Theory and Online Survey Study of Developer Preferences for Security Warning Types. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :136–148.
A wide range of tools exist to assist developers in creating secure software. Many of these tools, such as static analysis engines or security checkers included in compilers, use warnings to communicate security issues to developers. The effectiveness of these tools relies on developers heeding these warnings, and there are many ways in which these warnings could be displayed. Johnson et al. [46] conducted qualitative research and found that warning presentation and integration are main issues. We built on Johnson et al.'s work and examined what developers want from security warnings, including what form they should take and how they should integrate into their workflow and work context. To this end, we conducted a Grounded Theory study with 14 professional software developers and 12 computer science students as well as a focus group with 7 academic researchers to gather qualitative insights. To back up the theory developed from the qualitative research, we ran a quantitative survey with 50 professional software developers. Our results show that there is significant heterogeneity amongst developers and that no one warning type is preferred over all others. The context in which the warnings are shown is also highly relevant, indicating that it is likely to be beneficial if IDEs and other development tools become more flexible in their warning interactions with developers. Based on our findings, we provide concrete recommendations for both future research as well as how IDEs and other security tools can improve their interaction with developers.
2021-11-29
Hough, Katherine, Welearegai, Gebrehiwet, Hammer, Christian, Bell, Jonathan.  2020.  Revealing Injection Vulnerabilities by Leveraging Existing Tests. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :284–296.
Code injection attacks, like the one used in the high-profile 2017 Equifax breach, have become increasingly common, now ranking \#1 on OWASP's list of critical web application vulnerabilities. Static analyses for detecting these vulnerabilities can overwhelm developers with false positive reports. Meanwhile, most dynamic analyses rely on detecting vulnerabilities as they occur in the field, which can introduce a high performance overhead in production code. This paper describes a new approach for detecting injection vulnerabilities in applications by harnessing the combined power of human developers' test suites and automated dynamic analysis. Our new approach, Rivulet, monitors the execution of developer-written functional tests in order to detect information flows that may be vulnerable to attack. Then, Rivulet uses a white-box test generation technique to repurpose those functional tests to check if any vulnerable flow could be exploited. When applied to the version of Apache Struts exploited in the 2017 Equifax attack, Rivulet quickly identifies the vulnerability, leveraging only the tests that existed in Struts at that time. We compared Rivulet to the state-of-the-art static vulnerability detector Julia on benchmarks, finding that Rivulet outperformed Julia in both false positives and false negatives. We also used Rivulet to detect new vulnerabilities.
2021-07-27
Wang, X., Shen, Q., Luo, W., Wu, P..  2020.  RSDS: Getting System Call Whitelist for Container Through Dynamic and Static Analysis. 2020 IEEE 13th International Conference on Cloud Computing (CLOUD). :600—608.
Container technology has been used for running multiple isolated operating system distros on a host or deploying large scale microservice-based applications. In most cases, containers share the same kernel with the host and other containers on the same host, and the application in the container can make system calls of the host kernel like a normal process on the host. Seccomp is a security mechanism for the Linux kernel, through which we can prohibit certain system calls from being executed by the program. Docker began to support the seccomp mechanism from version 1.10 and disables around 44 system calls out of 300+ by default. However, for a particular container, there are still many system calls that are unnecessary for running it allowed to be executed, and the abuse of system calls by a compromised container can trigger the security vulnerabilities of a host kernel. Unfortunately, Docker does not provide a way to get the necessary system calls for a particular container. In this paper, we propose RSDS, a method combining dynamic analysis and static analysis to get the necessary system calls for a particular container. Our experiments show that our solution can reduce system calls by 69.27%-85.89% compared to the default configuration on an x86-64 PC with Ubuntu 16.04 host OS and does not affect the functionalities of these containers.
2021-05-05
Herrera, Adrian.  2020.  Optimizing Away JavaScript Obfuscation. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM). :215—220.

JavaScript is a popular attack vector for releasing malicious payloads on unsuspecting Internet users. Authors of this malicious JavaScript often employ numerous obfuscation techniques in order to prevent the automatic detection by antivirus and hinder manual analysis by professional malware analysts. Consequently, this paper presents SAFE-DEOBS, a JavaScript deobfuscation tool that we have built. The aim of SAFE-DEOBS is to automatically deobfuscate JavaScript malware such that an analyst can more rapidly determine the malicious script's intent. This is achieved through a number of static analyses, inspired by techniques from compiler theory. We demonstrate the utility of SAFE-DEOBS through a case study on real-world JavaScript malware, and show that it is a useful addition to a malware analyst's toolset.

2022-08-12
Aslanyan, Hayk, Arutunian, Mariam, Keropyan, Grigor, Kurmangaleev, Shamil, Vardanyan, Vahagn.  2020.  BinSide : Static Analysis Framework for Defects Detection in Binary Code. 2020 Ivannikov Memorial Workshop (IVMEM). :3–8.

Software developers make mistakes that can lead to failures of a software product. One approach to detect defects is static analysis: examine code without execution. Currently, various source code static analysis tools are widely used to detect defects. However, source code analysis is not enough. The reason for this is the use of third-party binary libraries, the unprovability of the correctness of all compiler optimizations. This paper introduces BinSide : binary static analysis framework for defects detection. It does interprocedural, context-sensitive and flow-sensitive analysis. The framework uses platform independent intermediate representation and provide opportunity to analyze various architectures binaries. The framework includes value analysis, reaching definition, taint analysis, freed memory analysis, constant folding, and constant propagation engines. It provides API (application programming interface) and can be used to develop new analyzers. Additionally, we used the API to develop checkers for classic buffer overflow, format string, command injection, double free and use after free defects detection.

2022-10-16
Lee, Sungho, Lee, Hyogun, Ryu, Sukyoung.  2020.  Broadening Horizons of Multilingual Static Analysis: Semantic Summary Extraction from C Code for JNI Program Analysis. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :127–137.
Most programming languages support foreign language interoperation that allows developers to integrate multiple modules implemented in different languages into a single multilingual program. While utilizing various features from multiple languages expands expressivity, differences in language semantics require developers to understand the semantics of multiple languages and their inter-operation. Because current compilers do not support compile-time checking for interoperation, they do not help developers avoid in-teroperation bugs. Similarly, active research on static analysis and bug detection has been focusing on programs written in a single language. In this paper, we propose a novel approach to analyze multilingual programs statically. Unlike existing approaches that extend a static analyzer for a host language to support analysis of foreign function calls, our approach extracts semantic summaries from programs written in guest languages using a modular analysis technique, and performs a whole-program analysis with the extracted semantic summaries. To show practicality of our approach, we design and implement a static analyzer for multilingual programs, which analyzes JNI interoperation between Java and C. Our empirical evaluation shows that the analyzer is scalable in that it can construct call graphs for large programs that use JNI interoperation, and useful in that it found 74 genuine interoperation bugs in real-world Android JNI applications.
2022-08-12
Stiévenart, Quentin, Roover, Coen De.  2020.  Compositional Information Flow Analysis for WebAssembly Programs. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM). :13–24.
WebAssembly is a new W3C standard, providing a portable target for compilation for various languages. All major browsers can run WebAssembly programs, and its use extends beyond the web: there is interest in compiling cross-platform desktop applications, server applications, IoT and embedded applications to WebAssembly because of the performance and security guarantees it aims to provide. Indeed, WebAssembly has been carefully designed with security in mind. In particular, WebAssembly applications are sandboxed from their host environment. However, recent works have brought to light several limitations that expose WebAssembly to traditional attack vectors. Visitors of websites using WebAssembly have been exposed to malicious code as a result. In this paper, we propose an automated static program analysis to address these security concerns. Our analysis is focused on information flow and is compositional. For every WebAssembly function, it first computes a summary that describes in a sound manner where the information from its parameters and the global program state can flow to. These summaries can then be applied during the subsequent analysis of function calls. Through a classical fixed-point formulation, one obtains an approximation of the information flow in the WebAssembly program. This results in the first compositional static analysis for WebAssembly. On a set of 34 benchmark programs spanning 196kLOC of WebAssembly, we compute at least 64% of the function summaries precisely in less than a minute in total.
2022-10-16
Van Es, Noah, Van der Plas, Jens, Stiévenart, Quentin, De Roover, Coen.  2020.  MAF: A Framework for Modular Static Analysis of Higher-Order Languages. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM). :37–42.
A modular static analysis decomposes a program's analysis into analyses of its parts, or components. An intercomponent analysis instructs an intra-component analysis to analyse each component independently of the others. Additional analyses are scheduled for newly discovered components, and for dependent components that need to account for newly discovered component information. Modular static analyses are scalable, can be tuned to a high precision, and support the analysis of programs that are highly dynamic, featuring e.g., higher-order functions or dynamically allocated processes.In this paper, we present the engineering aspects of MAF, a static analysis framework for implementing modular analyses for higher-order languages. For any such modular analysis, the framework provides a reusable inter-component analysis and it suffices to implement its intra-component analysis. The intracomponent analysis can be composed from several interdependent and reusable Scala traits. This design facilitates changing the analysed language, as well as the analysis precision with minimal effort. We illustrate the use of MAF through its instantiation for several different analyses of Scheme programs.
Trautsch, Alexander, Herbold, Steffen, Grabowski, Jens.  2020.  Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). :127–138.
Software quality evolution and predictive models to support decisions about resource distribution in software quality assurance tasks are an important part of software engineering research. Recently, a fine-grained just-in-time defect prediction approach was proposed which has the ability to find bug-inducing files within changes instead of only complete changes. In this work, we utilize this approach and improve it in multiple places: data collection, labeling and features. We include manually validated issue types, an improved SZZ algorithm which discards comments, whitespaces and refactorings. Additionally, we include static source code metrics as well as static analysis warnings and warning density derived metrics as features. To assess whether we can save cost we incorporate a specialized defect prediction cost model. To evaluate our proposed improvements of the fine-grained just-in-time defect prediction approach we conduct a case study that encompasses 38 Java projects, 492,241 file changes in 73,598 commits and spans 15 years. We find that static source code metrics and static analysis warnings are correlated with bugs and that they can improve the quality and cost saving potential of just-in-time defect prediction models.
2022-08-12
Berman, Maxwell, Adams, Stephen, Sherburne, Tim, Fleming, Cody, Beling, Peter.  2019.  Active Learning to Improve Static Analysis. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :1322–1327.
Static analysis tools are programs that run on source code prior to their compilation to binary executables and attempt to find flaws or defects in the code during the early stages of development. If left unresolved, these flaws could pose security risks. While numerous static analysis tools exist, there is no single tool that is optimal. Therefore, many static analysis tools are often used to analyze code. Further, some of the alerts generated by the static analysis tools are low-priority or false alarms. Machine learning algorithms have been developed to distinguish between true alerts and false alarms, however significant man hours need to be dedicated to labeling data sets for training. This study investigates the use of active learning to reduce the number of labeled alerts needed to adequately train a classifier. The numerical experiments demonstrate that a query by committee active learning algorithm can be utilized to significantly reduce the number of labeled alerts needed to achieve similar performance as a classifier trained on a data set of nearly 60,000 labeled alerts.