Visible to the public Biblio

Filters: Keyword is taint analysis  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z   [Show ALL]
A
Amir-Mohammadian, Sepehr, Skalka, Christian.  2016.  In-Depth Enforcement of Dynamic Integrity Taint Analysis. Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security. :43–56.

Dynamic taint analysis can be used as a defense against low-integrity data in applications with untrusted user interfaces. An important example is defense against XSS and injection attacks in programs with web interfaces. Data sanitization is commonly used in this context, and can be treated as a precondition for endorsement in a dynamic integrity taint analysis. However, sanitization is often incomplete in practice. We develop a model of dynamic integrity taint analysis for Java that addresses imperfect sanitization with an in-depth approach. To avoid false positives, results of sanitization are endorsed for access control (aka prospective security), but are tracked and logged for auditing and accountability (aka retrospective security). We show how this heterogeneous prospective/retrospective mechanism can be specified as a uniform policy, separate from code. We then use this policy to establish correctness conditions for a program rewriting algorithm that instruments code for the analysis. The rewriting itself is a model of existing, efficient Java taint analysis tools.

Andarzian, Seyed Behnam, Ladani, Behrouz Tork.  2020.  Compositional Taint Analysis of Native Codes for Security Vetting of Android Applications. 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE). :567–572.
Security vetting of Android applications is one of the crucial aspects of the Android ecosystem. Regarding the state of the art tools for this goal, most of them doesn't consider analyzing native codes and only analyze the Java code. However, Android concedes its developers to implement a part or all of their applications using C or C++ code. Thus, applying conservative manners for analyzing Android applications while ignoring native codes would lead to less precision in results. Few works have tried to analyze Android native codes, but only JN-SAF has applied taint analysis using static techniques such as symbolic execution. However, symbolic execution has some problems when is used in large programs. One of these problems is the exponential growth of program paths that would raise the path explosion issue. In this work, we have tried to alleviate this issue by introducing our new tool named CTAN. CTAN applies new symbolic execution methods to angr in a particular way that it can make JN-SAF more efficient and faster. We have introduced compositional taint analysis in CTAN by combining satisfiability modulo theories with symbolic execution. Our experiments show that CTAN is 26 percent faster than its previous work JN-SAF and it also leads to more precision by detecting more data-leakage in large Android native codes.
Antonopoulos, Timos, Gazzillo, Paul, Hicks, Michael, Koskinen, Eric, Terauchi, Tachio, Wei, Shiyi.  2017.  Decomposition Instead of Self-composition for Proving the Absence of Timing Channels. Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. :362–375.

We present a novel approach to proving the absence of timing channels. The idea is to partition the program’s execution traces in such a way that each partition component is checked for timing attack resilience by a time complexity analysis and that per-component resilience implies the resilience of the whole program. We construct a partition by splitting the program traces at secret-independent branches. This ensures that any pair of traces with the same public input has a component containing both traces. Crucially, the per-component checks can be normal safety properties expressed in terms of a single execution. Our approach is thus in contrast to prior approaches, such as self-composition, that aim to reason about multiple (k≥ 2) executions at once. We formalize the above as an approach called quotient partitioning, generalized to any k-safety property, and prove it to be sound. A key feature of our approach is a demand-driven partitioning strategy that uses a regex-like notion called trails to identify sets of execution traces, particularly those influenced by tainted (or secret) data. We have applied our technique in a prototype implementation tool called Blazer, based on WALA, PPL, and the brics automaton library. We have proved timing-channel freedom of (or synthesized an attack specification for) 24 programs written in Java bytecode, including 6 classic examples from the literature and 6 examples extracted from the DARPA STAC challenge problems.

Ashouri, Mohammadreza.  2019.  Detecting Input Sanitization Errors in Scala. 2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW). :313—319.

Scala programming language combines object-oriented and functional programming in one concise, high-level language, and the language supports static types that help to avoid bugs in complex programs. This paper proposes a dynamic taint analyzer called ScalaTaint for Scala applications. The analyzer traces the propagation of malicious inputs from untrusted sources to sensitive sink methods in programs that can be exploited by adversaries. In this work, we evaluated the accuracy of ScalaTaint with a security benchmark suite including 7 projects in Scala. As a result, our analyzer could report 49 vulnerabilities within 753,372 lines of code. Moreover, the result of our performance measurement on ScalaBench shows 67% runtime overhead that demonstrates the usefulness and efficiently of our technique in comparison with similar tools.

B
Brennan, Tegan.  2017.  Path Cost Analysis for Side Channel Detection. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. :416–419.

Side-channels have been increasingly demonstrated as a practical threat to the confidentiality of private user information. Being able to statically detect these kinds of vulnerabilites is a key challenge in current computer security research. We introduce a new technique, path-cost analysis (PCA), for the detection of side-channels. Given a cost model for a type of side-channel, path-cost analysis assigns a symbolic cost expression to every node and every back edge of a method's control flow graph that gives an over-approximation for all possible observable values at that node or after traversing that cycle. Queries to a satisfiability solver on the maximum distance between specific pairs of nodes allow us to detect the presence of imbalanced paths through the control flow graph. When combined with taint analysis, we are able to answer the following question: does there exist a pair of paths in the method's control flow graph, differing only on branch conditions influenced by the secret, that differs in observable value by more than some given threshold? In fact, we are able to answer the specifically state what sets of secret-sensitive conditional statements introduce a side-channel detectable given some noise parameter. We extend this approach to an interprocedural analysis, resulting in a over-approximation of the number of true side-channels in the program according to the given cost model. Greater precision can be obtained by combining our method with predicate abstraction or symbolic execution to eliminate a subset of the infeasible paths through the control flow graph. We propose evaluating our method on a set of sizeable Java server-client applications.

C
Cao, Mengchen, Hou, Xiantong, Wang, Tao, Qu, Hunter, Zhou, Yajin, Bai, Xiaolong, Wang, Fuwei.  2019.  Different is Good: Detecting the Use of Uninitialized Variables through Differential Replay. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. :1883–1897.
The use of uninitialized variables is a common issue. It could cause kernel information leak, which defeats the widely deployed security defense, i.e., kernel address space layout randomization (KASLR). Though a recent system called Bochspwn Reloaded reported multiple memory leaks in Windows kernels, how to effectively detect this issue is still largely behind. In this paper, we propose a new technique, i.e., differential replay, that could effectively detect the use of uninitialized variables. Specifically, it records and replays a program's execution in multiple instances. One instance is with the vanilla memory, the other one changes (or poisons) values of variables allocated from the stack and the heap. Then it compares program states to find references to uninitialized variables. The idea is that if a variable is properly initialized, it will overwrite the poisoned value and program states in two running instances should be the same. After detecting the differences, our system leverages the symbolic taint analysis to further identify the location where the variable was allocated. This helps us to identify the root cause and facilitate the development of real exploits. We have implemented a prototype called TimePlayer. After applying it to both Windows 7 and Windows 10 kernels (x86/x64), it successfully identified 34 new issues and another 85 ones that had been patched (some of them were publicly unknown.) Among 34 new issues, 17 of them have been confirmed as zero-day vulnerabilities by Microsoft.
Chen, Jia, Feng, Yu, Dillig, Isil.  2017.  Precise Detection of Side-Channel Vulnerabilities Using Quantitative Cartesian Hoare Logic. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :875–890.
This paper presents Themis, an end-to-end static analysis tool for finding resource-usage side-channel vulnerabilities in Java applications. We introduce the notion of epsilon-bounded non-interference, a variant and relaxation of Goguen and Meseguer's well-known non-interference principle. We then present Quantitative Cartesian Hoare Logic (QCHL), a program logic for verifying epsilon-bounded non-interference. Our tool, Themis, combines automated reasoning in CHL with lightweight static taint analysis to improve scalability. We evaluate Themis on well known Java applications and demonstrate that Themis can find unknown side-channel vulnerabilities in widely-used programs. We also show that Themis can verify the absence of vulnerabilities in repaired versions of vulnerable programs and that Themis compares favorably against Blazer, a state-of-the-art static analysis tool for finding timing side channels in Java applications.
Chen, Quan, Kapravelos, Alexandros.  2018.  Mystique: Uncovering Information Leakage from Browser Extensions. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :1687–1700.
Browser extensions are small JavaScript, CSS and HTML programs that run inside the browser with special privileges. These programs, often written by third parties, operate on the pages that the browser is visiting, giving the user a programmatic way to configure the browser. The privacy implications that arise by allowing privileged third-party code to execute inside the users' browser are not well understood. In this paper, we develop a taint analysis framework for browser extensions and use it to perform a large scale study of extensions in regard to their privacy practices. We first present a hybrid approach to traditional taint analysis: by leveraging the fact that extension source code is available to the runtime JavaScript engine, we implement as well as enhance traditional taint analysis using information gathered from static data flow and control-flow analysis of the JavaScript source code. Based on this, we further modify the Chromium browser to support taint tracking for extensions. We analyzed 178,893 extensions crawled from the Chrome Web Store between September 2016 and March 2018, as well as a separate set of all available extensions (2,790 in total) for the Opera browser at the time of analysis. From these, our analysis flagged 3,868 (2.13%) extensions as potentially leaking privacy-sensitive information. The top 10 most popular Chrome extensions that we confirmed to be leaking privacy-sensitive information have more than 60 million users combined. We ran the analysis on a local Kubernetes cluster and were able to finish within a month, demonstrating the feasibility of our approach for large-scale analysis of browser extensions. At the same time, our results emphasize the threat browser extensions pose to user privacy, and the need for countermeasures to safeguard against misbehaving extensions that abuse their privileges.
Chen, Xiarun, Li, Qien, Yang, Zhou, Liu, Yongzhi, Shi, Shaosen, Xie, Chenglin, Wen, Weiping.  2021.  VulChecker: Achieving More Effective Taint Analysis by Identifying Sanitizers Automatically. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :774–782.
The automatic detection of vulnerabilities in Web applications using taint analysis is a hot topic. However, existing taint analysis methods for sanitizers identification are too simple to find available taint transmission chains effectively. These methods generally use pre-constructed dictionaries or simple keywords to identify, which usually suffer from large false positives and false negatives. No doubt, it will have a greater impact on the final result of the taint analysis. To solve that, we summarise and classify the commonly used sanitizers in Web applications and propose an identification method based on semantic analysis. Our method can accurately and completely identify the sanitizers in the target Web applications through static analysis. Specifically, we analyse the natural semantics and program semantics of existing sanitizers, use semantic analysis to find more in Web applications. Besides, we implemented the method prototype in PHP and achieved a vulnerability detection tool called VulChecker. Then, we experimented with some popular open-source CMS frameworks. The results show that Vulchecker can accurately identify more sanitizers. In terms of vulnerability detection, VulChecker also has a lower false positive rate and a higher detection rate than existing methods. Finally, we used VulChecker to analyse the latest PHP applications. We identified several new suspicious taint data propagation chains. Before the paper was completed, we have identified four unreported vulnerabilities. In general, these results show that our approach is highly effective in improving vulnerability detection based on taint analysis.
D
Do, Lisa Nguyen Quang, Ali, Karim, Livshits, Benjamin, Bodden, Eric, Smith, Justin, Murphy-Hill, Emerson.  2017.  Just-in-time Static Analysis. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. :307–317.
We present the concept of Just-In-Time (JIT) static analysis that interleaves code development and bug fixing in an integrated development environment. Unlike traditional batch-style analysis tools, a JIT analysis tool presents warnings to code developers over time, providing the most relevant results quickly, and computing less relevant results incrementally later. In this paper, we describe general guidelines for designing JIT analyses. We also present a general recipe for transforming static data-flow analyses to JIT analyses through a concept of layered analysis execution. We illustrate this transformation through CHEETAH, a JIT taint analysis for Android applications. Our empirical evaluation of CHEETAH on real-world applications shows that our approach returns warnings quickly enough to avoid disrupting the normal workflow of developers. This result is confirmed by our user study, in which developers fixed data leaks twice as fast when using CHEETAH compared to an equivalent batch-style analysis.
E
Etigowni, Sriharsha, Tian, Dave(Jing), Hernandez, Grant, Zonouz, Saman, Butler, Kevin.  2016.  CPAC: Securing Critical Infrastructure with Cyber-physical Access Control. Proceedings of the 32Nd Annual Conference on Computer Security Applications. :139–152.

Critical infrastructure such as the power grid has become increasingly complex. The addition of computing elements to traditional physical components increases complexity and hampers insight into how elements in the system interact with each other. The result is an infrastructure where operational mistakes, some of which cannot be distinguished from attacks, are more difficult to prevent and have greater potential impact, such as leaking sensitive information to the operator or attacker. In this paper, we present CPAC, a cyber-physical access control solution to manage complexity and mitigate threats in cyber-physical environments, with a focus on the electrical smart grid. CPAC uses information flow analysis based on mathematical models of the physical grid to generate policies enforced through verifiable logic. At the device side, CPAC combines symbolic execution with lightweight dynamic execution monitoring to allow non-intrusive taint analysis on programmable logic controllers in realtime. These components work together to provide a realtime view of all system elements, and allow for more robust and finer-grained protections than any previous solution to securing the grid. We implement a prototype of CPAC using Bachmann PLCs and evaluate several real-world incidents that demonstrate its scalability and effectiveness. The policy checking for a nation-wide grid is less than 150 ms, faster than existing solutions. We additionally show that CPAC can analyze potential component failures for arbitrary component failures, far beyond the capabilities of currently deployed systems. CPAC thus provides a solution to secure the modern smart grid from operator mistakes or insider attacks, maintain operational privacy, and support N - x contingencies.

F
Facon, A., Guilley, S., Lec'Hvien, M., Schaub, A., Souissi, Y..  2018.  Detecting Cache-Timing Vulnerabilities in Post-Quantum Cryptography Algorithms. 2018 IEEE 3rd International Verification and Security Workshop (IVSW). :7-12.

When implemented on real systems, cryptographic algorithms are vulnerable to attacks observing their execution behavior, such as cache-timing attacks. Designing protected implementations must be done with knowledge and validation tools as early as possible in the development cycle. In this article we propose a methodology to assess the robustness of the candidates for the NIST post-quantum standardization project to cache-timing attacks. To this end we have developed a dedicated vulnerability research tool. It performs a static analysis with tainting propagation of sensitive variables across the source code and detects leakage patterns. We use it to assess the security of the NIST post-quantum cryptography project submissions. Our results show that more than 80% of the analyzed implementations have at least one potential flaw, and three submissions total more than 1000 reported flaws each. Finally, this comprehensive study of the competitors security allows us to identify the most frequent weaknesses amongst candidates and how they might be fixed.

Fu, Xiaoqin, Cai, Haipeng.  2020.  Scaling Application-Level Dynamic Taint Analysis to Enterprise-Scale Distributed Systems. 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). :270–271.
With the increasing deployment of enterprise-scale distributed systems, effective and practical defenses for such systems against various security vulnerabilities such as sensitive data leaks are urgently needed. However, most existing solutions are limited to centralized programs. For real-world distributed systems which are of large scales, current solutions commonly face one or more of scalability, applicability, and portability challenges. To overcome these challenges, we develop a novel dynamic taint analysis for enterprise-scale distributed systems. To achieve scalability, we use a multi-phase analysis strategy to reduce the overall cost. We infer implicit dependencies via partial-ordering method events in distributed programs to address the applicability challenge. To achieve greater portability, the analysis is designed to work at an application level without customizing platforms. Empirical results have shown promising scalability and capabilities of our approach.
Fursova, Natalia, Dovgalyuk, Pavel, Vasiliev, Ivan, Klimushenkova, Maria, Egorov, Danila.  2021.  Detecting Attack Surface With Full-System Taint Analysis. 2021 IEEE 21st International Conference on Software Quality, Reliability and Security Companion (QRS-C). :1161–1162.
Attack surface detection for the complex software is needed to find targets for the fuzzing, because testing the whole system with many inputs is not realistic. Researchers that previously applied taint analysis for dealing with different security tasks in the virtual machines did not examined how to apply it for attack surface detection. I.e., getting the program modules and functions, that may be affected by input data. We propose using taint tracking within a virtual machine and virtual machine introspection to create a new approach that can detect the internal module interfaces that can be fuzz tested to assure that software is safe or find the vulnerabilities.
G
Gao, Fengjuan, Chen, Tianjiao, Wang, Yu, Situ, Lingyun, Wang, Linzhang, Li, Xuandong.  2016.  Carraybound: Static Array Bounds Checking in C Programs Based on Taint Analysis. Proceedings of the 8th Asia-Pacific Symposium on Internetware. :81–90.

C programming language never performs automatic bounds checking in order to speed up execution. But bounds checking is absolutely necessary in any program. Because if a variable is out-of-bounds, some serious errors may occur during execution, such as endless loop or buffer overflows. When there are arrays used in a program, the index of an array must be within the boundary of the array. But programmers always miss the array bounds checking or do not perform a correct array bounds checking. In this paper, we perform static analysis based on taint analysis and data flow analysis to detect which arrays do not have correct array bounds checking in the program. And we implement an automatic static tool, Carraybound. And the experimental results show that Carraybound can work effectively and efficiently.

Gao, Jianbo, Liu, Han, Liu, Chao, Li, Qingshan, Guan, Zhi, Chen, Zhong.  2019.  EasyFlow: keep ethereum away from overflow. Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings. :23–26.
While Ethereum smart contracts enabled a wide range of blockchain applications, they are extremely vulnerable to different forms of security attacks. Due to the fact that transactions to smart contracts commonly involve cryptocurrency transfer, any successful attacks can lead to money loss or even financial disorder. In this paper, we focus on the overflow attacks in Ethereum, mainly because they widely rooted in many smart contracts and comparatively easy to exploit. We have developed EasyFlow, an overflow detector at Ethereum Virtual Machine level. The key insight behind EasyFlow is a taint analysis based tracking technique to analyze the propagation of involved taints. Specifically, EasyFlow can not only divide smart contracts into safe contracts, manifested overflows, well-protected overflows and potential overflows, but also automatically generate transactions to trigger potential overflows. In our preliminary evaluation, EasyFlow managed to find potentially vulnerable Ethereum contracts with little runtime overhead. A demo video of EasyFlow is at https://youtu.be/QbUJkQI0L6o.
Guan, Le, Cao, Chen, Zhu, Sencun, Lin, Jingqiang, Liu, Peng, Xia, Yubin, Luo, Bo.  2019.  Protecting mobile devices from physical memory attacks with targeted encryption. Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks. :34–44.
Sensitive data in a process could be scattered over the memory of a computer system for a prolonged period of time. Unfortunately, DRAM chips were proven insecure in previous studies. The problem becomes worse in the mobile environment, in which users' smartphones are easily lost or stolen. The powered-on phones may contain sensitive data in the vulnerable DRAM chips. In this paper, we propose MemVault, a mechanism to protect sensitive data in Android devices against physical memory attacks. MemVault keeps track of the propagation of well-marked sensitive data sources, and selectively encrypts tainted sensitive memory contents in the DRAM chip. When a tainted object is accessed, MemVault redirects the access to the internal RAM (iRAM), where the cipher-text object is decrypted transparently. iRAM is a system-on-chip (SoC) component which is by nature immune to physical memory exploits. We have implemented a MemVault prototype system, and have evaluated it with extensive experiments. Our results validate that MemVault effectively eliminates the occurrences of clear-text sensitive objects in DRAM chips, and imposes acceptable overheads.
Gupta, M.K., Govil, M.C., Singh, G..  2014.  A context-sensitive approach for precise detection of cross-site scripting vulnerabilities. Innovations in Information Technology (INNOVATIONS), 2014 10th International Conference on. :7-12.

Currently, dependence on web applications is increasing rapidly for social communication, health services, financial transactions and many other purposes. Unfortunately, the presence of cross-site scripting vulnerabilities in these applications allows malicious user to steals sensitive information, install malware, and performs various malicious operations. Researchers proposed various approaches and developed tools to detect XSS vulnerability from source code of web applications. However, existing approaches and tools are not free from false positive and false negative results. In this paper, we propose a taint analysis and defensive programming based HTML context-sensitive approach for precise detection of XSS vulnerability from source code of PHP web applications. It also provides automatic suggestions to improve the vulnerable source code. Preliminary experiments and results on test subjects show that proposed approach is more efficient than existing ones.

H
He, Dongjie, Li, Haofeng, Wang, Lei, Meng, Haining, Zheng, Hengjie, Liu, Jie, Hu, Shuangwei, Li, Lian, Xue, Jingling.  2019.  Performance-Boosting Sparsification of the IFDS Algorithm with Applications to Taint Analysis. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). :267–279.
The IFDS algorithm can be compute-and memoryintensive for some large programs, often running for a long time (more than expected) or terminating prematurely after some time and/or memory budgets have been exhausted. In the latter case, the corresponding IFDS data-flow analyses may suffer from false negatives and/or false positives. To improve this, we introduce a sparse alternative to the traditional IFDS algorithm. Instead of propagating the data-flow facts across all the program points along the program’s (interprocedural) control flow graph, we propagate every data-flow fact directly to its next possible use points along its own sparse control flow graph constructed on the fly, thus reducing significantly both the time and memory requirements incurred by the traditional IFDS algorithm. In our evaluation, we compare FLOWDROID, a taint analysis performed by using the traditional IFDS algorithm, with our sparse incarnation, SPARSEDROID, on a set of 40 Android apps selected. For the time budget (5 hours) and memory budget (220GB) allocated per app, SPARSEDROID can run every app to completion but FLOWDROID terminates prematurely for 9 apps, resulting in an average speedup of 22.0x. This implies that when used as a market-level vetting tool, SPARSEDROID can finish analyzing these 40 apps in 2.13 hours (by issuing 228 leak warnings) while FLOWDROID manages to analyze only 30 apps in the same time period (by issuing only 147 leak warnings).
Hermerschmidt, Lars, Straub, Andreas, Piskachev, Goran.  2020.  Language-Agnostic Injection Detection. 2020 IEEE Security and Privacy Workshops (SPW). :268–275.
Formal languages are ubiquitous wherever software systems need to exchange or store data. Unparsing into and parsing from such languages is an error-prone process that has spawned an entire class of security vulnerabilities. There has been ample research into finding vulnerabilities on the parser side, but outside of language specific approaches, few techniques targeting unparser vulnerabilities exist. This work presents a language-agnostic approach for spotting injection vulnerabilities in unparsers. It achieves this by mining unparse trees using dynamic taint analysis to extract language keywords, which are leveraged for guided fuzzing. Vulnerabilities can thus be found without requiring prior knowledge about the formal language, and in fact, the approach is even applicable where no specification thereof exists at all. This empowers security researchers and developers alike to gain deeper understanding of unparser implementations through examination of the unparse trees generated by the approach, as well as enabling them to find new vulnerabilities in poorly-understood software. This work presents a language-agnostic approach for spotting injection vulnerabilities in unparsers. It achieves this by mining unparse trees using dynamic taint analysis to extract language keywords, which are leveraged for guided fuzzing. Vulnerabilities can thus be found without requiring prior knowledge about the formal language, and in fact, the approach is even applicable where no specification thereof exists at all. This empowers security researchers and developers alike to gain deeper understanding of unparser implementations through examination of the unparse trees generated by the approach, as well as enabling them to find new vulnerabilities in poorly-understood software.
Höschele, Matthias, Zeller, Andreas.  2016.  Mining Input Grammars from Dynamic Taints. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. :720–725.

Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL textlesspretextgreaterhttp://www.example.com/path/textless/pretextgreater, for instance, the protocol textlesspretextgreaterhttptextless/pretextgreater, the host textlesspretextgreaterwww.example.comtextless/pretextgreater, and the path textlesspretextgreaterpathtextless/pretextgreater would be handled by different functions and stored in different variables. Given a set of sample inputs, we use dynamic tainting to trace the data flow of each input character, and aggregate those input fragments that would be handled by the same function into lexical and syntactical entities. The result is a context-free grammar that reflects valid input structure. In its evaluation, our AUTOGRAM prototype automatically produced readable and structurally accurate grammars for inputs like URLs, spreadsheets or configuration files. The resulting grammars not only allow simple reverse engineering of input formats, but can also directly serve as input for test generators.

Hough, Katherine, Welearegai, Gebrehiwet, Hammer, Christian, Bell, Jonathan.  2020.  Revealing Injection Vulnerabilities by Leveraging Existing Tests. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :284–296.
Code injection attacks, like the one used in the high-profile 2017 Equifax breach, have become increasingly common, now ranking \#1 on OWASP's list of critical web application vulnerabilities. Static analyses for detecting these vulnerabilities can overwhelm developers with false positive reports. Meanwhile, most dynamic analyses rely on detecting vulnerabilities as they occur in the field, which can introduce a high performance overhead in production code. This paper describes a new approach for detecting injection vulnerabilities in applications by harnessing the combined power of human developers' test suites and automated dynamic analysis. Our new approach, Rivulet, monitors the execution of developer-written functional tests in order to detect information flows that may be vulnerable to attack. Then, Rivulet uses a white-box test generation technique to repurpose those functional tests to check if any vulnerable flow could be exploited. When applied to the version of Apache Struts exploited in the 2017 Equifax attack, Rivulet quickly identifies the vulnerability, leveraging only the tests that existed in Struts at that time. We compared Rivulet to the state-of-the-art static vulnerability detector Julia on benchmarks, finding that Rivulet outperformed Julia in both false positives and false negatives. We also used Rivulet to detect new vulnerabilities.
Hung, Yu-Hsin, Jheng, Bing-Jhong, Li, Hong-Wei, Lai, Wen-Yang, Mallissery, Sanoop, Wu, Yu-Sung.  2021.  Mixed-mode Information Flow Tracking with Compile-time Taint Semantics Extraction and Offline Replay. 2021 IEEE Conference on Dependable and Secure Computing (DSC). :1–8.
Static information flow analysis (IFA) and dynamic information flow tracking (DIFT) have been widely employed in offline security analysis of computer programs. As security attacks become more sophisticated, there is a rising need for IFA and DIFT in production environment. However, existing systems usually deal with IFA and DIFT separately, and most DIFT systems incur significant performance overhead. We propose MIT to facilitate IFA and DIFT in online production environment. MIT offers mixed-mode information flow tracking at byte-granularity and incurs moderate runtime performance overhead. The core techniques consist of the extraction of taint semantics intermediate representation (TSIR) at compile-time and the decoupled execution of TSIR for information flow analysis. We conducted an extensive performance overhead evaluation on MIT to confirm its applicability in production environment. We also outline potential applications of MIT, including the implementation of data provenance checking and information flow based anomaly detection in real-world applications.
I
Inayoshi, Hiroki, Kakei, Shohei, Takimoto, Eiji, Mouri, Koichi, Saito, Shoichi.  2019.  Prevention of Data Leakage due to Implicit Information Flows in Android Applications. 2019 14th Asia Joint Conference on Information Security (AsiaJCIS). :103–110.
Dynamic Taint Analysis (DTA) technique has been developed for analysis and understanding behavior of Android applications and privacy policy enforcement. Meanwhile, implicit information flows (IIFs) are major concern of security researchers because IIFs can evade DTA technique easily and give attackers an advantage over the researchers. Some researchers suggested approaches to the issue and developed analysis systems supporting privacy policy enforcement against IIF-accompanied attacks; however, there is still no effective technique of comprehensive analysis and privacy policy enforcement against IIF-accompanied attacks. In this paper, we propose an IIF detection technique to enforce privacy policy against IIF-accompanied attacks in Android applications. We developed a new analysis tool, called Smalien, that can discover data leakage caused by IIF-contained information flows as well as explicit information flows. We demonstrated practicability of Smalien by applying it to 16 IIF tricks from ScrubDroid and two IIF tricks from DroidBench. Smalien enforced privacy policy successfully against all the tricks except one trick because the trick loads code dynamically from a remote server at runtime, and Smalien cannot analyze any code outside of a target application. The results show that our approach can be a solution to the current attacker-superior situation.
J
Jain, Vivek, Rawat, Sanjay, Giuffrida, Cristiano, Bos, Herbert.  2018.  TIFF: Using Input Type Inference To Improve Fuzzing. Proceedings of the 34th Annual Computer Security Applications Conference. :505-517.

Developers commonly use fuzzing techniques to hunt down all manner of memory corruption vulnerabilities during the testing phase. Irrespective of the fuzzer, input mutation plays a central role in providing adequate code coverage, as well as in triggering bugs. However, each class of memory corruption bugs requires a different trigger condition. While the goal of a fuzzer is to find bugs, most existing fuzzers merely approximate this goal by targeting their mutation strategies toward maximizing code coverage. In this work, we present a new mutation strategy that maximizes the likelihood of triggering memory-corruption bugs by generating fewer, but better inputs. In particular, our strategy achieves bug-directed mutation by inferring the type of the input bytes. To do so, it tags each offset of the input with a basic type (e.g., 32-bit integer, string, array etc.), while deriving mutation rules for specific classes of bugs. We infer types by means of in-memory data-structure identification and dynamic taint analysis, and implement our novel mutation strategy in a fully functional fuzzer which we call TIFF (Type Inference-based Fuzzing Framework). Our evaluation on real-world applications shows that type-based fuzzing triggers bugs much earlier than existing solutions, while maintaining high code coverage. For example, on several real-world applications and libraries (e.g., poppler, mpg123 etc.), we find real bugs (with known CVEs) in almost half of the time and upto an order of magnitude fewer inputs than state-of-the-art fuzzers.