Hough, Katherine, Welearegai, Gebrehiwet, Hammer, Christian, Bell, Jonathan.
2020.
Revealing Injection Vulnerabilities by Leveraging Existing Tests. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :284–296.
Code injection attacks, like the one used in the high-profile 2017 Equifax breach, have become increasingly common, now ranking \#1 on OWASP's list of critical web application vulnerabilities. Static analyses for detecting these vulnerabilities can overwhelm developers with false positive reports. Meanwhile, most dynamic analyses rely on detecting vulnerabilities as they occur in the field, which can introduce a high performance overhead in production code. This paper describes a new approach for detecting injection vulnerabilities in applications by harnessing the combined power of human developers' test suites and automated dynamic analysis. Our new approach, Rivulet, monitors the execution of developer-written functional tests in order to detect information flows that may be vulnerable to attack. Then, Rivulet uses a white-box test generation technique to repurpose those functional tests to check if any vulnerable flow could be exploited. When applied to the version of Apache Struts exploited in the 2017 Equifax attack, Rivulet quickly identifies the vulnerability, leveraging only the tests that existed in Struts at that time. We compared Rivulet to the state-of-the-art static vulnerability detector Julia on benchmarks, finding that Rivulet outperformed Julia in both false positives and false negatives. We also used Rivulet to detect new vulnerabilities.
Naeem, Hajra, Alalfi, Manar H..
2020.
Identifying Vulnerable IoT Applications Using Deep Learning. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). :582–586.
This paper presents an approach for the identification of vulnerable IoT applications using deep learning algorithms. The approach focuses on a category of vulnerabilities that leads to sensitive information leakage which can be identified using taint flow analysis. First, we analyze the source code of IoT apps in order to recover tokens along their frequencies and tainted flows. Second, we develop, Token2Vec, which transforms the source code tokens into vectors. We have also developed Flow2Vec, which transforms the identified tainted flows into vectors. Third, we use the recovered vectors to train a deep learning algorithm to build a model for the identification of tainted apps. We have evaluated the approach on two datasets and the experiments show that the proposed approach of combining tainted flows features with the base benchmark that uses token frequencies only, has improved the accuracy of the prediction models from 77.78% to 92.59% for Corpus1 and 61.11% to 87.03% for Corpus2.
Sapountzis, Nikolaos, Sun, Ruimin, Wei, Xuetao, Jin, Yier, Crandall, Jedidiah, Oliveira, Daniela.
2020.
MITOS: Optimal Decisioning for the Indirect Flow Propagation Dilemma in Dynamic Information Flow Tracking Systems. 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). :1090–1100.
Dynamic Information Flow Tracking (DIFT), also called Dynamic Taint Analysis (DTA), is a technique for tracking the information as it flows through a program's execution. Specifically, some inputs or data get tainted and then these taint marks (tags) propagate usually at the instruction-level. While DIFT has been a fundamental concept in computer and network security for the past decade, it still faces open challenges that impede its widespread application in practice; one of them being the indirect flow propagation dilemma: should the tags involved in an indirect flow, e.g., in a control or address dependency, be propagated? Propagating all these tags, as is done for direct flows, leads to overtainting (all taintable objects become tainted), while not propagating them leads to undertainting (information flow becomes incomplete). In this paper, we analytically model that decisioning problem for indirect flows, by considering various tradeoffs including undertainting versus overtainting, importance of heterogeneous code semantics and context. Towards tackling this problem, we design MITOS, a distributed-optimization algorithm, that: decides about the propagation of indirect flows by properly weighting all these tradeoffs, is of low-complexity, is scalable, is able to flexibly adapt to different application scenarios and security needs of large distributed systems. Additionally, MITOS is applicable to most DIFT systems that consider an arbitrary number of tag types, and introduces the key properties of fairness and tag-balancing to the DIFT field. To demonstrate MITOS's applicability in practice, we implement and evaluate MITOS on top of an open-source DIFT, and we shed light on the open problem. We also perform a case-study scenario with a real in-memory only attack and show that MITOS improves simultaneously (i) system's spatiotemporal overhead (up to 40%), and (ii) system's fingerprint on suspected bytes (up to 167%) compared to traditional DIFT, even though these metrics usually conflict.
Hermerschmidt, Lars, Straub, Andreas, Piskachev, Goran.
2020.
Language-Agnostic Injection Detection. 2020 IEEE Security and Privacy Workshops (SPW). :268–275.
Formal languages are ubiquitous wherever software systems need to exchange or store data. Unparsing into and parsing from such languages is an error-prone process that has spawned an entire class of security vulnerabilities. There has been ample research into finding vulnerabilities on the parser side, but outside of language specific approaches, few techniques targeting unparser vulnerabilities exist. This work presents a language-agnostic approach for spotting injection vulnerabilities in unparsers. It achieves this by mining unparse trees using dynamic taint analysis to extract language keywords, which are leveraged for guided fuzzing. Vulnerabilities can thus be found without requiring prior knowledge about the formal language, and in fact, the approach is even applicable where no specification thereof exists at all. This empowers security researchers and developers alike to gain deeper understanding of unparser implementations through examination of the unparse trees generated by the approach, as well as enabling them to find new vulnerabilities in poorly-understood software. This work presents a language-agnostic approach for spotting injection vulnerabilities in unparsers. It achieves this by mining unparse trees using dynamic taint analysis to extract language keywords, which are leveraged for guided fuzzing. Vulnerabilities can thus be found without requiring prior knowledge about the formal language, and in fact, the approach is even applicable where no specification thereof exists at all. This empowers security researchers and developers alike to gain deeper understanding of unparser implementations through examination of the unparse trees generated by the approach, as well as enabling them to find new vulnerabilities in poorly-understood software.
Lyons, D., Zahra, S..
2020.
Using Taint Analysis and Reinforcement Learning (TARL) to Repair Autonomous Robot Software. 2020 IEEE Security and Privacy Workshops (SPW). :181–184.
It is important to be able to establish formal performance bounds for autonomous systems. However, formal verification techniques require a model of the environment in which the system operates; a challenge for autonomous systems, especially those expected to operate over longer timescales. This paper describes work in progress to automate the monitor and repair of ROS-based autonomous robot software written for an apriori partially known and possibly incorrect environment model. A taint analysis method is used to automatically extract the dataflow sequence from input topic to publish topic, and instrument that code. A unique reinforcement learning approximation of MDP utility is calculated, an empirical and non-invasive characterization of the inherent objectives of the software designers. By comparing design (a-priori) utility with deploy (deployed system) utility, we show, using a small but real ROS example, that it's possible to monitor a performance criterion and relate violations of the criterion to parts of the software. The software is then patched using automated software repair techniques and evaluated against the original off-line utility.
Andarzian, Seyed Behnam, Ladani, Behrouz Tork.
2020.
Compositional Taint Analysis of Native Codes for Security Vetting of Android Applications. 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE). :567–572.
Security vetting of Android applications is one of the crucial aspects of the Android ecosystem. Regarding the state of the art tools for this goal, most of them doesn't consider analyzing native codes and only analyze the Java code. However, Android concedes its developers to implement a part or all of their applications using C or C++ code. Thus, applying conservative manners for analyzing Android applications while ignoring native codes would lead to less precision in results. Few works have tried to analyze Android native codes, but only JN-SAF has applied taint analysis using static techniques such as symbolic execution. However, symbolic execution has some problems when is used in large programs. One of these problems is the exponential growth of program paths that would raise the path explosion issue. In this work, we have tried to alleviate this issue by introducing our new tool named CTAN. CTAN applies new symbolic execution methods to angr in a particular way that it can make JN-SAF more efficient and faster. We have introduced compositional taint analysis in CTAN by combining satisfiability modulo theories with symbolic execution. Our experiments show that CTAN is 26 percent faster than its previous work JN-SAF and it also leads to more precision by detecting more data-leakage in large Android native codes.
Fu, Xiaoqin, Cai, Haipeng.
2020.
Scaling Application-Level Dynamic Taint Analysis to Enterprise-Scale Distributed Systems. 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). :270–271.
With the increasing deployment of enterprise-scale distributed systems, effective and practical defenses for such systems against various security vulnerabilities such as sensitive data leaks are urgently needed. However, most existing solutions are limited to centralized programs. For real-world distributed systems which are of large scales, current solutions commonly face one or more of scalability, applicability, and portability challenges. To overcome these challenges, we develop a novel dynamic taint analysis for enterprise-scale distributed systems. To achieve scalability, we use a multi-phase analysis strategy to reduce the overall cost. We infer implicit dependencies via partial-ordering method events in distributed programs to address the applicability challenge. To achieve greater portability, the analysis is designed to work at an application level without customizing platforms. Empirical results have shown promising scalability and capabilities of our approach.
She, Dongdong, Chen, Yizheng, Shah, Abhishek, Ray, Baishakhi, Jana, Suman.
2020.
Neutaint: Efficient Dynamic Taint Analysis with Neural Networks. 2020 IEEE Symposium on Security and Privacy (SP). :1527–1543.
Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint-propagation, which is neither accurate (i.e., high false positive rate) nor efficient (i.e., large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, the overtaint and undertaint errors can accumulate during the propagation of taint information across multiple operations. Finally, rule-based propagation requires each operation to be inspected before applying the appropriate rules resulting in prohibitive performance overhead on large real-world applications.In this work, we propose Neutaint, a novel end-to-end approach to track information flow using neural program embeddings. The neural program embeddings model the target's programs computations taking place between taint sources and sinks, which automatically learns the information flow by observing a diverse set of execution traces. To perform lightweight and precise information flow analysis, we utilize saliency maps to reason about most influential sources for different sinks. Neutaint constructs two saliency maps, a popular machine learning approach to influence analysis, to summarize both coarse-grained and fine-grained information flow in the neural program embeddings.We compare Neutaint with 3 state-of-the-art dynamic taint analysis tools. The evaluation results show that Neutaint can achieve 68% accuracy, on average, which is 10% improvement while reducing 40× runtime overhead over the second-best taint tool Libdft on 6 real world programs. Neutaint also achieves 61% more edge coverage when used for taint-guided fuzzing indicating the effectiveness of the identified influential bytes. We also evaluate Neutaint's ability to detect real world software attacks. The results show that Neutaint can successfully detect different types of vulnerabilities including buffer/heap/integer overflows, division by zero, etc. Lastly, Neutaint can detect 98.7% of total flows, the highest among all taint analysis tools.