Bibliography
Malware is any software that causes harm to user information, computer systems, or networks. Modern computing and Internet systems face an increasing number of malware threats, and different malware are observed to follow the same structural patterns with minimal alterations. The threats themselves have evolved from file-based to fileless malware; threats of the latter kind are also known as Advanced Volatile Threats (AVTs). Fileless malware is complex and evasive, exploiting pre-installed, trusted programs to carry out its malicious intent against user information. It is designed to run in system memory with a very small footprint, leaving no artifacts on physical hard drives. Traditional antivirus signatures and heuristic analysis are unable to detect this kind of malware due to its sophisticated and evasive nature. This paper provides information on the detection, mitigation, and analysis of such threats.
This paper proposes a new conceptual modeling language for White-Box (WB) security analysis. In the WB security domain, an attacker may have access to the inner structure of an application, or even to its entire binary code, making it easy to inspect, reverse engineer, and tamper with the application using the information obtained. The basis of this paper is the 14 patterns developed by a leading provider of software protection technologies and solutions. We present part of a new modeling language, named i-WBS (White-Box Security), to better describe WB security problems. Since the essence of the white-box security problem is code security, the new modeling language focuses on code more than ever before. In this way, developers who are not security experts can easily understand what they really need to protect.
It is now possible to synthesize highly realistic images of people who do not exist. Such content has, for example, been implicated in the creation of fraudulent social media profiles responsible for disinformation campaigns. Significant efforts are, therefore, being deployed to detect synthetically generated content. One popular forensic approach trains a neural network to distinguish real from synthetic content. We show that such forensic classifiers are vulnerable to a range of attacks that reduce the classifier to near-0% accuracy. We develop five attack case studies on a state-of-the-art classifier that achieves an area under the ROC curve (AUC) of 0.95 on almost all existing image generators, when trained on only one generator. With full access to the classifier, we can flip the lowest bit of each pixel in an image to reduce the classifier's AUC to 0.0005; perturb 1% of the image area to reduce the classifier's AUC to 0.08; or add a single noise pattern in the synthesizer's latent space to reduce the classifier's AUC to 0.17. We also develop a black-box attack that, with no access to the target classifier, reduces the AUC to 0.22. These attacks reveal significant vulnerabilities of certain image-forensic classifiers.
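As a concrete illustration of the first white-box attack, here is a minimal PyTorch sketch (the classifier handle `clf`, its input layout, and all names are assumptions, not the paper's code): each pixel's least significant bit is set in the direction the loss gradient indicates will lower the "synthetic" score, so the image remains visually identical while the classifier's output collapses.

```python
import torch

def lsb_flip_attack(clf, img):
    """Flip each pixel's least significant bit in the direction that
    reduces the classifier's "synthetic" probability (white-box attack).
    `img`: uint8 tensor (C, H, W); `clf`: maps a (1, C, H, W) float
    tensor in [0, 1] to a scalar probability of being synthetic."""
    x = img.float().div(255.0).unsqueeze(0).requires_grad_(True)
    clf(x).squeeze().backward()
    grad = x.grad.squeeze(0)
    # Positive gradient: lowering the pixel lowers the score -> LSB 0;
    # negative gradient: raising the pixel lowers the score -> LSB 1.
    lsb = (grad < 0).to(torch.uint8)
    return (img & 0xFE) | lsb
```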
To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failures in real time. However, due to the large number and variety of KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised, white-box anomaly detector based on Robust Random Cut Forest (RRCF) and an active learning component. Specifically, we propose a novel improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying the original RRCF to KPI anomaly detection. Besides, we also incorporate the idea of active learning to make our model benefit from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental results demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods, and supervised learning methods. Besides, each component in iRRCF-Active has also been demonstrated to be effective and indispensable.
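The unsupervised core of such a detector is straightforward to prototype. A hedged sketch using the open-source `rrcf` package (an assumption; it implements vanilla RRCF, not the paper's iRRCF modifications), scoring a KPI stream by collusive displacement (CoDisp):

```python
import rrcf  # assumed: the open-source RRCF package (pip install rrcf)

def codisp_scores(kpi_stream, num_trees=40, window=256):
    """Score each KPI point by its average collusive displacement across
    a forest of robust random cut trees over a sliding window; a high
    CoDisp means the point is easy to 'cut off', i.e. likely anomalous."""
    forest = [rrcf.RCTree() for _ in range(num_trees)]
    scores = []
    for index, value in enumerate(kpi_stream):
        total = 0.0
        for tree in forest:
            if index >= window:
                tree.forget_point(index - window)  # evict the oldest point
            tree.insert_point(value, index=index)
            total += tree.codisp(index)
        scores.append(total / num_trees)
    return scores
```

Thresholding these scores gives the unsupervised detector; the active learning component would then ask operators to label only the most ambiguous points.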
Cloud-based payments, virtual car keys, and digital rights management are examples of consumer electronics applications that use secure software. White-box implementations of the Advanced Encryption Standard (AES) are important building blocks of secure software systems, and the attack of Billet, Gilbert, and Ech-Chatbi (BGE) is a well-known attack on such implementations. A drawback from the adversary’s or security tester’s perspective is that manual reverse engineering of the implementation is required before the BGE attack can be applied. This paper presents a method to automate the BGE attack on a class of white-box AES implementations with a specific type of external encoding. The new method was implemented and applied successfully to a CHES 2016 capture the flag challenge.
We leverage deep learning algorithms on various user behavioral information gathered from end-user devices to classify a subject of interest. In spite of the ability of these techniques to counter spoofing threats, they are vulnerable to adversarial learning attacks, where an attacker adds adversarial noise to the input samples to fool the classifier into false acceptance. Recently, a handful of mature techniques like the Fast Gradient Sign Method (FGSM) have been proposed to aid white-box attacks, where the attacker has complete knowledge of the machine learning model. In contrast, we mount a black-box attack on a behavioral biometric system based on gait patterns, by using FGSM and training a shadow model that mimics the target system. The attacker has limited knowledge of the target model and no knowledge of the real user being authenticated, yet induces a false acceptance in authentication. Our goal is to understand the feasibility of a black-box attack and to what extent FGSM on shadow models would contribute to its success. Our results show that the performance of FGSM depends strongly on the quality of the shadow model, which is in turn affected by key factors including the number of queries the target system allows for training the shadow model. Our experiments reveal strong relationships between shadow model quality and FGSM performance, as well as the effect of the number of FGSM iterations used to create an attack instance. These insights also shed light on the shareability of deep learning models, which can be exploited to launch a successful attack.
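FGSM itself is a single gradient step. A minimal PyTorch sketch (model and variable names are illustrative) of crafting an adversarial sample on the shadow model, which is then replayed against the target system in reliance on transferability:

```python
import torch
import torch.nn.functional as F

def fgsm_on_shadow(shadow_model, x, y, eps=0.05):
    """Fast Gradient Sign Method: perturb the input one step along the
    sign of the input gradient of the loss. Because the attacker lacks
    white-box access to the target, the gradient is taken on a shadow
    model trained to mimic it."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(shadow_model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```

Repeating this step yields the iterative variant whose iteration count the abstract identifies as a key factor.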
Information Flow Control (IFC) is a collection of techniques for ensuring a no-write-down, no-read-up style security policy known as noninterference. Traditional methods for both static (e.g. type systems) and dynamic (e.g. runtime monitors) IFC suffer from untenable numbers of false alarms on real-world programs. Secure Multi-Execution (SME) promises to provide secure information flow control without modifying the behaviour of already secure programs, a property commonly referred to as transparency. Implementations of SME exist for the web in the form of the FlowFox browser and as plug-ins to several programming languages. Furthermore, SME can in theory work in a black-box manner, meaning that it can be programming-language agnostic, making it perfect for securing legacy or third-party systems. As such, SME and its variants, like Multiple Facets (MF) and Faceted Secure Multi-Execution (FSME), appear to be a family of panaceas for the security engineer. The question is: given all these advantages, why are these techniques not ubiquitous in practice? The answer lies, partially, in the issue of runtime and memory overhead. SME and its variants are prohibitively expensive to deploy in many non-trivial situations. Why is this the case? On the surface, the reason is simple. The techniques in the SME family all rely on the idea of multi-execution, running all or parts of a program multiple times to achieve noninterference. Naturally, this causes some overhead. However, the predominant thinking in the IFC community has been that these overheads can be overcome. In this paper we argue that there are fundamental reasons to expect this not to be the case and prove two key theorems: (1) all transparent enforcement is polynomial-time equivalent to multi-execution; (2) all black-box enforcement takes time exponential in the number of principals in the security lattice. Our methods also allow us to answer, in the affirmative, an open question about the possibility of secure and transparent enforcement of a security condition known as Termination Insensitive Noninterference.
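The multi-execution idea at the heart of these results fits in a few lines. A toy two-level sketch (names illustrative): the low execution sees a default value instead of the secret, so the low output provably cannot depend on it.

```python
def sme(program, public_input, secret_input, default=0):
    """Two-level Secure Multi-Execution: run the program once per
    security level. `program(public, secret)` returns
    (low_output, high_output); the low run is fed a default in place of
    the secret, which enforces noninterference by construction."""
    low_out, _ = program(public_input, default)        # low execution
    _, high_out = program(public_input, secret_input)  # high execution
    return low_out, high_out

# A leaky program: its low output is the secret's parity...
leaky = lambda pub, sec: (sec % 2, pub + sec)
# ...but under SME the low output derives from the default, not the secret.
print(sme(leaky, 10, secret_input=7))  # (0, 17): the parity leak is gone
```

The cost is also visible here: one execution per security level, which is exactly the blow-up in the number of lattice elements that the second theorem formalizes.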
Advances in technology have led not only to increased security and privacy but also to new channels of information leakage. These new leak channels have increased the relevance of various types of attacks. One such type is the side-channel attack, i.e. an attack aimed at vulnerabilities in the practical implementation of an algorithm. As these attacks have developed, however, so have methods of protection against them. One such method is White-Box Cryptography.
In machine learning, white-box adversarial attacks rely on knowledge of the underlying model attributes. This work focuses on discovering two distinct pieces of model information: the underlying architecture and the primary training dataset. With the process described in this paper, a structured set of input probes and the corresponding model outputs become the training data for a deep classifier. Two subdomains of machine learning are explored: image-based classifiers and text transformers based on GPT-2. For image classification, the focus is on commonly deployed architectures and datasets available in popular public libraries. For text generation, a single transformer architecture with multiple parameter sizes is fine-tuned on different datasets. Each dataset explored, in both image and text, is distinguishable from the others. The diversity of text transformer outputs implies that further research is needed to successfully classify architecture attribution in the text domain.
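A hedged sketch of the probe-and-classify pipeline (all names, and the scikit-learn meta-classifier, are assumptions): the same probe inputs are fed to every candidate model, and the concatenated outputs become the feature vectors from which a classifier learns to predict architecture or training dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_signatures(models, probes):
    """Query each black-box model with the same structured probe set and
    flatten its outputs into one feature vector ("signature") per model."""
    return np.array(
        [np.concatenate([np.ravel(m(p)) for p in probes]) for m in models]
    )

# X: one output signature per probed model; y: its known architecture.
# X = probe_signatures(candidate_models, probes)
# meta = LogisticRegression(max_iter=1000).fit(X, y_architecture_labels)
# meta.predict(probe_signatures([unknown_model], probes))
```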
We consider the problem of protecting cloud services from simultaneous white-box and black-box attacks. Recent research in cryptographic program obfuscation considers the problem of protecting the confidentiality of programs and any secrets in them. In this model, a provable program obfuscation solution makes white-box attacks on the program no more useful than black-box attacks. Motivated by very recent results showing successful black-box attacks on machine learning programs run by cloud servers, we propose and study the approach of augmenting the program obfuscation solution model so as to achieve, in at least some class of application scenarios, program confidentiality in the presence of both white-box and black-box attacks. We propose and formally define encrypted-input program obfuscation, where a key is shared between the entity obfuscating the program and the entity encrypting the program's inputs. We believe this model might be of interest in practical scenarios where cloud programs operate over encrypted data received from associated sensors (e.g., Internet of Things, Smart Grid). Under standard intractability assumptions, we show various results that are not known in the traditional cryptographic program obfuscation model; most notably: Yao's garbled circuit technique implies encrypted-input program obfuscation hiding all gates of an arbitrary polynomial circuit; and very efficient encrypted-input program obfuscation is possible for range membership programs and for a class of machine learning programs (i.e., decision trees). The performance of the latter solutions has only a small constant overhead over the equivalent unobfuscated program.
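To make the garbled-circuit connection concrete, here is a toy garbling of a single AND gate (a textbook sketch of Yao's building block, not the paper's construction): the obfuscator publishes the table, hands the evaluator one label per input wire, and the evaluator learns the output label but neither input bit.

```python
import hashlib, os, random

def _xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def garble_and_gate():
    """Each wire gets two random labels (for bit 0 and bit 1); every
    truth-table row encrypts the output label under the pair of input
    labels selecting it, tagged with zero bytes so the evaluator can
    recognize a correct decryption."""
    labels = {w: (os.urandom(16), os.urandom(16)) for w in "abc"}
    table = []
    for a in (0, 1):
        for b in (0, 1):
            pad = hashlib.sha256(labels["a"][a] + labels["b"][b]).digest()[:24]
            table.append(_xor(pad, labels["c"][a & b] + b"\x00" * 8))
    random.shuffle(table)  # hide which row corresponds to which inputs
    return labels, table

def eval_gate(table, label_a, label_b):
    """Holding one label per input wire, recover only the output label."""
    pad = hashlib.sha256(label_a + label_b).digest()[:24]
    for row in table:
        plain = _xor(pad, row)
        if plain.endswith(b"\x00" * 8):
            return plain[:16]
```

A full circuit chains such gates, with the output labels of one gate acting as input labels of the next; encrypting the program's inputs then amounts to sending the right wire labels.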
The cybersecurity community is slowly leveraging Machine Learning (ML) to combat ever-evolving threats. One of the biggest drivers for the successful adoption of these models is how well domain experts and users are able to understand and trust their functionality. As these black-box models are being employed to make important predictions, the demand from stakeholders for transparency and explainability is increasing. Explanations supporting the output of ML models are crucial in cybersecurity, where experts require far more information from the model than a simple binary output for their analysis. Recent approaches in the literature have focused on three different areas: (a) creating and improving explainability methods that help users better understand the internal workings of ML models and their outputs; (b) attacking interpreters in the white-box setting; and (c) defining the exact properties and metrics of the explanations generated by models. However, they have not covered the security properties and threat models relevant to the cybersecurity domain, nor attacks on explainable models in black-box settings. In this paper, we bridge this gap by proposing a taxonomy for Explainable Artificial Intelligence (XAI) methods, covering various security properties and threat models relevant to the cybersecurity domain. We design a novel black-box attack for analyzing the consistency, correctness, and confidence security properties of gradient-based XAI methods. We validate our proposed system on three security-relevant datasets and models, and demonstrate that the method achieves the attacker's goal of misleading either both the classifier and its explanation report, or only the explainability method, without affecting the classifier's output. Our evaluation of the proposed approach shows promising results and can help in designing secure and robust XAI methods.
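The gradient-based explanations under attack reduce, in their simplest form, to an input-gradient saliency map. A minimal PyTorch sketch (names illustrative):

```python
import torch

def gradient_saliency(model, x, target_class):
    """Simplest gradient-based XAI method: the magnitude of the gradient
    of the target-class score with respect to each input feature. The
    black-box attack described above perturbs inputs so that this map
    becomes misleading while the predicted class stays unchanged."""
    x = x.clone().detach().requires_grad_(True)  # x has a batch dimension
    model(x)[0, target_class].backward()
    return x.grad.abs().squeeze(0)
```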
The wide use of cloud computing and of data outsourcing raises important concerns regarding data security, and thus creates the need for protection mechanisms such as the encryption of sensitive data. The recent major theoretical breakthrough of finding the holy grail of encryption, i.e. fully homomorphic encryption, guarantees the privacy of queries and of their results on encrypted data. However, there are only a few studies proposing a practical performance evaluation of the use of homomorphic encryption schemes to perform database queries. In this paper, we propose and analyse, in the context of a secure framework for a generic database query interpreter, two different methods in which client requests are dynamically executed on homomorphically encrypted data. Dynamic compilation of the requests makes it possible to take advantage of the optimizations performed during an off-line step on an intermediate code representation, which takes the form of Boolean circuits, and, moreover, to specialize the execution using runtime information. For the returned encrypted results, we also assess the complexity and efficiency of the different protocols proposed in the literature in terms of overall execution time, accuracy, and communication overhead.
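For flavour, here is the smallest possible version of a query over homomorphically encrypted data, sketched with the additively homomorphic python-paillier library (an assumption; the paper targets fully homomorphic schemes compiled to Boolean circuits, which this does not reproduce):

```python
from phe import paillier  # assumed: the python-paillier library

# Client: encrypt a sensitive column before outsourcing it.
public_key, private_key = paillier.generate_paillier_keypair()
salaries = [52_000, 61_500, 48_750]
encrypted_column = [public_key.encrypt(v) for v in salaries]

# Server: answers "SELECT SUM(salary)" without seeing any plaintext,
# since Paillier ciphertexts can be added directly.
encrypted_sum = sum(encrypted_column[1:], encrypted_column[0])

# Client: decrypts only the aggregate, locally.
assert private_key.decrypt(encrypted_sum) == sum(salaries)
```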
In this paper, we propose practical and efficient word and phrase proximity searchable encryption protocols for cloud-based relational databases. The proposed advanced searchable encryption protocols are provably secure. We formalize the security assurance with cryptographic security definitions and prove the security of our searchable encryption protocols under Shannon's perfect secrecy assumption. We have tested the proposed protocols comprehensively on Amazon's high-performance computing server using a MySQL database and present the results. The proposed protocols ensure zero space and communication overhead, since the ciphertext size is equal to the plaintext size. For the same reason, the database schema also does not change for existing applications. In this paper, we also present the results of a comprehensive analysis of the Song, Wagner, and Perrig scheme.
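The basic mechanics of searchable encryption can be sketched with deterministic HMAC tokens (a much-simplified stand-in for the paper's protocols, and one that leaks equality patterns, which provably secure constructions are designed to control):

```python
import hashlib
import hmac

def search_token(key: bytes, word: str) -> bytes:
    """Deterministic search token: the server stores tokens instead of
    words, and matches a query token by equality without ever seeing
    the underlying word."""
    return hmac.new(key, word.lower().encode(), hashlib.sha256).digest()

key = b"\x00" * 32  # demo key only; derive a real key properly
index = {search_token(key, w) for w in "the quick brown fox".split()}
print(search_token(key, "fox") in index)   # True: server-side match
print(search_token(key, "wolf") in index)  # False
```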
Identity concealment and zero round-trip time (0-RTT) connection establishment are two current research focuses in the design and analysis of secure transport protocols, such as TLS 1.3 and Google's QUIC, in the client-server setting. In this work, we introduce a new primitive for identity-concealed authenticated encryption in the public-key setting, referred to as higncryption, which can be viewed as a novel monolithic integration of public-key encryption, digital signature, and identity concealment. We then present a security definitional framework for higncryption and a conceptually simple (yet carefully designed) protocol construction. As a new primitive, higncryption can have many applications. In this work, we focus on its applications to 0-RTT authentication, showing that higncryption is well suited to and compatible with QUIC and OPTLS, and on its applications to identity-concealed authenticated key exchange (CAKE) and unilateral CAKE (UCAKE). Of independent interest is a new concise security definitional framework for CAKE and UCAKE proposed in this work, which unifies the traditional BR and (post-ID) frameworks, enjoys composability, and ensures very strong security guarantees. Along the way, we make a systematic comparative study with related protocols and mechanisms, including Zheng's signcryption, one-pass HMQV, QUIC, TLS 1.3, and OPTLS, most of which are widely standardized or in use.
Untrusted third parties are found throughout the integrated circuit (IC) design flow, resulting in potential threats to IC reliability and security. Threats include IC counterfeiting, intellectual property (IP) theft, IC overproduction, and the insertion of hardware Trojans. Logic encryption has emerged as a method of enhancing security against such threats; however, current implementations of logic encryption, including the XOR and look-up table (LUT) techniques, have high per-gate overheads in area, performance, and power. A novel gate-level logic encryption technique with reduced per-gate overheads is described in this paper. In addition, a technique to expand the search space of a key sequence is provided, increasing the difficulty for an adversary to extract the key value. A power reduction of 41.50%, an estimated area reduction of 43.58%, and a performance increase of 34.54% are achieved when using the proposed gate-level logic encryption instead of the LUT-based technique for an encrypted AND gate.
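The XOR technique whose per-gate overhead the paper improves on is simple to state behaviourally; a Python sketch of one key gate:

```python
def locked_and(a: int, b: int, key_bit: int) -> int:
    """XOR-based logic locking of an AND gate: an XOR key gate is placed
    on the output, and the netlist is synthesized so that the correct
    function appears only for the right key (here key_bit == 1); a wrong
    key inverts the output and corrupts downstream logic."""
    return (a & b) ^ key_bit ^ 1  # designed for key_bit == 1

# Correct key restores AND; the wrong key yields NAND-like behaviour.
assert [locked_and(a, b, 1) for a, b in [(0,0),(0,1),(1,0),(1,1)]] == [0, 0, 0, 1]
assert [locked_and(a, b, 0) for a, b in [(0,0),(0,1),(1,0),(1,1)]] == [1, 1, 1, 0]
```

In hardware, each such key gate costs area, delay, and power, which is the per-gate overhead the proposed technique reduces.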
Logistic regression is a powerful machine learning tool for classifying data. When dealing with sensitive data such as private or medical information, care is necessary. In this paper, we propose a secure system for protecting the training data in logistic regression via homomorphic encryption. Perhaps surprisingly, despite the non-polynomial tasks involved in logistic regression training, we show that only additively homomorphic encryption is needed to build our system. Our system is secure and scalable with respect to the dataset size.
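A hedged sketch of why additive homomorphism can suffice (using the python-paillier library, an assumption, and a quadratic approximation of the logistic loss rather than the paper's exact construction): once the loss is approximated by a low-degree polynomial, training needs only sums of per-record statistics, which the server can aggregate without decrypting anything.

```python
import numpy as np
from phe import paillier  # assumed: python-paillier, additively homomorphic

public_key, private_key = paillier.generate_paillier_keypair()

def encrypted_record_stats(x, y):
    """Per-record statistics for a quadratic (Taylor) approximation of
    the logistic loss: with labels y in {-1, +1}, training then needs
    only the sums of y*x and of x_i*x_j over all records."""
    stats = np.concatenate([y * x, np.outer(x, x).ravel()])
    return [public_key.encrypt(float(s)) for s in stats]

# Data owners encrypt; the untrusted server only adds ciphertexts.
records = [(np.array([1.0, 2.0]), 1), (np.array([0.5, -1.0]), -1)]
encrypted = [encrypted_record_stats(x, y) for x, y in records]
aggregated = [sum(col[1:], col[0]) for col in zip(*encrypted)]

# The model owner decrypts only aggregates, never individual records.
totals = [private_key.decrypt(c) for c in aggregated]
```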
Code clone detection is an important problem for software maintenance and evolution. Many approaches consider either structure or identifiers, but none of the existing detection techniques model both sources of information. These techniques also depend on generic, handcrafted features to represent code fragments. We introduce learning-based detection techniques where everything for representing terms and fragments in source code is mined from the repository. Our code analysis supports a framework, which relies on deep learning, for automatically linking patterns mined at the lexical level with patterns mined at the syntactic level. We evaluated our novel learning-based approach for code clone detection with respect to feasibility from the point of view of software maintainers. We sampled and manually evaluated 398 file- and 480 method-level pairs across eight real-world Java systems; 93% of the file- and method-level samples were evaluated to be true positives. Among the true positives, we found pairs mapping to all four clone types. We compared our approach to a traditional structure-oriented technique and found that our learning-based approach detected clones that were either undetected or suboptimally reported by the prominent tool Deckard. Our results affirm that our learning-based approach is suitable for clone detection and a tenable technique for researchers.
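As a skeleton of such a detection pipeline (not the paper's learned model), clone candidates can be ranked by the similarity of fragment embeddings; here a bag-of-lexical-tokens vector stands in for the representations that the paper instead learns jointly at the lexical and syntactic (AST) levels:

```python
import math
from collections import Counter

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()))
    norm *= math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def is_clone_pair(frag_a: str, frag_b: str, threshold: float = 0.8) -> bool:
    """Clone detection skeleton: embed each code fragment, then flag
    pairs whose similarity exceeds a threshold."""
    return cosine(Counter(frag_a.split()), Counter(frag_b.split())) >= threshold
```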
Transformations form an important part of developing domain-specific languages, where they are used to provide semantics for typing and evaluation. Yet few solutions exist for verifying transformations written in expressive high-level transformation languages. We take a step towards that goal by developing a general symbolic execution technique that handles programs written in these high-level transformation languages. We use logical constraints to describe structured symbolic values, including containment, acyclicity, and simple unordered collections (sets), and to handle deep type-based querying of syntax hierarchies. We evaluate this symbolic execution technique on a collection of refactoring and model transformation programs, showing that a white-box test generation tool based on symbolic execution obtains better code coverage than a black-box test generator for such programs in almost all tested cases.
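A minimal flavour of constraint-based white-box test generation, using the z3-solver package (an assumption) on a toy transformation with two paths:

```python
from z3 import Int, Solver, sat  # assumed: pip install z3-solver

def symbolic_tests():
    """White-box test generation sketch: encode each path condition of
    the toy transformation f(x) = x + 1 if x > 10 else -x as a logical
    constraint, and ask the solver for a concrete covering input."""
    x = Int("x")
    tests = []
    for path_condition in (x > 10, x <= 10):  # one constraint per branch
        solver = Solver()
        solver.add(path_condition)
        if solver.check() == sat:
            tests.append(solver.model()[x].as_long())
    return tests  # one concrete input per feasible path, e.g. [11, 0]

print(symbolic_tests())
```

The paper's technique extends this idea with constraints over structured symbolic values (terms, sets, containment) rather than plain integers.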
Abstractions make building complex systems possible. Many facilities provided by a modern programming language are directly designed to build a certain style of abstraction. Abstractions also aim to enhance code reusability, thus enhancing programmer productivity and effectiveness. Real-world software systems can grow to have a complicated hierarchy of abstractions. Often, the hierarchy grows unnecessarily deep, because the programmers have envisioned the most generic use cases for a piece of code in order to make it reusable. Sometimes the abstractions used in the program are not the appropriate ones, and it would be simpler for the higher-level client to circumvent them. Another problem is the impedance mismatch between pieces of code or libraries coming from different projects that were not designed to work together. Interoperability between such libraries is often hindered by abstractions, by design, in the name of encapsulation and hiding implementation details. These problems necessitate forms of abstraction that are easy to manipulate when needed. In this paper, we describe a powerful mechanism for creating white-box abstractions that encourages flatter abstraction hierarchies and eases manipulation and customization when necessary: program refinement. In doing so, we rely on the basic principle that writing directly in the host programming language is the least restrictive option in terms of expressiveness, and we allow the programmer to reuse and customize existing code snippets to address their specific needs.
Embedded devices with constrained computational resources, such as wireless sensor network nodes, electronic tag readers, roadside units in vehicular networks, and smart watches and wristbands, are widely used in the Internet of Things. Many such devices are deployed in untrusted environments, and others are easy to lose, making capture by adversaries possible. Accordingly, in the context of security research, these devices run in the white-box attack context, where the adversary may have total visibility of the implementation of the built-in cryptosystem and full control over its execution. It is undoubtedly a significant challenge to deal with attacks from such a powerful adversary. Existing encryption algorithms for white-box attack contexts typically require large amounts of memory, varying from one to dozens of megabytes, and are thus not suitable for resource-constrained devices. As a countermeasure in such circumstances, we propose an ultra-lightweight encryption scheme for protecting the confidentiality of data in white-box attack contexts. The encryption is executed with secret components specialized for resource-constrained devices against white-box attacks, and the encryption algorithm requires only a relatively small amount of static data, ranging from 48 to 92 KB. The security and efficiency of the proposed scheme have been theoretically analyzed with positive results, and experimental evaluations indicate that the scheme satisfies the resource constraints in terms of limited memory use and low computational cost.
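The table-based idea behind white-box implementations, and the memory cost the paper minimizes, can be sketched in a few lines (a toy construction, not the paper's scheme): the key is fused into lookup tables at build time, so the deployed code contains tables but no key material.

```python
# Toy bijective S-box (gcd(7, 256) = 1 makes i -> 7*i + 3 a permutation);
# a real scheme would use a cryptographic S-box such as AES's.
SBOX = bytes((i * 7 + 3) % 256 for i in range(256))

def make_tbox(key_byte: int) -> bytes:
    """Fuse a secret key byte into a lookup table at build time: the
    running code performs only table lookups and never touches the key."""
    return bytes(SBOX[x ^ key_byte] for x in range(256))

TBOX = make_tbox(0x5A)  # shipped with the firmware: 256 bytes per key byte

def round_step(state: bytes) -> bytes:
    return bytes(TBOX[b] for b in state)
```

Such schemes trade memory for key hiding; keeping the total static data between 48 and 92 KB, as the paper reports, is what makes the approach viable on constrained devices.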
We give attacks on Feistel-based format-preserving encryption (FPE) schemes that succeed in message recovery (not merely distinguishing scheme outputs from random) when the message space is small. For 4-bit messages, the attacks fully recover the target message using 2^21 examples for the FF3 NIST standard and 2^25 examples for the FF1 NIST standard. The examples include only three messages per tweak, which is what makes the attacks non-trivial even though the total number of examples exceeds the size of the domain. The attacks are rigorously analyzed in a new definitional framework of message-recovery security. The attacks are easily put out of reach by increasing the number of Feistel rounds in the standards.
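For context, here is a toy Feistel-based FPE over the paper's 4-bit message space (an illustrative sketch only; FF1 and FF3 use modular addition rather than XOR and far more elaborate round functions):

```python
import hashlib
import hmac

def round_function(key: bytes, tweak: bytes, rnd: int, half: int) -> int:
    """Toy round function: a PRF over (tweak, round, half), truncated to
    2 bits so that two halves form a 4-bit message space."""
    msg = tweak + bytes([rnd, half])
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 0b11

def fpe_encrypt(key: bytes, tweak: bytes, m: int, rounds: int = 8) -> int:
    """Balanced Feistel over two 2-bit halves: format-preserving because
    a 4-bit input always maps to a 4-bit output. The attacks exploit the
    tiny domain; raising `rounds` is the countermeasure noted above."""
    left, right = m >> 2, m & 0b11
    for rnd in range(rounds):
        left, right = right, left ^ round_function(key, tweak, rnd, right)
    return (left << 2) | right
```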