Bibliography
Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample that has been slightly modified to induce misclassification by an ML classifier. In this work, we investigate white-box and grey-box evasion attacks against an ML-based malware detector and conduct performance evaluations in a real-world setting. We compare defense approaches for mitigating these attacks. We propose a framework for deploying grey-box and black-box attacks against malware detection systems.
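As an illustration of the kind of white-box evasion attack discussed above, the following minimal FGSM-style sketch perturbs a feature vector against a toy linear detector; the model, features, and budget are hypothetical stand-ins, not the detector or attacks evaluated in the paper.

```python
# Minimal white-box evasion sketch (FGSM-style) against a toy linear malware
# detector. The model and feature vector are hypothetical stand-ins.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_evasion(w, x, eps=0.1):
    """Perturb feature vector x to lower the 'malicious' score.

    w   : weights of a logistic-regression detector (white-box access)
    x   : original (malicious) feature vector
    eps : perturbation budget per feature
    """
    # The gradient of the malicious score w.r.t. the input is a positive
    # multiple of w, so stepping against its sign reduces the score.
    x_adv = x - eps * np.sign(w)
    return np.clip(x_adv, 0.0, 1.0)  # keep features in a valid range

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, b = rng.normal(size=20), -0.5          # toy detector parameters
    x = rng.uniform(0, 1, size=20)            # toy malware feature vector
    x_adv = fgsm_evasion(w, x, eps=0.2)
    print("score before:", sigmoid(w @ x + b))
    print("score after :", sigmoid(w @ x_adv + b))
```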
The recent success of brain-inspired deep neural networks (DNNs) in solving complex, high-level visual tasks has led to rising expectations for their potential to match the human visual system. However, DNNs exhibit idiosyncrasies that suggest their visual representation and processing might be substantially different from human vision. One limitation of DNNs is that they are vulnerable to adversarial examples, input images to which subtle, carefully designed noise is added to fool a machine classifier. The robustness of the human visual system against adversarial examples is potentially of great importance, as it could uncover a key mechanistic feature that machine vision has yet to incorporate. In this study, we compare the visual representations of white- and black-box adversarial examples in DNNs and humans by leveraging functional magnetic resonance imaging (fMRI). We find a small but significant difference in representation patterns between the two types of adversarial examples (white- versus black-box) for both humans and DNNs. However, unlike for DNNs, human performance on categorical judgment is not degraded by the noise, regardless of its type. These results suggest that adversarial examples may be differentially represented in the human visual system, but are unable to affect the perceptual experience.
Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that, in the absence of failures, requires as few as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of the classical Skeen multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault-tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.
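For context, the following is a simplified, failure-free Python simulation of the classical Skeen-style timestamping baseline that the paper improves upon; it is not the proposed protocol and omits fault tolerance, actual inter-process messaging, and deterministic tie-breaking.

```python
# Simplified, failure-free sketch of the classical Skeen-style timestamping
# baseline (not the paper's protocol): each destination proposes a local
# timestamp, the final timestamp is the maximum proposal, and destinations
# deliver messages in final-timestamp order.
class Destination:
    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.pending = {}   # message -> final timestamp (None until decided)

    def propose(self, msg):
        self.clock += 1
        self.pending[msg] = None
        return self.clock

    def commit(self, msg, final_ts):
        self.clock = max(self.clock, final_ts)
        self.pending[msg] = final_ts

    def delivery_order(self):
        # Decided messages in final-timestamp order (real Skeen breaks ties
        # deterministically, e.g. by sender id; omitted in this sketch).
        decided = [(ts, m) for m, ts in self.pending.items() if ts is not None]
        return [m for ts, m in sorted(decided)]

def multicast(msg, destinations):
    proposals = [d.propose(msg) for d in destinations]    # step 1: proposals
    final_ts = max(proposals)                             # step 2: take the max
    for d in destinations:                                # step 3: commit
        d.commit(msg, final_ts)
    return final_ts

if __name__ == "__main__":
    g1, g2, g3 = Destination("g1"), Destination("g2"), Destination("g3")
    multicast("m1", [g1, g2])
    multicast("m2", [g2, g3])
    for d in (g1, g2, g3):
        print(d.name, d.delivery_order())
```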
Current testing for Deep Neural Networks (DNNs) focuses on the quantity of test cases but ignores their diversity. To the best of our knowledge, DeepXplore is the first white-box framework for deep learning testing; it triggers differential behaviors between multiple DNNs and increases neuron coverage to improve diversity. Because it relies on multiple DNNs, it faces two problems: (1) the framework does not apply to a single DNN, and (2) when all DNNs simultaneously make incorrect predictions, DeepXplore cannot generate test cases. This paper presents Test4Deep, a white-box testing framework based on a single DNN. Test4Deep avoids the pitfalls of multiple DNNs by inducing inconsistencies between the predicted labels of original inputs and those of generated test inputs. Meanwhile, Test4Deep improves neuron coverage, and thereby captures more diversity, by attempting to activate more inactivated neurons. The proposed method was evaluated on three popular datasets with nine DNNs. Compared to DeepXplore, Test4Deep produced on average 4.59% (maximum 10.49%) more test cases, all of which found errors and faults in the DNNs. These test cases yielded a 19.57% larger increase in diversity and a 25.88% larger increase in neuron coverage. Test4Deep can further be used to improve the accuracy of DNNs by up to 5.72% on average (maximum 7.0%).
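The neuron-coverage metric that both DeepXplore and Test4Deep aim to maximize can be illustrated with a small sketch; the two-layer network and threshold below are arbitrary stand-ins, not the paper's models or exact coverage definition.

```python
# Minimal sketch of a neuron-coverage metric as used by white-box DNN testing
# tools: a neuron counts as covered if its activation exceeds a threshold on
# at least one test input. The tiny two-layer network is a toy stand-in.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def layer_activations(x, weights):
    """Return the activation of every hidden neuron for one input."""
    acts, h = [], x
    for W, b in weights:
        h = relu(h @ W + b)
        acts.append(h)
    return np.concatenate(acts)

def neuron_coverage(test_inputs, weights, threshold=0.25):
    covered = None
    for x in test_inputs:
        hit = layer_activations(x, weights) > threshold
        covered = hit if covered is None else (covered | hit)
    return covered.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    weights = [(rng.normal(size=(10, 16)), rng.normal(size=16)),
               (rng.normal(size=(16, 8)), rng.normal(size=8))]
    tests = rng.uniform(0, 1, size=(50, 10))
    print("neuron coverage:", neuron_coverage(tests, weights))
```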
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
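The white-box signal exploited here, per-record gradients that SGD drives toward zero on training members, can be illustrated on a toy logistic-regression model; this sketch is only a simplified analogue of the paper's attacks on deep networks and richer feature sets.

```python
# Hedged sketch of the core white-box membership signal: the per-record
# gradient of the training loss w.r.t. model parameters, which SGD pushes
# toward zero on training members. Shown for a toy logistic-regression model.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_record_grad_norm(w, b, x, y):
    """L2 norm of the cross-entropy gradient w.r.t. (w, b) for one record."""
    p = sigmoid(x @ w + b)
    grad_w = (p - y) * x
    grad_b = p - y
    return np.sqrt(np.sum(grad_w ** 2) + grad_b ** 2)

def membership_score(w, b, x, y):
    # Lower gradient norm -> more likely to be a training member.
    return -per_record_grad_norm(w, b, x, y)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)
    w, b = np.zeros(5), 0.0
    for _ in range(500):                      # plain gradient descent
        p = sigmoid(X @ w + b)
        w -= 0.1 * (X.T @ (p - y)) / len(y)
        b -= 0.1 * np.mean(p - y)
    x_new = rng.normal(size=5)                # record not used in training
    y_new = float(x_new[0] > 0)
    print("member score    :", membership_score(w, b, X[0], y[0]))
    print("non-member score:", membership_score(w, b, x_new, y_new))
```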
Network virtualization is a fundamental technology for datacenters and upcoming wireless communications (e.g., 5G). It takes advantage of software-defined networking (SDN), which provides efficient network management by converting networking fabrics into SDN-capable devices. Moreover, white-box switches, which provide flexible and fast packet processing, are broadly deployed in commercial datacenters. A white-box switch requires a specific and restricted packet processing pipeline; however, to date, there has been no SDN-based network hypervisor that can support the pipeline of white-box switches. Therefore, in this paper, we propose WhiteVisor, a network hypervisor that can support a physical network composed of white-box switches. WhiteVisor converts a flow rule from the virtual network into a packet processing pipeline compatible with the white-box switch. We implement a prototype and show its feasibility and effectiveness in terms of pipeline conversion and overhead.
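A toy sketch of the pipeline-conversion idea is shown below; the table layout, field names, and goto-table chaining are hypothetical and do not reflect WhiteVisor's actual mapping.

```python
# Toy illustration of pipeline conversion: a single virtual flow rule that
# matches several fields at once is split across the fixed, restricted match
# tables of a white-box switch pipeline. Table and field names are hypothetical.
VIRTUAL_RULE = {
    "match": {"vlan_id": 100, "eth_dst": "aa:bb:cc:dd:ee:ff", "ip_dst": "10.0.0.1"},
    "action": "output:3",
}

# A restricted pipeline where each table may match only specific fields.
PIPELINE = [
    ("vlan_table", ["vlan_id"]),
    ("l2_table",   ["eth_dst"]),
    ("l3_table",   ["ip_dst"]),
]

def convert(rule, pipeline):
    """Split a virtual rule into per-table entries chained by goto-table."""
    tables_used = [(t, f) for t, f in pipeline
                   if any(field in rule["match"] for field in f)]
    entries = []
    for i, (table, fields) in enumerate(tables_used):
        match = {k: v for k, v in rule["match"].items() if k in fields}
        is_last = (i == len(tables_used) - 1)
        action = rule["action"] if is_last else f"goto:{tables_used[i + 1][0]}"
        entries.append({"table": table, "match": match, "action": action})
    return entries

if __name__ == "__main__":
    for entry in convert(VIRTUAL_RULE, PIPELINE):
        print(entry)
```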
Wide adoption of artificial neural networks in various domains has led to an increasing interest in defending against adversarial attacks on them. Preprocessing defense methods such as pixel discretization are particularly attractive in practice due to their simplicity, low computational overhead, and applicability to various systems. It has been observed that such methods work well on simple datasets like MNIST but break down on more complicated ones like ImageNet under recently proposed strong white-box attacks. To understand the conditions for success and the potential for improvement, we study the pixel discretization defense method, including more sophisticated variants that take into account the properties of the dataset being discretized. Our results again show poor resistance against the strong attacks. We analyze our results in a theoretical framework and offer strong evidence that pixel discretization is unlikely to work on all but the simplest of datasets. Furthermore, our arguments provide insights into why some other preprocessing defenses may be insecure.
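Pixel discretization itself is simple to state; the following sketch snaps each pixel to the nearest codeword of a per-dataset codebook (binarization for MNIST-style images), which is the preprocessing step the paper analyzes. The image and codebook here are toy examples.

```python
# Minimal sketch of the pixel-discretization preprocessing defense: each pixel
# is snapped to the nearest codeword from a small per-dataset codebook before
# the image is fed to the classifier.
import numpy as np

def discretize(image, codebook):
    """Map every pixel value to its nearest codeword.

    image    : float array with values in [0, 1]
    codebook : 1-D array of codewords, e.g. np.array([0.0, 1.0]) for binarization
    """
    flat = image.reshape(-1, 1)                           # (num_pixels, 1)
    dists = np.abs(flat - codebook.reshape(1, -1))        # distance to each codeword
    nearest = codebook[np.argmin(dists, axis=1)]
    return nearest.reshape(image.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    img = rng.uniform(0, 1, size=(28, 28))
    binarized = discretize(img, np.array([0.0, 1.0]))     # MNIST-style binarization
    print("unique pixel values after discretization:", np.unique(binarized))
```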
The economic progress of the Internet of Things (IoT) is phenomenal. Applications range from checking the alignment of components during manufacturing and monitoring transportation and pedestrian levels to improve driving and walking paths, to remotely observing terminally ill patients through medical devices such as implants and infusion pumps. To provide security, encrypting the data becomes an indispensable requirement, and symmetric encryption algorithms are becoming a crucial building block in resource-constrained environments. Typical symmetric encryption algorithms such as the Advanced Encryption Standard (AES) assume that the endpoints of communication are secure and that the encryption key is securely stored. However, devices might be physically unprotected, and attackers may have access to the memory while the data is still encrypted. It is therefore essential to store the key in such a way that an attacker finds it hard to extract. At present, techniques like white-box cryptography are utilized in these circumstances. However, it has been reported that applying white-box cryptography to IoT devices results in other security issues, such as the adversary gaining access to intermediate values, and practical implementations being vulnerable to code-lifting and differential attacks. In this paper, a solution is presented that overcomes these problems and demonstrates how white-box cryptography can enhance security by utilizing the cipher block chaining (CBC) mode.
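The CBC chaining that the paper builds on can be illustrated as follows; the block cipher is a placeholder permutation, not a white-box AES implementation.

```python
# Illustrative sketch of cipher block chaining (CBC): each plaintext block is
# XORed with the previous ciphertext block before encryption, so identical
# plaintext blocks produce different ciphertext blocks. The block cipher here
# is a toy stand-in, NOT a white-box AES implementation.
import os

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(block, key):
    # Placeholder for a (white-box) block cipher; a real deployment would use
    # a white-box table-based AES here.
    return xor(block, key)

def cbc_encrypt(plaintext, key, iv):
    assert len(plaintext) % BLOCK == 0, "pad the plaintext to the block size"
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        block = xor(plaintext[i:i + BLOCK], prev)   # chain with previous block
        prev = toy_block_encrypt(block, key)
        out += prev
    return out

if __name__ == "__main__":
    key, iv = os.urandom(BLOCK), os.urandom(BLOCK)
    msg = b"A" * 32                                  # two identical plaintext blocks
    ct = cbc_encrypt(msg, key, iv)
    print("ciphertext block 1 != block 2:", ct[:BLOCK] != ct[BLOCK:2 * BLOCK])
```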
Dynamic spectrum sharing techniques applied in the UHF TV band have been developed to allow secondary WiFi transmission in areas with active TV users. This technique of dynamically controlling the exclusion zone enables vastly increased secondary spectrum re-use, compared to the "TV white space" model in which TV transmitters determine the exclusion zone and only "idle" channels can be re-purposed. However, in current dynamic spectrum sharing systems, the sensitive operation parameters of both primary TV users (PUs) and secondary users (SUs) need to be shared with the spectrum database controller (SDC) in order to realize efficient spectrum allocation. Since the SDC is not necessarily operated by a trusted third party, such systems pose serious threats to the privacy of both PUs and SUs. To address this privacy issue, this paper proposes a privacy-preserving spectrum sharing system between PUs and SUs, which realizes the spectrum allocation decision process using efficient multi-party computation (MPC) techniques. In this design, the SDC only performs secure computation over encrypted input from PUs and SUs, such that none of the PU or SU operation parameters is revealed to the SDC. Our performance evaluation shows that the proposed system, based on efficient MPC techniques, can perform the dynamic spectrum allocation process between PUs and SUs efficiently while preserving users' privacy.
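One MPC building block behind such designs, additive secret sharing, can be sketched as follows; the parameters and field size are illustrative, and the paper's allocation protocol is considerably more involved than a secure sum.

```python
# Minimal sketch of additive secret sharing, one building block of MPC-based
# privacy-preserving spectrum allocation: each user splits a private parameter
# into random shares, so the spectrum database controller only ever sees
# shares, never the parameter itself.
import secrets

PRIME = 2**61 - 1  # field modulus (illustrative; any sufficiently large prime works)

def share(secret, n_parties):
    """Split `secret` into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

if __name__ == "__main__":
    # A PU and an SU secret-share private parameters (scaled to integers);
    # computation parties add shares locally, and only the aggregate is
    # ever reconstructed.
    pu_shares = share(23, 3)
    su_shares = share(17, 3)
    agg_shares = [(a + b) % PRIME for a, b in zip(pu_shares, su_shares)]
    print("aggregate:", reconstruct(agg_shares))   # 40, individual values stay hidden
```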
The wide use of cloud computing and data outsourcing raises important concerns regarding data security, thus creating a need for protection mechanisms such as encryption of sensitive data. The recent major theoretical breakthrough of finding the Holy Grail of encryption, i.e., fully homomorphic encryption, guarantees the privacy of queries and their results on encrypted data. However, only a few studies propose a practical performance evaluation of homomorphic encryption schemes for performing database queries. In this paper, we propose and analyse, in the context of a secure framework for a generic database query interpreter, two different methods in which client requests are dynamically executed on homomorphically encrypted data. Dynamic compilation of the requests allows us to take advantage of the optimizations performed during an off-line step on an intermediate code representation, which takes the form of boolean circuits, and, moreover, to specialize the execution using runtime information. For the returned encrypted results, we also assess the complexity and efficiency of the different protocols proposed in the literature in terms of overall execution time, accuracy, and communication overhead.
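The boolean-circuit intermediate representation mentioned above can be illustrated for a simple equality predicate; plain bits are used here, with no homomorphic encryption library involved.

```python
# Hedged sketch of the boolean-circuit intermediate representation: an equality
# predicate from a query (e.g. WHERE age = 42) compiles into XNOR/AND gates over
# the bits of the operands, the form a homomorphic scheme could evaluate gate by
# gate on encrypted bits. Plain bits are used here for illustration only.
def to_bits(value, width=8):
    return [(value >> i) & 1 for i in range(width)]

def xnor(a, b):
    return 1 - (a ^ b)

def equality_circuit(bits_a, bits_b):
    """AND of bitwise XNORs: outputs 1 iff the two operands are equal."""
    result = 1
    for a, b in zip(bits_a, bits_b):
        result &= xnor(a, b)
    return result

if __name__ == "__main__":
    print(equality_circuit(to_bits(42), to_bits(42)))  # 1
    print(equality_circuit(to_bits(42), to_bits(41)))  # 0
```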
In this paper, we propose practical and efficient word and phrase proximity searchable encryption protocols for cloud-based relational databases. The proposed advanced searchable encryption protocols are provably secure. We formalize the security assurance with cryptographic security definitions and prove the security of our searchable encryption protocols under Shannon's perfect secrecy assumption. We have tested the proposed protocols comprehensively on Amazon's high-performance computing servers using a MySQL database and present the results. The proposed protocols incur zero space and communication overhead because the ciphertext size is equal to the plaintext size. For the same reason, the database schema does not change for existing applications. We also present the results of a comprehensive analysis of the Song, Wagner, and Perrig scheme.
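For intuition only, the following sketch shows a word-proximity search pattern over keyed tokens; it is not the paper's provably secure protocol, and deterministic tokens leak strictly more than the proposed scheme.

```python
# Hedged sketch of word-proximity search over encrypted text: words are mapped
# to deterministic tokens with a keyed PRF (HMAC-SHA256) and stored with their
# positions, so a server can test equality and word distance without seeing
# plaintext. Illustration of the search pattern only, not the paper's protocol.
import hmac, hashlib

def token(key, word):
    return hmac.new(key, word.lower().encode(), hashlib.sha256).digest()

def encrypt_index(key, text):
    """Server-side index: list of (position, token) pairs."""
    return [(i, token(key, w)) for i, w in enumerate(text.split())]

def proximity_search(index, t1, t2, max_gap):
    """Position pairs where the two tokens occur within max_gap words."""
    pos1 = [i for i, t in index if t == t1]
    pos2 = [i for i, t in index if t == t2]
    return [(a, b) for a in pos1 for b in pos2 if abs(a - b) <= max_gap]

if __name__ == "__main__":
    key = b"client-secret-key"
    index = encrypt_index(key, "secure cloud databases need searchable encryption")
    hits = proximity_search(index, token(key, "searchable"), token(key, "encryption"), 1)
    print("adjacent occurrence positions:", hits)
```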
Identity concealment and zero-round-trip-time (0-RTT) connection are two current research focuses in the design and analysis of secure transport protocols, such as TLS 1.3 and Google's QUIC, in the client-server setting. In this work, we introduce a new primitive for identity-concealed authenticated encryption in the public-key setting, referred to as higncryption, which can be viewed as a novel monolithic integration of public-key encryption, digital signature, and identity concealment. We then present the security definitional framework for higncryption and a conceptually simple (yet carefully designed) protocol construction. As a new primitive, higncryption can have many applications. In this work, we focus on its applications to 0-RTT authentication, showing that higncryption is well suited to and compatible with QUIC and OPTLS, and on its applications to identity-concealed authenticated key exchange (CAKE) and unilateral CAKE (UCAKE). Of independent interest is a new concise security definitional framework for CAKE and UCAKE proposed in this work, which unifies the traditional BR and (post-ID) frameworks, enjoys composability, and ensures very strong security guarantees. Along the way, we make a systematic comparative study of related protocols and mechanisms, including Zheng's signcryption, one-pass HMQV, QUIC, TLS 1.3, and OPTLS, most of which are widely standardized or in use.
Untrusted third parties are found throughout the integrated circuit (IC) design flow, resulting in potential threats to IC reliability and security. Threats include IC counterfeiting, intellectual property (IP) theft, IC overproduction, and the insertion of hardware Trojans. Logic encryption has emerged as a method of enhancing security against such threats; however, current implementations of logic encryption, including the XOR and look-up table (LUT) techniques, have high per-gate overheads in area, performance, and power. A novel gate-level logic encryption technique with reduced per-gate overheads is described in this paper. In addition, a technique to expand the search space of the key sequence is provided, increasing the difficulty for an adversary to extract the key value. A power reduction of 41.50%, an estimated area reduction of 43.58%, and a performance increase of 34.54% are achieved when using the proposed gate-level logic encryption instead of the LUT-based technique for an encrypted AND gate.
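The baseline XOR key-gate technique whose overhead the paper reduces can be sketched in a few lines; the locked AND gate below is a toy software simulation, not a hardware implementation or the paper's proposed technique.

```python
# Toy sketch of XOR-based logic locking, the baseline whose per-gate overhead
# the paper aims to reduce: an XOR key gate is inserted on a wire so the
# circuit computes the original function only when the key bit is correct.
def and_gate(a, b):
    return a & b

def locked_and_gate(a, b, key_bit):
    # The key gate inverts the output whenever key_bit == 1, so the correct
    # behaviour is obtained only for the right key (here 0).
    return and_gate(a, b) ^ key_bit

if __name__ == "__main__":
    correct_key, wrong_key = 0, 1
    for a in (0, 1):
        for b in (0, 1):
            assert locked_and_gate(a, b, correct_key) == and_gate(a, b)
    mismatches = sum(locked_and_gate(a, b, wrong_key) != and_gate(a, b)
                     for a in (0, 1) for b in (0, 1))
    print("outputs corrupted by a wrong key:", mismatches, "of 4")
```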
Logistic regression is a powerful machine learning tool for classifying data. When dealing with sensitive data such as private or medical information, care is necessary. In this paper, we propose a secure system for protecting the training data in logistic regression via homomorphic encryption. Perhaps surprisingly, despite the non-polynomial operations involved in logistic regression training, we show that only additively homomorphic encryption is needed to build our system. Our system is secure and scalable with respect to the dataset size.
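The additive-only idea can be illustrated with Paillier encryption via the python-paillier (phe) package (an assumed dependency, not named in the paper); the sketch below only shows why additive homomorphism suffices for securely aggregating per-record contributions, whereas the paper's actual construction restructures the training computation differently.

```python
# Hedged sketch: per-record contributions are encrypted under an additively
# homomorphic scheme (Paillier, via the python-paillier `phe` package, assumed
# installed) and summed under encryption, so the key holder only ever decrypts
# the aggregate. This is not the paper's exact construction.
import numpy as np
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each data holder computes a plaintext gradient contribution locally and
# encrypts it (toy 3-dimensional contributions here).
contributions = [np.array([0.10, -0.20, 0.05]),
                 np.array([0.07, -0.15, 0.02]),
                 np.array([-0.03, 0.04, 0.01])]
encrypted = [[public_key.encrypt(float(v)) for v in g] for g in contributions]

# The untrusted aggregator adds ciphertexts coordinate-wise; it never sees any
# individual contribution in the clear.
agg = encrypted[0]
for enc_grad in encrypted[1:]:
    agg = [a + b for a, b in zip(agg, enc_grad)]

# Only the key holder decrypts the aggregated value and applies the update.
aggregate = np.array([private_key.decrypt(c) for c in agg])
print("aggregate contribution:", aggregate)
```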
Code clone detection is an important problem for software maintenance and evolution. Many approaches consider either structure or identifiers, but none of the existing detection techniques model both sources of information. These techniques also depend on generic, handcrafted features to represent code fragments. We introduce learning-based detection techniques where everything for representing terms and fragments in source code is mined from the repository. Our code analysis supports a framework, which relies on deep learning, for automatically linking patterns mined at the lexical level with patterns mined at the syntactic level. We evaluated our novel learning-based approach for code clone detection with respect to feasibility from the point of view of software maintainers. We sampled and manually evaluated 398 file- and 480 method-level pairs across eight real-world Java systems; 93% of the file- and method-level samples were evaluated to be true positives. Among the true positives, we found pairs mapping to all four clone types. We compared our approach to a traditional structure-oriented technique and found that our learning-based approach detected clones that were either undetected or suboptimally reported by the prominent tool Deckard. Our results affirm that our learning-based approach is suitable for clone detection and a tenable technique for researchers.
Transformations form an important part of developing domain-specific languages, where they are used to provide semantics for typing and evaluation. Yet, few solutions exist for verifying transformations written in expressive high-level transformation languages. We take a step towards that goal by developing a general symbolic execution technique that handles programs written in these high-level transformation languages. We use logical constraints to describe structured symbolic values, including containment, acyclicity, and simple unordered collections (sets), and to handle deep type-based querying of syntax hierarchies. We evaluate this symbolic execution technique on a collection of refactoring and model transformation programs, showing that a white-box test generation tool based on symbolic execution obtains better code coverage than a black-box test generator for such programs in almost all tested cases.
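The core loop of symbolic-execution-based white-box test generation, collecting a path condition per branch and solving for a concrete input, can be sketched with the z3 solver (an assumed dependency); the rich transformation-language constructs handled by the paper (containment, acyclicity, sets, type-based queries) are not modeled here.

```python
# Hedged illustration of symbolic-execution-based white-box test generation:
# explore both sides of a branch, collect the path condition, and ask a solver
# for one concrete input per feasible path. Uses z3 as an assumed dependency
# and a trivial one-branch "transformation" as the program under test.
from z3 import Int, Solver, Not, sat

x = Int("x")

# Toy program under test:  if x > 10: y = x - 10  else: y = 0
branch = x > 10

def path_input(constraints):
    """Return a concrete input satisfying the path condition, or None."""
    s = Solver()
    s.add(*constraints)
    if s.check() == sat:
        return s.model()[x].as_long()
    return None

if __name__ == "__main__":
    for label, cond in (("then-branch", [branch]), ("else-branch", [Not(branch)])):
        print(f"{label}: test input x = {path_input(cond)}")
```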
Abstractions make building complex systems possible. Many facilities provided by a modern programming language are designed directly to support building a certain style of abstraction. Abstractions also aim to enhance code reusability, thus enhancing programmer productivity and effectiveness. Real-world software systems can grow to have a complicated hierarchy of abstractions. Often, the hierarchy grows unnecessarily deep, because the programmers have envisioned the most generic use cases for a piece of code in order to make it reusable. Sometimes the abstractions used in a program are not the appropriate ones, and it would be simpler for the higher-level client to circumvent them. Another problem is the impedance mismatch between pieces of code or libraries coming from different projects that were not designed to work together. Interoperability between such libraries is often hindered by abstractions, by design, in the name of hiding implementation details and encapsulation. These problems necessitate forms of abstraction that are easy to manipulate when needed. In this paper, we describe a powerful mechanism for creating white-box abstractions that encourages flatter abstraction hierarchies and eases manipulation and customization when necessary: program refinement. In doing so, we rely on the basic principle that writing directly in the host programming language is the least restrictive option in terms of expressiveness, and we allow the programmer to reuse and customize existing code snippets to address their specific needs.