Biblio
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof that our mechanism satisfies dx-privacy, define a probability distribution over Hyperbolic space, and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high-dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protections against an authorship attribution algorithm, while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe >20x greater guarantees on expected privacy against comparable worst-case statistics.
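To make the mechanism above concrete, here is a minimal Python sketch of the generic dx-privacy perturb-and-remap step over word embeddings, using the multivariate Laplace noise of the Euclidean baseline rather than the paper's hyperbolic construction; the toy vocabulary and embedding table are invented for illustration.

```python
import numpy as np

def dx_privacy_perturb(word, vocab, embeddings, epsilon, rng=None):
    """Perturb a word's embedding with noise calibrated to 1/epsilon and map
    the noisy vector back to the nearest vocabulary word.

    This sketches the generic (Euclidean) dx-privacy mechanism; the paper's
    construction instead works with representations in hyperbolic space."""
    rng = rng or np.random.default_rng()
    v = embeddings[vocab.index(word)]
    # Multivariate Laplace noise: uniform direction, gamma-distributed radius.
    direction = rng.normal(size=v.shape)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=len(v), scale=1.0 / epsilon)
    noisy = v + radius * direction
    # Post-process: snap back to the closest word in the vocabulary.
    dists = np.linalg.norm(embeddings - noisy, axis=1)
    return vocab[int(np.argmin(dists))]

# Toy usage with a hypothetical 3-word, 4-dimensional embedding table.
vocab = ["doctor", "physician", "lawyer"]
embeddings = np.array([[0.90, 0.10, 0.00, 0.20],
                       [0.85, 0.15, 0.05, 0.25],
                       [0.10, 0.90, 0.30, 0.00]])
print(dx_privacy_perturb("doctor", vocab, embeddings, epsilon=5.0))
```

Smaller epsilon values yield larger perturbations and therefore stronger privacy at the cost of semantic drift.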
This paper presents a machine learning classifier designed to identify SQL injection vulnerabilities in PHP code. Both classical and deep-learning-based machine learning algorithms were used to train and evaluate classifier models using input validation and sanitization features extracted from source code files. In ten-fold cross-validation, a model trained using a Convolutional Neural Network (CNN) achieved the highest precision (95.4%), while a model based on a Multilayer Perceptron (MLP) achieved the highest recall (63.7%) and the highest F-measure (0.746).
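As a rough illustration of the evaluation protocol described above (not the paper's actual pipeline), the following scikit-learn sketch runs ten-fold cross-validation of an MLP over a placeholder matrix of validation/sanitization features; the feature extraction from PHP source files is assumed to have happened elsewhere.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_validate

# Hypothetical feature matrix: one row per PHP file, columns such as
# "calls mysqli_real_escape_string", "concatenates $_GET into a query", etc.
X = np.random.rand(200, 12)           # placeholder features
y = np.random.randint(0, 2, 200)      # 1 = SQL-injection vulnerable

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
scores = cross_validate(clf, X, y, cv=10,
                        scoring=["precision", "recall", "f1"])
print("precision", scores["test_precision"].mean())
print("recall   ", scores["test_recall"].mean())
print("f1       ", scores["test_f1"].mean())
```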
Malware scanning of an app market is expected to be scalable and effective. However, existing approaches use either syntax-based features, which can be evaded by transformation attacks, or semantic-based features, which are usually extracted by performing expensive program analysis. Therefore, in this paper, we propose a lightweight graph-based approach to Android malware detection. Instead of traditional heavyweight static analysis, we treat the function call graphs of apps as social networks and perform social-network-based centrality analysis to represent the semantic features of the graphs. Our key insight is that centrality provides a succinct and fault-tolerant representation of graph semantics, especially for graphs with a certain amount of inaccurate information (e.g., inaccurate call graphs). We implement a prototype system, MalScan, and evaluate it on datasets of 15,285 benign samples and 15,430 malicious samples. Experimental results show that MalScan is capable of detecting Android malware with up to 98% accuracy in under one second, which is more than 100 times faster than two state-of-the-art approaches, namely MaMaDroid and Drebin. We also demonstrate the feasibility of MalScan for market-wide malware scanning by performing a statistical study on over 3 million apps. Finally, in a dataset collected from the Google Play app market, MalScan is able to identify 18 zero-day malware samples, including samples that evade detection by existing tools.
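The centrality idea can be sketched in a few lines with networkx: each app's call graph is mapped to a fixed-length vector of centrality scores at a shared list of sensitive API nodes. The particular centrality measures, API list, and toy graph below are illustrative assumptions, not MalScan's exact feature construction.

```python
import networkx as nx
import numpy as np

def centrality_features(call_graph, sensitive_apis):
    """Represent an app's call graph by centralities of sensitive API nodes.

    call_graph: nx.DiGraph whose nodes are method/API names.
    sensitive_apis: fixed, ordered list of API names shared across all apps,
    so every app maps to a feature vector of the same length."""
    g = call_graph.to_undirected()
    degree = nx.degree_centrality(g)
    closeness = nx.closeness_centrality(g)
    feats = []
    for api in sensitive_apis:
        feats.append(degree.get(api, 0.0))     # 0.0 if the API is never called
        feats.append(closeness.get(api, 0.0))
    return np.array(feats)

# Toy usage on a hypothetical three-method call graph.
g = nx.DiGraph([("onCreate", "sendTextMessage"),
                ("onCreate", "getDeviceId")])
apis = ["sendTextMessage", "getDeviceId", "getLastKnownLocation"]
print(centrality_features(g, apis))
```

Because centralities are aggregate graph statistics, small inaccuracies in the recovered call graph tend to shift the feature values only slightly, which is the fault-tolerance property the abstract highlights.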
Cryptojacking (also called malicious cryptocurrency mining or cryptomining) is a new threat model that covertly uses CPU resources to mine a cryptocurrency in the browser. The impact is a surge in CPU usage and degraded system performance. In this research, an in-browser cryptojacking mitigation has been built as an extension in Google Chrome using taint analysis. The research method is attack modeling with an abuse case, using a Man-in-the-Middle (MITM) attack to test the mitigation. The proposed model is designed so that users are notified if a cryptojacking attack occurs; the user can then inspect the characteristics of the scripts running in the website background. The results of this research show that taint analysis is a promising method for mitigating cryptojacking attacks: out of 100 randomly sampled websites, the taint analysis method detected 19 websites infected by cryptojacking.
This paper reviews the definitions and characteristics of military effects, the Internet of Battlefield Things (IoBT), and their impact on decision processes in a Multi-Domain Operations (MDO) environment. Aspects of contemporary military decision processes are illustrated and an MDO Effect Loop decision process is introduced. We examine the concept of IoBT effects and their implications in MDO. These implications suggest that, when considering MDO as a doctrine, the technological advances of IoBTs empower enhancements in decision frameworks and increase the viability of novel operational approaches and options for military effects.
Machine Learning as a Service (MLaaS) is becoming a popular practice in which Service Consumers, e.g., end-users, send their data to an ML Service and receive the prediction outputs. However, the emerging usage of MLaaS has raised severe privacy concerns about users' proprietary data. Privacy-Preserving Machine Learning (PPML) techniques aim to incorporate cryptographic primitives such as Homomorphic Encryption (HE) and Multi-Party Computation (MPC) into ML services to address privacy concerns from a technology standpoint. Existing PPML solutions have not been widely adopted in practice due to their assumed high overhead and the difficulty of integrating them with various ML front-end frameworks and hardware backends. In this work, we propose PlaidML-HE, the first end-to-end HE compiler for PPML inference. Leveraging the capability of Domain-Specific Languages, PlaidML-HE enables automated generation of HE kernels across diverse types of devices. We evaluate the performance of PlaidML-HE on different ML kernels and demonstrate that PlaidML-HE greatly reduces the overhead of the HE primitives compared with existing implementations.
In this paper, RBF-based multistage auto-encoders are used to detect IDS attacks. RBF networks have numerous applications in real-life settings. The proposed technique has two parts: a multistage auto-encoder and an RBF network. The multistage auto-encoder is applied to select the most informative and sensitive features from the input data. The features selected by the multistage auto-encoder are fed as input to the RBF network, which is trained to classify the input data into two labels: attack or no attack. The experiment was conducted using MATLAB 2018 on a dataset comprising 175,341 cases, each with 42 features, and validated using 82,332 cases. To the authors' knowledge, this is the first time the approach has been applied to IDS attack detection; it achieves 98.80% accuracy when validated on the UNSW-NB15 dataset. The experimental results show that the proposed method compares favorably with previously reported results in this field.
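A compact way to picture the second stage is an RBF network built from learned centers plus a linear output layer, as in the scikit-learn sketch below; the placeholder matrix X stands in for features already compressed by the multistage auto-encoder, and the cluster count and gamma are arbitrary choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

# Placeholder for features already selected by the multistage auto-encoder.
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, 500)   # 1 = attack, 0 = no attack

# RBF network: hidden layer = RBF activations around k learned centers,
# output layer = a linear classifier on those activations.
centers = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X).cluster_centers_
H = rbf_kernel(X, centers, gamma=1.0)
clf = LogisticRegression(max_iter=1000).fit(H, y)
print("training accuracy:", clf.score(H, y))
```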
We propose a method to maintain high resource availability in a networked heterogeneous multi-robot system subject to resource failures. In our model, resources such as sensing and computation are available on robots. The robots are engaged in a joint task using these pooled resources. When a resource on a particular robot becomes unavailable (e.g., a sensor ceases to function), the system automatically reconfigures so that the robot continues to have access to this resource by communicating with other robots. Specifically, we consider the problem of selecting edges to be modified in the system's communication graph after a resource failure has occurred. We define a metric that allows us to characterize the quality of the resource distribution in the network represented by the communication graph. Upon a resource becoming unavailable due to failure, we reconfigure the network so that the resource distribution is brought as close to the maximal resource distribution as possible without a large change in the number of active inter-robot communication links. Our approach uses mixed-integer semi-definite programming to achieve this goal. We employ a simulated annealing method to compute a spatial formation that satisfies the inter-robot distances imposed by the topology, along with other constraints. Our method can compute a communication topology, a spatial formation, and a formation-change motion plan in a few seconds. We validate our method in simulation and in real-robot experiments with a team of seven quadrotors.
Logic encryption, a method to lock a circuit against unauthorized use unless the correct key is provided, is the most important technique in hardware IP protection. However, with the discovery of the SAT attack, all traditional logic encryption algorithms have been broken. Algorithms developed after the SAT attack are all vulnerable to structural analysis unless a provable obfuscation is applied to the locked circuit, yet no provable logic obfuscation is available, despite some vague appeals to logic resynthesis. In this paper, we formulate and discuss a trilemma in logic encryption among locking robustness, structural security, and encryption efficiency, showing that pre-SAT approaches achieve only structural security and encryption efficiency, while post-SAT approaches achieve only locking robustness and encryption efficiency. There is also a dilemma in locking between query complexity and error number. We first develop a theory and solution to the dilemma in locking between query complexity and error number. Then, we provide a provable obfuscation solution to the dilemma between structural security and locking robustness. We finally present and discuss results towards the resolution of the trilemma in logic encryption.
Humans often assume that robots are rational. We believe robots take optimal actions given their objective; hence, when we are uncertain about what the robot's objective is, we interpret the robot's actions as optimal with respect to our estimate of its objective. This approach makes sense when robots straightforwardly optimize their objective, and it enables humans to learn what the robot is trying to achieve. However, our insight is that, when robots are aware that humans learn by trusting that the robot's actions are rational, intelligent robots do not act as the human expects; instead, they take advantage of the human's trust, and exploit this trust to more efficiently optimize their own objective. In this paper, we formally model instances of human-robot interaction (HRI) where the human does not know the robot's objective using a two-player game. We formulate different ways in which the robot can model the uncertain human, and compare solutions of this game when the robot has conservative, optimistic, rational, and trusting human models. In an offline linear-quadratic case study and a real-time user study, we show that trusting human models can naturally lead to communicative robot behavior, which influences end-users and increases their involvement.
We study the power of interactivity in local differential privacy. First, we focus on the difference between fully interactive and sequentially interactive protocols. Sequentially interactive protocols may query users adaptively in sequence, but they cannot return to previously queried users. The vast majority of existing lower bounds for local differential privacy apply only to sequentially interactive protocols, and before this paper it was not known whether fully interactive protocols were more powerful. We resolve this question. To do so, we classify locally private protocols by their compositionality, the multiplicative factor by which the sum of a protocol's single-round privacy parameters exceeds its overall privacy guarantee. We then show how to efficiently transform any fully interactive compositional protocol into an equivalent sequentially interactive protocol with a blowup in sample complexity linear in this compositionality. Next, we show that our reduction is tight by exhibiting a family of problems such that any sequentially interactive protocol requires this blowup in sample complexity over a fully interactive compositional protocol. We then turn our attention to hypothesis testing problems. We show that for a large class of compound hypothesis testing problems, which include all simple hypothesis testing problems as a special case, a simple noninteractive test is optimal among the class of all (possibly fully interactive) tests.
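In symbols, the compositionality factor described above can be written as follows (our notation, restating the abstract's definition):

```latex
% A protocol with per-round privacy parameters \epsilon_1,\dots,\epsilon_T
% and overall privacy guarantee \epsilon is k-compositional when
\[
  \sum_{t=1}^{T} \epsilon_t \;\le\; k \cdot \epsilon .
\]
% Basic (non-adaptive) composition gives k = 1, since the per-round
% parameters sum to the overall guarantee; fully interactive protocols may
% have k \gg 1, and the reduction's sample-complexity blowup is linear in k.
```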
Localizing concurrency faults that occur in production is hard because (1) detailed field data, such as user input, file content, and interleaving schedules, may not be available to developers to reproduce the failure; (2) it is often impractical to assume the availability of multiple failing executions to localize the faults using existing techniques; (3) it is challenging to search for buggy locations in an application given limited runtime data; and (4) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for diagnosing intra-process (thread-level) failures. To address these problems, we present SCMiner, a practical online bug diagnosis tool that helps developers understand how a system-level concurrency fault happens based on the logs collected by the default system audit tools. SCMiner achieves online bug diagnosis and thus obviates the need for offline bug reproduction. SCMiner does not require code instrumentation on the production system or rely on the availability of multiple failing executions. Specifically, after the system call traces are collected, SCMiner uses data mining and statistical anomaly detection techniques to identify the failure-inducing system call sequences. It then maps each abnormal sequence to specific application functions. We have conducted an empirical study on 19 real-world benchmarks. The results show that SCMiner is both effective and efficient at localizing system-level concurrency faults.
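One simple way to flag abnormal system-call sequences, in the spirit of the statistical anomaly detection step above, is an n-gram profile of normal traces; the sketch below is a toy illustration only and does not reproduce SCMiner's actual mining or its mapping back to application functions.

```python
from collections import Counter

def ngram_profile(trace, n=3):
    """Frequency profile of length-n system-call subsequences."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

def anomaly_score(trace, normal_profile, n=3):
    """Fraction of the trace's n-grams never seen in normal runs."""
    grams = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in normal_profile)
    return unseen / len(grams)

# Hypothetical traces: a repetitive normal run and a failing run that
# contains signal-related calls absent from the normal profile.
normal = ["open", "read", "write", "close"] * 50
failing = ["open", "read", "kill", "waitpid", "write", "close"] * 5
profile = ngram_profile(normal)
print(anomaly_score(failing, profile))
```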
The security of web browsers is of paramount importance, these days perhaps more than ever. Unfortunately, acquiring real data for security-related research is not an easy task, as access to sensitive information is rarely granted to researchers who are not members of a trusted security team. In this paper, we describe a method to mine security-related commits from open source software repositories, even if the reports of already fixed security issues have access restrictions, and we show the applicability of the method on two popular web browser projects. We have also made the mined dataset, listing more than 13,000 security-related commits, publicly available, with which we hope to facilitate research on security-targeted bug prediction.
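A minimal version of such commit mining can be done with git log and a keyword filter, as sketched below; the keyword list and the direct message matching are simplifying assumptions (the paper's method also copes with restricted issue reports), and the repository path is hypothetical.

```python
import re
import subprocess

# Hypothetical keyword list; the paper's selection criteria are richer.
SECURITY_KEYWORDS = re.compile(
    r"\b(security|vulnerab|exploit|overflow|use-after-free|CVE-\d{4}-\d+|XSS|CSRF)\b",
    re.IGNORECASE)

def security_commits(repo_path):
    """Return (hash, subject) pairs for commits whose subject matches a keyword."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%H\x1f%s"],
        capture_output=True, text=True, check=True).stdout
    hits = []
    for line in log.splitlines():
        commit_hash, _, subject = line.partition("\x1f")
        if SECURITY_KEYWORDS.search(subject):
            hits.append((commit_hash, subject))
    return hits

# Usage (path is an example): security_commits("/path/to/browser-repo")
```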
Source camera attribution of digital images has been a hot research topic in the digital forensics literature. However, thermal cameras and the radiometric data they generate have remained a nascent topic, as such devices are expensive, tailored to specific use cases, and not adopted by the masses. This has changed dramatically with the recent introduction of low-cost, pluggable thermal-camera add-ons for smartphones and similar low-cost, pocket-size thermal cameras, which have brought thermal imaging devices to the masses. In this paper, we investigate the use of an established source device attribution method on radiometric data produced with a consumer-level, low-cost handheld thermal camera. The results we present in this paper are promising and show that it is quite possible to attribute thermal images to their source camera.
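Established source device attribution methods of this kind are typically based on sensor noise fingerprints (PRNU-style correlation). The sketch below shows the general idea only: average noise residuals to form a camera fingerprint, then correlate a query image's residual against it. The Gaussian filter is a stand-in for the wavelet denoiser used in the forensics literature, and the flat-field data are synthetic.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img):
    """Noise residual = image minus a denoised version of itself."""
    img = img.astype(np.float64)
    return img - gaussian_filter(img, sigma=1.0)

def camera_fingerprint(images):
    """Average the residuals of many flat-field images from one camera."""
    return np.mean([noise_residual(i) for i in images], axis=0)

def correlation(residual, fingerprint):
    """Normalized correlation between a query residual and a fingerprint."""
    a = residual - residual.mean()
    b = fingerprint - fingerprint.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage with synthetic flat-field frames; in practice a query image is
# attributed to the camera whose fingerprint yields the highest correlation.
rng = np.random.default_rng(0)
flats = [rng.normal(128, 5, (64, 64)) for _ in range(10)]
fp = camera_fingerprint(flats)
print(correlation(noise_residual(flats[0]), fp))
```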
Cloud services have on-demand, self-organizing computing characteristics and are prone to failures or lapses of accountability when widely deployed. For predicting such failures and assigning accountability, modeling the structure of cloud services becomes an unavoidable priority. This paper reviews the modeling of cloud service network architecture. First, the state of research on cloud service structure modeling is analyzed and reviewed. Second, classifications of the time-varying structure of cloud services and of time-varying structure modeling methods are summarized. Third, existing problems are pointed out. Finally, a research approach to time-varying structure modeling for cloud service accountability is proposed.
As data security has become one of the most crucial issues in modern storage system and application design, data sanitization techniques are regarded as a promising solution for 3D NAND flash-memory-based devices. Many excellent works have exploited in-place reprogramming, erasure, and encryption techniques to implement sanitization functionality. However, existing sanitization approaches can incur performance and disturbance overheads, or even leave data at risk of being deciphered. In contrast to existing works, this work explores an instantaneous data sanitization scheme that takes advantage of programming disturbance properties. Our proposed design can not only achieve instantaneous data sanitization by exploiting programming disturbance and error-correction codes properly, but also enhance performance with a recycling programming design. The feasibility and capability of our proposed design are evaluated by a series of experiments on 3D NAND flash memory chips, for which we have very encouraging results. The experimental results show that the proposed design achieves instantaneous data sanitization with low overhead; moreover, it improves the average response time and reduces the block erase count by up to 86.8% and 88.8%, respectively.
With the rapid growth of network size and complexity, network defenders are facing more challenges in protecting networked computers and other devices from acute attacks. Traffic visualization is an essential element in an anomaly detection system for the visual observation and detection of distributed DoS attacks. This paper presents an interactive visualization system called TVis, proposed to detect both low-rate and high-rate DDoS attacks using Heron's triangle-area mapping. TVis allows network defenders to identify and investigate anomalies in internal and external network traffic in both online and offline modes. We model the network traffic as an undirected graph and compute a triangle-area map based on the incidences at each vertex for each 5-second time window. The system triggers an alarm if and only if the area of the mapped triangle exceeds the dynamic threshold. TVis performs well for both low-rate and high-rate DDoS detection in comparison to its competitors.
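For reference, Heron's formula and a toy alarm rule look as follows; which three per-vertex, per-window statistics TVis actually maps to triangle sides, and how its dynamic threshold is computed, are not specified here, so the values below are placeholders.

```python
import math

def herons_area(a, b, c):
    """Area of a triangle with side lengths a, b, c (Heron's formula)."""
    s = (a + b + c) / 2.0
    val = s * (s - a) * (s - b) * (s - c)
    return math.sqrt(val) if val > 0 else 0.0

# Toy mapping: treat three per-window statistics of a vertex (hypothetically
# in-degree, out-degree, and total incidences) as side lengths and raise an
# alarm when the area exceeds a threshold (a fixed placeholder here, whereas
# TVis uses a dynamic one).
THRESHOLD = 50.0
stats = (9.0, 12.0, 15.0)   # hypothetical 5-second window statistics
area = herons_area(*stats)
print(area, "alarm" if area > THRESHOLD else "normal")
```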
The increasing diffusion of malware endowed with steganographic techniques requires careful identification and evaluation of a new set of threats. The creation of a covert channel to hide a communication within network traffic is one of the most relevant, as it can be used to exfiltrate information or orchestrate attacks. Even though network steganography is becoming a well-studied topic, only a few works focus on IPv6 and consider real network scenarios. Therefore, this paper investigates IPv6 covert channels deployed in the wild. It also presents a performance evaluation of six different data hiding techniques for IPv6, including their ability to bypass some intrusion detection systems. Lastly, ideas for detecting IPv6 covert channels are presented.
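As an example of a header-based IPv6 hiding technique of the general kind evaluated above, the Scapy sketch below packs covert bits into the 20-bit Flow Label field; the field choice, chunking, and destination address are illustrative assumptions and do not correspond to a specific technique from the paper.

```python
from scapy.all import IPv6, UDP, Raw

def embed_in_flow_label(payload_bits, dst):
    """Encode covert data 20 bits at a time in the IPv6 Flow Label field."""
    pkts = []
    for i in range(0, len(payload_bits), 20):
        chunk = payload_bits[i:i + 20].ljust(20, "0")   # pad the last chunk
        pkts.append(IPv6(dst=dst, fl=int(chunk, 2)) /
                    UDP(dport=53) / Raw(b"innocuous"))
    return pkts

# Toy usage: hide the bytes b"hi" across two packets' flow labels.
secret = "".join(f"{b:08b}" for b in b"hi")
packets = embed_in_flow_label(secret, "2001:db8::1")
for p in packets:
    print(hex(p.fl))
# Actually sending the packets (e.g. with scapy's send()) requires raw-socket
# privileges; building them, as above, does not.
```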
Recent work proposed the concept of backdoor attacks on deep neural networks (DNNs), where misclassification rules are hidden inside normal models, only to be triggered by very specific inputs. However, these "traditional" backdoors assume a context where users train their own models from scratch, which rarely occurs in practice. Instead, users typically customize "Teacher" models already pretrained by providers like Google through a process called transfer learning. This customization process introduces significant changes to models and disrupts hidden backdoors, greatly reducing the actual impact of backdoors in practice. In this paper, we describe latent backdoors, a more powerful and stealthy variant of backdoor attacks that functions under transfer learning. Latent backdoors are incomplete backdoors embedded into a "Teacher" model and automatically inherited by multiple "Student" models through transfer learning. If any Student model includes the label targeted by the backdoor, then its customization process completes the backdoor and makes it active. We show that latent backdoors can be quite effective in a variety of application contexts, and we validate their practicality through real-world attacks against traffic sign recognition, iris identification of volunteers, and facial recognition of public figures (politicians). Finally, we evaluate four potential defenses and find that only one is effective in disrupting latent backdoors, but it might incur a cost in classification accuracy as a tradeoff.
We present a machine-checked proof of security for the domain management protocol of Amazon Web Services' KMS (Key Management Service), a critical security service used throughout AWS and by AWS customers. Domain management is at the core of AWS KMS; it governs the top-level keys that anchor the security of encryption services at AWS. We show that the protocol securely implements an ideal distributed encryption mechanism under standard cryptographic assumptions. The proof is machine-checked in the EasyCrypt proof assistant and is the largest EasyCrypt development to date.
We introduce Oblivious Key Management Systems (KMS) as a much more secure alternative to the traditional wrapping-based KMS that form the backbone of key management in large-scale data storage deployments. The new system, which builds on Oblivious Pseudorandom Functions (OPRFs), hides keys and object identifiers from the KMS, offers unconditional security for key transport, provides key verifiability, reduces storage, and more. Further, we show how to provide all these features in a distributed threshold implementation that enhances protection against server compromise. We extend this system with an updatable encryption capability that supports key updates (known as key rotation), so that upon the periodic change of OPRF keys by the KMS server, a very efficient update procedure allows a client of the KMS service to non-interactively update all its encrypted data to be decryptable only by the new key. This provides forward and post-compromise security, namely, security against future and past compromises, respectively, of the client's OPRF keys held by the KMS. Additionally, and in contrast to traditional KMS, our solution supports public-key encryption and dispenses with any interaction with the KMS for data encryption (only decryption by the client requires such communication). Our solutions build on recent work on updatable encryption but with significant enhancements applicable to the remote KMS setting. In addition to the critical security improvements, our designs are highly efficient and ready for use in practice. We report on an experimental implementation and its performance.
With the remarkable success of deep learning, Deep Neural Networks (DNNs) have been applied as dominant tools in various machine learning domains. Despite this success, however, it has been found that DNNs are surprisingly vulnerable to malicious attacks; adding small, perceptually indistinguishable perturbations to the data can easily degrade classification performance. Adversarial training is an effective defense strategy for training a robust classifier. In this work, we propose to utilize a generator to learn how to create adversarial examples. Unlike existing approaches that create a one-shot perturbation with a deterministic generator, we propose a recursive and stochastic generator that produces much stronger and more diverse perturbations, which comprehensively reveal the vulnerability of the target classifier. Our experimental results on the MNIST and CIFAR-10 datasets show that a classifier adversarially trained with our method yields more robust performance against various white-box and black-box attacks.
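For contrast with the proposed recursive stochastic generator, the following PyTorch sketch shows the kind of one-shot deterministic perturbation (FGSM) used in standard adversarial training; the model, data, and hyperparameters are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps, loss_fn=nn.CrossEntropyLoss()):
    """One-shot FGSM perturbation: step of size eps along the loss gradient sign."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# Minimal adversarial-training loop on placeholder data (stand-in for MNIST).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
for _ in range(5):
    x_adv = fgsm(model, x, y, eps=0.1)   # craft adversarial examples
    opt.zero_grad()
    loss = loss_fn(model(x_adv), y)      # train on the adversarial examples
    loss.backward()
    opt.step()
print(float(loss))
```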