Biblio
Anomaly detection generally involves the extraction of features from entities' or users' properties, and the design of anomaly detection models using machine learning or deep learning algorithms. However, only considering entities' property information could lead to high false positives. We posit the importance of also considering connections or relationships between entities in the detecting of anomalous behaviors and associated threat groups. Therefore, in this paper, we design a GCN (graph convolutional networks) based anomaly detection model to detect anomalous behaviors of users and malicious threat groups. The GCN model could characterize entities' properties and structural information between them into graphs. This allows the GCN based anomaly detection model to detect both anomalous behaviors of individuals and associated anomalous groups. We then evaluate the proposed model using a real-world insider threat data set. The results show that the proposed model outperforms several state-of-art baseline methods (i.e., random forest, logistic regression, SVM, and CNN). Moreover, the proposed model can also be applied to other anomaly detection applications.
We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.
With the development of IoT and 5G networks, the demand for the next-generation intelligent transportation system has been growing at a rapid pace. Dynamic mapping has been considered one of the key technologies to reduce traffic accidents and congestion in the intelligent transportation system. However, as the number of vehicles keeps growing, a huge volume of mapping traffic may overload the central cloud, leading to serious performance degradation. In this paper, we propose and prototype a CUPS (control and user plane separation)-based edge computing architecture for the dynamic mapping and quantify its benefits by prototyping. There are a couple of merits of our proposal: (i) we can mitigate the overhead of the networks and central cloud because we only need to abstract and send global dynamic mapping information from the edge servers to the central cloud; (ii) we can reduce the response latency since the dynamic mapping traffic can be isolated from other data traffic by being generated and distributed from a local edge server that is deployed closer to the vehicles than the central server in cloud. The capabilities of our system have been quantified. The experimental results have shown our system achieves throughput improvement by more than four times, and response latency reduction by 67.8% compared to the conventional central cloud-based approach. Although these results are still obtained from the preliminary evaluations using our prototype system, we believe that our proposed architecture gives insight into how we utilize CUPS and edge computing to enable efficient dynamic mapping applications.
Continuous and adaptive learning is an effective learning approach when dealing with highly dynamic and changing scenarios, where concept drift often happens. In a continuous, stream or adaptive learning setup, new measurements arrive continuously and there are no boundaries for learning, meaning that the learning model has to decide how and when to (re)learn from these new data constantly. We address the problem of adaptive and continual learning for network security, building dynamic models to detect network attacks in real network traffic. The combination of fast and big network measurements data with the re-training paradigm of adaptive learning imposes complex challenges in terms of data processing speed, which we tackle by relying on big data platforms for parallel stream processing. We build and benchmark different adaptive learning models on top of a novel big data analytics platform for network traffic monitoring and analysis tasks, and show that high speed-up computations (as high as × 6) can be achieved by parallelizing off-the-shelf stream learning approaches.
Concurrency vulnerabilities are extremely harmful and can be frequently exploited to launch severe attacks. Due to the non-determinism of multithreaded executions, it is very difficult to detect them. Recently, data race detectors and techniques based on maximal casual model have been applied to detect concurrency vulnerabilities. However, the former are ineffective and the latter report many false negatives. In this paper, we present CONVUL, an effective tool for concurrency vulnerability detection. CONVUL is based on exchangeable events, and adopts novel algorithms to detect three major kinds of concurrency vulnerabilities. In our experiments, CONVUL detected 9 of 10 known vulnerabilities, while other tools only detected at most 2 out of these 10 vulnerabilities. The 10 vulnerabilities are available at https://github.com/mryancai/ConVul.
As the Internet of Things (IoT) continues to expand into every facet of our daily lives, security researchers have warned of its myriad security risks. While denial-of-service attacks and privacy violations have been at the forefront of research, covert channel communications remain an important concern. Utilizing a Bluetooth controlled light bulb, we demonstrate three separate covert channels, consisting of current utilization, luminosity and hue. To study the effectiveness of these channels, we implement exfiltration attacks using standard off-the-shelf smart bulbs and RGB LEDs at ranges of up to 160 feet. We analyze the identified channels for throughput, generality and stealthiness, and report transmission speeds of up to 832 bps.
Deep neural networks are widely used in many walks of life. Techniques such as transfer learning enable neural networks pre-trained on certain tasks to be retrained for a new duty, often with much less data. Users have access to both pre-trained model parameters and model definitions along with testing data but have either limited access to training data or just a subset of it. This is risky for system-critical applications, where adversarial information can be maliciously included during the training phase to attack the system. Determining the existence and level of attack in a model is challenging. In this paper, we present evidence on how adversarially attacking training data increases the boundary of model parameters using as an example of a CNN model and the MNIST data set as a test. This expansion is due to new characteristics of the poisonous data that are added to the training data. Approaching the problem from the feature space learned by the network provides a relation between them and the possible parameters taken by the model on the training phase. An algorithm is proposed to determine if a given network was attacked in the training by comparing the boundaries of parameters distribution on intermediate layers of the model estimated by using the Maximum Entropy Principle and the Variational inference approach.
Scala programming language combines object-oriented and functional programming in one concise, high-level language, and the language supports static types that help to avoid bugs in complex programs. This paper proposes a dynamic taint analyzer called ScalaTaint for Scala applications. The analyzer traces the propagation of malicious inputs from untrusted sources to sensitive sink methods in programs that can be exploited by adversaries. In this work, we evaluated the accuracy of ScalaTaint with a security benchmark suite including 7 projects in Scala. As a result, our analyzer could report 49 vulnerabilities within 753,372 lines of code. Moreover, the result of our performance measurement on ScalaBench shows 67% runtime overhead that demonstrates the usefulness and efficiently of our technique in comparison with similar tools.
Cryptojacking is the permissionless use of a target device to covertly mine cryptocurrencies. With cryptojacking attackers use malicious JavaScript codes to force web browsers into solving proof-of-work puzzles, thus making money by exploiting resources of the website visitors. To understand and counter such attacks, we systematically analyze the static, dynamic, and economic aspects of in-browser cryptojacking. For static analysis, we perform content-, currency-, and code-based categorization of cryptojacking samples to 1) measure their distribution across websites, 2) highlight their platform affinities, and 3) study their code complexities. We apply unsupervised learning to distinguish cryptojacking scripts from benign and other malicious JavaScript samples with 96.4% accuracy. For dynamic analysis, we analyze the effect of cryptojacking on critical system resources, such as CPU and battery usage. Additionally, we perform web browser fingerprinting to analyze the information exchange between the victim node and the dropzone cryptojacking server. We also build an analytical model to empirically evaluate the feasibility of cryptojacking as an alternative to online advertisement. Our results show a large negative profit and loss gap, indicating that the model is economically impractical. Finally, by leveraging insights from our analyses, we build countermeasures for in-browser cryptojacking that improve upon the existing remedies.
Development of an attack-resilient smart grid depends heavily on the availability of a representative environment, such as a Cyber Physical Security (CPS) testbed, to accelerate the transition of state-of-the-art research work to industry deployment by experimental testing and validation. There is an ongoing initiative to develop an interconnected federated testbed to build advanced computing systems and integrated data sharing networks. In this paper, we present a distributed simulation for power system using federated testbed in the context of Wide Area Monitoring System (WAMS) cyber-physical security. In particular, we have applied the transmission line modeling (TLM) technique to split a first order two-bus system into two subsystems: source and load subsystems, which are running in geographically dispersed simulators, while exchanging system variables over the internet. We have leveraged the resources available at Iowa State University's Power Cyber Laboratory (ISU PCL) and the US Army Research Laboratory (US ARL) to perform the distributed simulation, emulate substation and control center networks, and further implement a data integrity attack and physical disturbances targeting WAMS application. Our experimental results reveal the computed wide-area network latency; and model validation errors. Further, we also discuss the high-level conceptual architecture, inspired by NASPInet, necessary for developing the CPS testbed federation.
According to the information security requirements of the industrial control system and the technical features of the existing defense measures, a dynamic security control strategy based on trusted computing is proposed. According to the strategy, the Industrial Cyber-Physical System system information security solution is proposed, and the linkage verification mechanism between the internal fire control wall of the industrial control system, the intrusion detection system and the trusted connection server is provided. The information exchange of multiple network security devices is realized, which improves the comprehensive defense capability of the industrial control system, and because the trusted platform module is based on the hardware encryption, storage, and control protection mode, It overcomes the common problem that the traditional repairing and stitching technique based on pure software leads to easy breakage, and achieves the goal of significantly improving the safety of the industrial control system . At the end of the paper, the system analyzes the implementation of the proposed secure industrial control information security system based on the trustworthy calculation.
The need for data exchange and storage is currently increasing. The increased need for data exchange and storage also increases the need for data exchange devices and media. One of the most commonly used media exchanges and data storage is the USB Flash Drive. USB Flash Drive are widely used because they are easy to carry and have a fairly large storage. Unfortunately, this increased need is not directly proportional to an increase in awareness of device security, both for USB flash drive devices and computer devices that are used as primary storage devices. This research shows the threats that can arise from the use of USB Flash Drive devices. The threat that is used in this research is the fork bomb implemented on an Arduino Pro Micro device that is converted to a USB Flash drive. The purpose of the Fork Bomb is to damage the memory performance of the affected devices. As a result, memory performance to execute the process will slow down. The use of a USB Flash drive as an attack vector with the fork bomb method causes users to not be able to access the operating system that was attacked. The results obtained indicate that the USB Flash Drive can be used as a medium of Fork Bomb attack on the Windows operating system.
Every so often Humans utilize non-verbal gestures (e.g. facial expressions) to express certain information or emotions. Moreover, countless face gestures are expressed throughout the day because of the capabilities possessed by humans. However, the channels of these expression/emotions can be through activities, postures, behaviors & facial expressions. Extensive research unveiled that there exists a strong relationship between the channels and emotions which has to be further investigated. An Automatic Facial Expression Recognition (AFER) framework has been proposed in this work that can predict or anticipate seven universal expressions. In order to evaluate the proposed approach, Frontal face Image Database also named as Japanese Female Facial Expression (JAFFE) is opted as input. This database is further processed with a frequency domain technique known as Discrete Cosine transform (DCT) and then classified using Artificial Neural Networks (ANN). So as to check the robustness of this novel strategy, the random trial of K-fold cross validation, leave one out and person independent methods is repeated many times to provide an overview of recognition rates. The experimental results demonstrate a promising performance of this application.
To preserve anonymity and obfuscate their identity on online platforms users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet these past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, it would be nice to have a style transfer mechanism that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or written text incorporated with a desired degree of multiple soft styles such as female-quality, politeness, or formalness. To the best of our knowledge this work is the first that shows and attempt to solve the issues related to multiple style transfer. We also demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers. This is because each single style-transfer step often reverses or dominates over the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp dataset to demonstrate our superior performance as compared to existing one-style transfer steps performed in a sequence.
In socially assistive robotics, an important research area is the development of adaptation techniques and their effect on human-robot interaction. We present a meta-learning based policy gradient method for addressing the problem of adaptation in human-robot interaction and also investigate its role as a mechanism for trust modelling. By building an escape room scenario in mixed reality with a robot, we test our hypothesis that bi-directional trust can be influenced by different adaptation algorithms. We found that our proposed model increased the perceived trustworthiness of the robot and influenced the dynamics of gaining human's trust. Additionally, participants evaluated that the robot perceived them as more trustworthy during the interactions with the meta-learning based adaptation compared to the previously studied statistical adaptation model.
Security concerns for field-programmable gate array (FPGA) applications and hardware are evolving as FPGA designs grow in complexity, involve sophisticated intellectual properties (IPs), and pass through more entities in the design and implementation flow. FPGAs are now routinely found integrated into system-on-chip (SoC) platforms, cloud-based shared computing resources, and in commercial and government systems. The IPs included in FPGAs are sourced from multiple origins and passed through numerous entities (such as design house, system integrator, and users) through the lifecycle. This paper thoroughly examines the interaction of these entities from the perspective of the bitstream file responsible for the actual hardware configuration of the FPGA. Five stages of the bitstream lifecycle are introduced to analyze this interaction: 1) bitstream-generation, 2) bitstream-at-rest, 3) bitstream-loading, 4) bitstream-running, and 5) bitstream-end-of-life. Potential threats and vulnerabilities are discussed at each stage, and both vendor-offered and academic countermeasures are highlighted for a robust and comprehensive security assurance.
This paper focuses on the design and development of attack models on the sensory channels and an Intrusion Detection system (IDS) to protect the system from these types of attacks. The encoding/decoding formulas are defined to inject a bit of data into the sensory channel. In addition, a signal sampling technique is utilized for feature extraction. Further, an IDS framework is proposed to reside on the devices that are connected to the sensory channels to actively monitor the signals for anomaly detection. The results obtained based on our experiments have shown that the one-class SVM paired with Fourier transformation was able to detect new or Zero-day attacks.
Knowing malware types in every malware attacks is very helpful to the administrators to have proper defense policies for their system. It must be a massive benefit for the organization as well as the social if the automatic protection systems could themselves detect, classify an existence of new malware types in the whole network system with a few malware samples. This feature helps to prevent the spreading of malware as soon as any damage is caused to the networks. An approach introduced in this paper takes advantage of One-shot/few-shot learning algorithms in solving the malware classification problems by using some well-known models such as Matching Networks, Prototypical Networks. To demonstrate an efficiency of the approach, we run the experiments on the two malware datasets (namely, MalImg and Microsoft Malware Classification Challenge), and both experiments all give us very high accuracies. We confirm that if applying models correctly from the machine learning area could bring excellent performance compared to the other traditional methods, open a new area of malware research.
With the proposal of the national industrial 4.0 strategy, the integration of industrial control network and Internet technology is getting higher and higher. At the same time, the closeness of industrial control networks has been broken to a certain extent, making the problem of industrial control network security increasingly serious. S7 protocol is a private protocol of Siemens Company in Germany, which is widely used in the communication process of industrial control network. In this paper, an industrial control intrusion detection model based on S7 protocol is proposed. Traditional protocol parsing technology cannot resolve private industrial control protocols, so, this model uses deep analysis algorithm to realize the analysis of S7 data packets. At the same time, in order to overcome the complexity and portability of static white list configuration, this model dynamically builds a white list through white list self-learning algorithm. Finally, a composite intrusion detection method combining white list detection and abnormal behavior detection is used to detect anomalies. The experiment proves that the method can effectively detect the abnormal S7 protocol packet in the industrial control network.
With the deep integration of industrial control systems and Internet technologies, how to effectively detect whether industrial control systems are threatened by intrusion is a difficult problem in industrial security research. Aiming at the difficulty of high dimensionality and non-linearity of industrial control system network data, the stacked auto-encoder is used to extract the network data features, and the multi-classification support vector machine is used for classification. The research results show that the accuracy of the intrusion detection model reaches 95.8%.
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dx-privacy, then we define a probability distribution in Hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protections against an authorship attribution algorithm while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe \textbackslashtextgreater 20x greater guarantees on expected privacy against comparable worst case statistics.
This paper presents a machine learning classifier designed to identify SQL injection vulnerabilities in PHP code. Both classical and deep learning based machine learning algorithms were used to train and evaluate classifier models using input validation and sanitization features extracted from source code files. On ten-fold cross validations a model trained using Convolutional Neural Network(CNN) achieved the highest precision (95.4%), while a model based on Multilayer Perceptron(MLP) achieved the highest recall (63.7%) and the highest f-measure (0.746).