Biblio
Physical unclonable functions (PUFs) are devices which are easily probed but difficult to predict. Optical PUFs have been discussed within the literature, with traditional optical PUFs typically using spatial light modulators, coherent illumination, and scattering volumes; however, these systems can be large, expensive, and difficult to maintain alignment in practical conditions. We propose and demonstrate a new kind of optical PUF based on computational imaging and compressive sensing to address these challenges with traditional optical PUFs. This work describes the design, simulation, and prototyping of this computational optical PUF (COPUF) that utilizes incoherent polychromatic illumination passing through an additively manufactured refracting optical polymer element. We demonstrate the ability to pass information through a COPUF using a variety of sampling methods, including the use of compressive sensing. The sensitivity of the COPUF system is also explored. We explore non-traditional PUF configurations enabled by the COPUF architecture. The double COPUF system, which employees two serially connected COPUFs, is proposed and analyzed as a means to authenticate and communicate between two entities that have previously agreed to communicate. This configuration enables estimation of a message inversion key without the calculation of individual COPUF inversion keys at any point in the PUF life cycle. Our results show that it is possible to construct inexpensive optical PUFs using computational imaging. This could lead to new uses of PUFs in places where electrical PUFs cannot be utilized effectively, as low cost tags and seals, and potentially as authenticating and communicating devices.
To improve the resilience of state estimation strategy against cyber attacks, the Compressive Sensing (CS) is applied in reconstruction of incomplete measurements for cyber physical systems. First, observability analysis is used to decide the time to run the reconstruction and the damage level from attacks. In particular, the dictionary learning is proposed to form the over-completed dictionary by K-Singular Value Decomposition (K-SVD). Besides, due to the irregularity of incomplete measurements, sampling matrix is designed as the measurement matrix. Finally, the simulation experiments on 6-bus power system illustrate that the proposed method achieves the incomplete measurements reconstruction perfectly, which is better than the joint dictionary. When only 29% available measurements are left, the proposed method has generality for four kinds of recovery algorithms.
This paper investigates closed-form expressions to evaluate the performance of the Compressive Sensing (CS) based Energy Detector (ED). The conventional way to approximate the probability density function of the ED test statistic invokes the central limit theorem and considers the decision variable as Gaussian. This approach, however, provides good approximation only if the number of samples is large enough. This is not usually the case in CS framework, where the goal is to keep the sample size low. Moreover, working with a reduced number of measurements is of practical interest for general spectrum sensing in cognitive radio applications, where the sensing time should be sufficiently short since any time spent for sensing cannot be used for data transmission on the detected idle channels. In this paper, we make use of low-complexity approximations based on algebraic transformations of the one-dimensional Gaussian Q-function. More precisely, this paper provides new closed-form expressions for accurate evaluation of the CS-based ED performance as a function of the compressive ratio and the Signal-to-Noise Ratio (SNR). Simulation results demonstrate the increased accuracy of the proposed equations compared to existing works.
Compressed sensing can represent the sparse signal with a small number of measurements compared to Nyquist-rate samples. Considering the high-complexity of reconstruction algorithms in CS, recently compressive detection is proposed, which performs detection directly in compressive domain without reconstruction. Different from existing work that generally considers the measurements corrupted by dense noises, this paper studies the compressive detection problem when the measurements are corrupted by both dense noises and sparse errors. The sparse errors exist in many practical systems, such as the ones affected by impulse noise or narrowband interference. We derive the theoretical performance of compressive detection when the sparse error is either deterministic or random. The theoretical results are further verified by simulations.
With the rapid and radical evolution of information and communication technology, energy consumption for wireless communication is growing at a staggering rate, especially for wireless multimedia communication. Recently, reducing energy consumption in wireless multimedia communication has attracted increasing attention. In this paper, we propose an energy-efficient wireless image transmission scheme based on adaptive block compressive sensing (ABCS) and SoftCast, which is called ABCS-SoftCast. In ABCS-SoftCast, the compression distortion and transmission distortion are considered in a joint manner, and the energy-distortion model is formulated for each image block. Then, the sampling rate (SR) and power allocation factors of each image block are optimized simultaneously. Comparing with conventional SoftCast scheme, experimental results demonstrate that the energy consumption can be greatly reduced even when the receiving image qualities are approximately the same.
There has been increasing interest in adopting BlockChain (BC), that underpins the crypto-currency Bitcoin, in Internet of Things (IoT) for security and privacy. However, BCs are computationally expensive and involve high bandwidth overhead and delays, which are not suitable for most IoT devices. This paper proposes a lightweight BC-based architecture for IoT that virtually eliminates the overheads of classic BC, while maintaining most of its security and privacy benefits. IoT devices benefit from a private immutable ledger, that acts similar to BC but is managed centrally, to optimize energy consumption. High resource devices create an overlay network to implement a publicly accessible distributed BC that ensures end-to-end security and privacy. The proposed architecture uses distributed trust to reduce the block validation processing time. We explore our approach in a smart home setting as a representative case study for broader IoT applications. Qualitative evaluation of the architecture under common threat models highlights its effectiveness in providing security and privacy for IoT applications. Simulations demonstrate that our method decreases packet and processing overhead significantly compared to the BC implementation used in Bitcoin.
Microblogging is a popular activity within the spectrum of Online Social Networking (OSN), which allows users to quicky exchange short messages. Such systems can be based on mobile clients that exchange their group-encrypted messages utilizing local communications such as Bluetooth. Since however in such cases, users do not want to disclose their group memberships, and thus have to wait for other group members to appear in the proximity, the message spread can be slow to non-existent. In this paper, we solve this problem and facilitate a higher message spread by employing a server that stores the messages of multiple groups in an Oblivious RAM (ORAM) data structure. The server can be accessed by the clients on demand to read or write their group-encrypted messages. Thus our solution can be used to add access pattern privacy on top of existing microblogging peer-2-peer architectures, and using an ORAM is a promising candidate to use in the given application scenario.
Industrial control systems (ICS) used in industrial plants are vulnerable to cyber-attacks that can cause fatal damage to the plants. Intrusion detection systems (IDSs) monitor ICS network traffic and detect suspicious activities. However, many IDSs overlook sophisticated cyber-attacks because it is hard to make a complete database of cyber-attacks and distinguish operational anomalies when compared to an established baseline. In this paper, a discriminant model between normal and anomalous packets was constructed with a support vector machine (SVM) based on an ICS communication profile, which represents only packet intervals and length, and an IDS with the applied model is proposed. Furthermore, the proposed IDS was evaluated using penetration tests on our cyber security test bed. Although the IDS was constructed by the limited features (intervals and length) of packets, the IDS successfully detected cyber-attacks by monitoring the rate of predicted attacking packets.
With the development of cyber threats on the Internet, the number of malware, especially unknown malware, is also dramatically increasing. Since all of malware cannot be analyzed by analysts, it is very important to find out new malware that should be analyzed by them. In order to cope with this issue, the existing approaches focused on malware classification using static or dynamic analysis results of malware. However, the static and the dynamic analyses themselves are also too costly and not easy to build the isolated, secure and Internet-like analysis environments such as sandbox. In this paper, we propose a lightweight malware classification method based on detection results of anti-virus software. Since the proposed method can reduce the volume of malware that should be analyzed by analysts, it can be used as a preprocess for in-depth analysis of malware. The experimental showed that the proposed method succeeded in classification of 1,000 malware samples into 187 unique groups. This means that 81% of the original malware samples do not need to analyze by analysts.
Despite widespread use of commercial anti-virus products, the number of malicious files detected on home and corporate computers continues to increase at a significant rate. Recently, anti-virus companies have started investing in machine learning solutions to augment signatures manually designed by analysts. A malicious file's determination is often represented as a hierarchical structure consisting of a type (e.g. Worm, Backdoor), a platform (e.g. Win32, Win64), a family (e.g. Rbot, Rugrat) and a family variant (e.g. A, B). While there has been substantial research in automated malware classification, the aforementioned hierarchical structure, which can provide additional information to the classification models, has been ignored. In this paper, we propose the novel idea and study the performance of employing hierarchical learning algorithms for automated classification of malicious files. To the best of our knowledge, this is the first research effort which incorporates the hierarchical structure of the malware label in its automated classification and in the security domain, in general. It is important to note that our method does not require any additional effort by analysts because they typically assign these hierarchical labels today. Our empirical results on a real world, industrial-scale malware dataset of 3.6 million files demonstrate that incorporation of the label hierarchy achieves a significant reduction of 33.1% in the binary error rate as compared to a non-hierarchical classifier which is traditionally used in such problems.
Malware classification is a critical part in the cyber-security. Traditional methodologies for the malware classification typically use static analysis and dynamic analysis to identify malware. In this paper, a malware classification methodology based on its binary image and extracting local binary pattern (LBP) features is proposed. First, malware images are reorganized into 3 by 3 grids which is mainly used to extract LBP feature. Second, the LBP is implemented on the malware images to extract features in that it is useful in pattern or texture classification. Finally, Tensorflow, a library for machine learning, is applied to classify malware images with the LBP feature. Performance comparison results among different classifiers with different image descriptors such as GIST, a spatial envelop, and the LBP demonstrate that our proposed approach outperforms others.
Detecting malicious code with exact match on collected datasets is becoming a large-scale identification problem due to the existence of new malware variants. Being able to promptly and accurately identify new attacks enables security experts to respond effectively. My proposal is to develop an automated framework for identification of unknown vulnerabilities by leveraging current neural network techniques. This has a significant and immediate value for the security field, as current anti-virus software is typically able to recognize the malware type only after its infection, and preventive measures are limited. Artificial Intelligence plays a major role in automatic malware classification: numerous machine-learning methods, both supervised and unsupervised, have been researched to try classifying malware into families based on features acquired by static and dynamic analysis. The value of automated identification is clear, as feature engineering is both a time-consuming and time-sensitive task, with new malware studied while being observed in the wild.
Nowadays, Malware has become a serious threat to the digitization of the world due to the emergence of various new and complex malware every day. Due to this, the traditional signature-based methods for detection of malware effectively becomes an obsolete method. The efficiency of the machine learning model in context to the detection of malware files has been proved by different researches and studies. In this paper, a framework has been developed to detect and classify different files (e.g exe, pdf, php, etc.) as benign and malicious using two level classifier namely, Macro (for detection of malware) and Micro (for classification of malware files as a Trojan, Spyware, Adware, etc.). Cuckoo Sandbox is used for generating static and dynamic analysis report by executing files in the virtual environment. In addition, a novel model is developed for extracting features based on static, behavioral and network analysis using analysis report generated by the Cuckoo Sandbox. Weka Framework is used to develop machine learning models by using training datasets. The experimental results using proposed framework shows high detection rate with an accuracy of 100% using J48 Decision tree model, 99% using SMO (Sequential Minimal Optimization) and 97% using Random Forest tree. It also shows effective classification rate with accuracy 100% using J48 Decision tree, 91% using SMO and 66% using Random Forest tree. These results are used for detecting and classifying unknown files as benign or malicious.
Distinguishing and classifying different types of malware is important to better understanding how they can infect computers and devices, the threat level they pose and how to protect against them. In this paper, a system for classifying malware programs is presented. The paper describes the architecture of the system and assesses its performance on a publicly available database (provided by Microsoft for the Microsoft Malware Classification Challenge BIG2015) to serve as a benchmark for future research efforts. First, the malicious programs are preprocessed such that they are visualized as gray scale images. We then make use of an architecture comprised of multiple layers (multiple levels of encoding) to carry out the classification process of those images/programs. We compare the performance of this approach against traditional machine learning and pattern recognition algorithms. Our experimental results show that the deep learning architecture yields a boost in performance over those conventional/standard algorithms. A hold-out validation analysis using the superior architecture shows an accuracy in the order of 99.15%.
Anti-virus vendors receive hundreds of thousands of malware to be analysed each day. Some are new malware while others are variations or evolutions of existing malware. Because analyzing each malware sample by hand is impossible, automated techniques to analyse and categorize incoming samples are needed. In this work, we explore various machine learning features extracted from malware samples through static analysis for classification of malware binaries into already known malware families. We present a new feature based on control statement shingling that has a comparable accuracy to ordinary opcode n-gram based features while requiring smaller dimensions. This, in turn, results in a shorter training time.
Malware technology makes it difficult for malware analyst to detect same malware files with different obfuscation technique. In this paper we are trying to tackle that problem by analyzing the sequence of system call from an executable file. Malware files which actually are the same should have almost identical or at least a similar sequence of system calls. In this paper, we are going to create a model for each malware class consists of malwares from different families based on its sequence of system calls. Method/algorithm that's used in this paper is profile hidden markov model which is a very well-known tool in the biological informatics field for comparing DNA and protein sequences. Malware classes that we are going to build are trojan and worm class. Accuracy for these classes are pretty high, it's above 90% with also a high false positive rate around 37%.
Malware writers often develop malware with automated measures, so the number of malware has increased dramatically. Automated measures tend to repeatedly use significant modules, which form the basis for identifying malware variants and discriminating malware families. Thus, we propose a novel visualization analysis method for researching malware similarity. This method converts malicious Windows Portable Executable (PE) files into local entropy images for observing internal features of malware, and then normalizes local entropy images into entropy pixel images for malware classification. We take advantage of the Jaccard index to measure similarities between entropy pixel images and the k-Nearest Neighbor (kNN) classification algorithm to assign entropy pixel images to different malware families. Preliminary experimental results show that our visualization method can discriminate malware families effectively.
The heterogeneous SIS model for virus spread in any finite size graph characterizes the influence of factors of SIS model and could be analyzed by the extended N-Intertwined model introduced in [1]. We specifically focus on the heterogeneous virus spread in the star network in this paper. The epidemic threshold and the average meta-stable state fraction of infected nodes are derived for virus spread in the star network. Our results illustrate the effect of the factors of SIS model on the steady state infection.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.
Botnets are a growing threat to the security of data and services on a global level. They exploit vulnerabilities in networks and host machines to harvest sensitive information, or make use of network resources such as memory or bandwidth in cyber-crime campaigns. Bot programs by nature are largely automated and systematic, and this is often used to detect them. In this paper, we extend upon existing work in this area by proposing a network event correlation method to produce graphs of flows generated by botnets, outlining the implementation and functionality of this approach. We also show how this method can be combined with statistical flow-based analysis to provide a descriptive chain of events, and test on public datasets with an overall success rate of 94.1%.