Biblio
Cross-Site Scripting (XSS) is one of the most common attacks on websites: the attacker inserts malicious scripts into a website. The injected script can take the user to a web page specifically designed to steal the user's session data and cookies. Nearly 68% of websites are vulnerable to XSS attacks. In this study, the authors evaluate several machine learning methods, namely Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Naïve Bayes (NB). Each algorithm is combined with n-gram features extracted from the scripts to improve XSS detection performance. The simulation results show that the combination of SVM and n-grams achieves the highest accuracy, 98%.
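The abstract does not say how the n-gram features are built. A minimal sketch, assuming lower-cased character trigrams counted per script and projected onto a vocabulary learned from the training scripts, could look like this; the resulting vectors would then be fed to SVM, KNN, or NB (classifier training is not shown). The names `char_ngrams` and `to_vector`, the choice n = 3, and the toy scripts are all hypothetical.

```python
from collections import Counter


def char_ngrams(script: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lower-cased script string."""
    text = script.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts: Counter, vocabulary: list) -> list:
    """Project n-gram counts onto a fixed vocabulary (one feature per n-gram)."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical training scripts; the vocabulary is learned from training data only.
train = ["<script>alert(1)</script>", "hello world"]
vocab = sorted(set().union(*(char_ngrams(s) for s in train)))
features = [to_vector(char_ngrams(s), vocab) for s in train]
```

Lower-casing makes obfuscated tags such as `<ScRiPt>` map to the same features as `<script>`. This is one simple normalization choice, not necessarily the one used in the study.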
Federated learning is a novel distributed learning framework in which a deep learning model is trained collaboratively by thousands of participants. Only model parameters are shared between the server and the participants, which prevents the server from directly accessing the private training data. However, we observe that the federated learning architecture is vulnerable to an active attack from insider participants, called a poisoning attack, in which the attacker acts as a benign participant and uploads poisoned updates to the server, thereby easily degrading the performance of the global model. In this work, we study and evaluate a poisoning attack on federated learning systems based on generative adversarial nets (GAN). The attacker first acts as a benign participant and stealthily trains a GAN to mimic prototypical samples of the other participants' training sets, which do not belong to the attacker. The attacker, who fully controls these generated samples, then uses them to craft poisoning updates and compromises the global model by uploading scaled poisoning updates to the server. In our evaluation, we show that the attacker in our construction can successfully generate samples of other benign participants using a GAN, and that the global model achieves more than 80% accuracy on both the poisoning task and the main task.
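To see why the attacker scales its update, consider plain federated averaging. The sketch below assumes the server takes an unweighted element-wise mean and the attacker multiplies its update by the number of participants, a common "boosting" choice that is not necessarily the paper's scaling rule. The GAN-based sample generation is not shown, and all numbers are hypothetical.

```python
def fedavg(updates):
    """Plain federated averaging: element-wise mean of the participants' updates."""
    k = len(updates)
    return [sum(values) / k for values in zip(*updates)]


def scale_update(update, factor):
    """Multiply every component of an update by `factor` (the attacker's boost)."""
    return [factor * u for u in update]


# Hypothetical numbers: three benign updates plus one attacker update.
benign = [[0.1, -0.2], [0.0, 0.1], [-0.1, 0.1]]
poison = [1.0, 1.0]            # update the attacker wants the server to apply
k = len(benign) + 1            # total number of participants in this round
unscaled = fedavg(benign + [poison])
scaled = fedavg(benign + [scale_update(poison, k)])
```

Without scaling, averaging shrinks the poisoned update to a quarter of its size (`unscaled` is about `[0.25, 0.25]`). With scaling by k, the aggregate is about `[1.0, 1.0]` because the benign updates happen to cancel in this toy example.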
Short-term load forecasting systems for power grids have demonstrated high accuracy and have been widely employed for commercial use. However, classic load forecasting systems, which are based on statistical methods, are vulnerable to training data poisoning. In this paper, we demonstrate a data poisoning strategy that effectively corrupts the forecasting model even in the presence of outlier detection. To the best of our knowledge, poisoning attacks on short-term load forecasting with outlier detection have not been studied in previous work. Our method applies to several forecasting models, including the most widely adopted and best-performing ones, such as multiple linear regression (MLR) and neural network (NN) models. Starting with the MLR model, we develop a novel closed-form solution that quickly estimates the new MLR model after a round of data poisoning without retraining. We then employ line search and simulated annealing to find the poisoning attack solution. Furthermore, we use the MLR attack solution to generate a numerical solution for other models, such as NN. The effectiveness of our algorithm has been tested on the Global Energy Forecasting Competition (GEFCom2012) data set in the presence of outlier detection.
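The authors' closed-form MLR update is not given in the abstract. As a stand-in, the sketch below uses the generic Sherman-Morrison (recursive least squares) rank-one update. It refits ordinary least squares after appending one poisoned sample without re-solving the normal equations. This is a swapped-in standard technique, not the paper's solution, and the data are hypothetical. The line search and simulated annealing steps are not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(X, y):
    """OLS via the normal equations; returns (weights, (X^T X)^-1)."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    a_inv = invert(xtx)
    return mat_vec(a_inv, xty), a_inv


def add_point(w, a_inv, x, t):
    """Sherman-Morrison update after appending sample (x, t); no retraining."""
    ax = mat_vec(a_inv, x)                 # A x (A is symmetric)
    denom = 1.0 + dot(x, ax)               # 1 + x^T A x
    w_new = [wi + axi * (t - dot(x, w)) / denom for wi, axi in zip(w, ax)]
    a_new = [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(len(x))]
             for i in range(len(x))]
    return w_new, a_new


# Hypothetical data: intercept column plus one feature, y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
w, a_inv = fit(X, y)
w_poisoned, _ = add_point(w, a_inv, [1.0, 3.0], 0.0)   # one poisoned sample
```

Each candidate poisoned point costs O(d^2) to evaluate this way, instead of a full refit. That is the kind of speed-up that makes an inner search loop over poisoning candidates affordable.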
Deep neural networks (DNNs) provide good performance for image recognition, speech recognition, and pattern recognition. However, poisoning attacks are a serious threat to DNN security. A poisoning attack reduces the accuracy of a DNN by adding malicious training data during the training process. In some situations, such as military applications, it may be necessary to reduce the accuracy of only a chosen class in the model. For example, an attacker may want to prevent an unmanned aerial vehicle (UAV) from correctly recognizing nuclear-related facilities while leaving its recognition of other objects intact. In this paper, we propose a selective poisoning attack that reduces the accuracy of only a chosen class in the model. The proposed method reduces the accuracy of the chosen class by training the model with malicious training data corresponding to that class, while maintaining the accuracy of the remaining classes. In the experiments, we used TensorFlow as the machine learning library and MNIST and CIFAR10 as datasets. Experimental results show that the proposed method can reduce the accuracy of the chosen class to 43.2% on MNIST and 55.3% on CIFAR10, while maintaining the accuracy of the remaining classes.
There is no effective means of finding the optimal number of hidden nodes for a single-hidden-layer feedforward neural network. This paper introduces a method that solves this problem effectively using singular value decomposition (SVD). First, the training data are normalized with both attribute-based and sample-based data normalization. The normalized data are then decomposed by SVD, and the number of hidden nodes is determined from the principal eigenvalues. Experimental results on the MNIST and APS data sets show that the resulting feedforward neural network achieves satisfactory performance in the classification task.
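The abstract names the steps but not the formulas. A minimal sketch, under several stated assumptions: attribute-based normalization is z-scoring each column; sample-based normalization is scaling each row to unit Euclidean norm; and "main eigenvalues" means the smallest number of squared singular values (the eigenvalues of XᵀX) that together reach an energy threshold, here a hypothetical 95%. The singular values themselves are taken as input. In practice they would come from an SVD routine such as `numpy.linalg.svd`.

```python
import math


def normalize_attributes(X):
    """Attribute-based normalization (assumed z-score): zero mean, unit variance per column."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]        # constant columns keep std = 1
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def normalize_samples(X):
    """Sample-based normalization (assumed): scale each row to unit Euclidean norm."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def hidden_nodes_from_singular_values(sigma, energy=0.95):
    """Smallest k whose top-k eigenvalues (squared singular values) reach `energy` of the total."""
    eig = sorted((s * s for s in sigma), reverse=True)
    total = sum(eig)
    running = 0.0
    for k, lam in enumerate(eig, start=1):
        running += lam
        if running >= energy * total:
            return k
    return len(eig)
```

For singular values `[10, 3, 1, 0.5]` the eigenvalues are 100, 9, 1, 0.25. The first two already cover about 98.9% of the total, so this rule would suggest two hidden nodes.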
Person re-identification (Person Re-ID) is the task of automatically retrieving images of a pedestrian from the cameras in a surveillance network, given an image of that pedestrian from another camera. Changes in a pedestrian's appearance across cameras pose a huge challenge to person re-identification. Person re-identification systems based on deep learning can effectively extract the appearance features of pedestrians. In this paper, a feature enhancement experiment shows that current person re-identification datasets are relatively small and cannot fully meet the needs of deep network training. Therefore, this paper studies using a generative adversarial network to extend person re-identification datasets and proposes a label smoothing regularization for outliers with weight (LSROW) algorithm that makes full use of the generated data, effectively improving the accuracy of person re-identification.
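LSROW builds on label smoothing regularization for outliers (LSRO). In LSRO, a GAN-generated image, which has no identity label, is given a uniform target distribution over all K identities, while a real image keeps its one-hot target. The abstract does not say how LSROW weights the generated data. The sketch below therefore uses a single hypothetical scalar `gen_weight` on the generated-sample loss, which is an assumption and not the paper's weighting.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def lsro_loss(logits, label=None, gen_weight=1.0):
    """Cross-entropy with LSRO-style targets.

    Real image (label given): one-hot target, i.e. standard cross-entropy.
    Generated image (label=None): uniform target over all K classes, scaled by a
    hypothetical weight `gen_weight` standing in for LSROW's (unspecified) weighting.
    """
    log_p = log_softmax(logits)
    if label is not None:
        return -log_p[label]
    return -gen_weight * sum(log_p) / len(logits)
```

With a uniform target, the loss on a generated image is lowest when the network spreads its probability evenly across identities. This discourages the network from confidently assigning a synthetic image to any single real person.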
In the industrial Internet of Things, various devices are connected to the external Internet. For these connected devices, authentication is very important from a security viewpoint; therefore, physical unclonable functions (PUFs) have attracted attention as an authentication technique. On the other hand, PUFs are at risk from modeling attacks, which clone the function of a PUF mathematically. PUFs resistant to such attacks, such as the lightweight PUF, have therefore been proposed. However, new analytical methods, side-channel attacks (SCAs), which exploit side-channel information such as power consumption or electromagnetic waves, have also been proposed. Countermeasures have been proposed as well; however, they have not been evaluated on actual devices. Since PUFs rely on small manufacturing variations, implementation-level evaluation is very important. Therefore, this study proposes an SCA countermeasure for the lightweight PUF. The proposed method builds on previous studies and keeps power consumption constant while the response is generated. In experiments using a field-programmable gate array (FPGA), we confirmed that the measured power consumption was constant regardless of the PUF's output values. The experimental results showed that the prediction rate of the response was about 50%, indicating that the proposed method is resistant to SCAs.
With the globalization of integrated circuit (IC) design and manufacturing, malicious third-party vendors can easily insert hardware Trojans into their intellectual property (IP) cores during the IC design phase, threatening the security of IC systems. Hardware-Trojan detection methods are therefore strongly needed, especially for the IC design phase. Exploiting the distinctive characteristics of Trigger nets in Trojan circuits, this paper proposes an ensemble-learning-based hardware-Trojan detection method that detects Trigger nets at the gate level. We extract Trigger-net features for each net from known netlists and use ensemble learning to train two detection models according to the Trojan types. The detection models identify suspicious Trigger nets in an unknown netlist under test and assign a suspiciousness value to each net. By flagging the top n% most suspicious nets of each detection model as suspicious Trigger nets, the proposed method achieves, on average, an 88% true positive rate, a 90% true negative rate, and 90% accuracy.
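The flagging step and the reported metrics can be made concrete. The sketch below assumes that a detection model has already produced one suspiciousness score per net; it flags the top n% (rounded up, at least one net) and scores the result against known Trigger nets. The scores, the choice n = 20, and the ground-truth set are hypothetical.

```python
import math


def flag_top_percent(scores, percent):
    """Indices of the top `percent`% nets by suspiciousness (rounded up, at least one)."""
    k = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def rates(flagged, trojan, n):
    """True positive rate, true negative rate, and accuracy over n nets."""
    tp = len(flagged & trojan)
    fp = len(flagged - trojan)
    fn = len(trojan - flagged)
    tn = n - tp - fp - fn
    tpr = tp / (tp + fn) if trojan else 0.0
    tnr = tn / (tn + fp) if n > len(trojan) else 0.0
    return tpr, tnr, (tp + tn) / n


# Hypothetical scores for 10 nets; nets 0, 2, and 8 are the true Trigger nets.
scores = [0.9, 0.1, 0.8, 0.2, 0.05, 0.3, 0.15, 0.4, 0.6, 0.0]
flagged = flag_top_percent(scores, 20)           # flags nets 0 and 2
tpr, tnr, acc = rates(flagged, {0, 2, 8}, len(scores))
```

Here the detector misses net 8, giving TPR 2/3, TNR 1.0, and accuracy 0.9. Raising n trades more false positives for a higher TPR.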
Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample that has been slightly modified to induce misclassification by an ML classifier. In this work, we investigate white-box and grey-box evasion attacks against an ML-based malware detector and evaluate their performance in a real-world setting. We compare defense approaches for mitigating the attacks. We propose a framework for deploying grey-box and black-box attacks against malware detection systems.
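As a concrete instance of a white-box evasion attack, the sketch below applies the Fast Gradient Sign Method (FGSM) to a toy logistic-regression "malware detector". FGSM is a standard technique swapped in for illustration. The abstract does not say which attack algorithms were used, and the weights, sample, and step size `eps` are hypothetical.

```python
import math


def predict_proba(w, b, x):
    """Logistic-regression probability that x is malicious (class 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: move each feature by eps in the direction that
    increases the log-loss. For logistic regression, d(loss)/dx = (p - y) * w."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]


# Hypothetical white-box detector and a malicious sample (label 1).
w, b = [2.0, -1.0], -0.5
x = [1.0, 0.5]
x_adv = fgsm(w, b, x, 1, eps=0.5)
```

The original sample is detected (p ≈ 0.73), while `x_adv = [0.5, 1.0]` falls below 0.5 and evades the detector. A real attack on malware features would also have to keep the modified sample a valid, functional program, which this sketch ignores.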
Deep neural networks are susceptible to various inference attacks because they memorize information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through the parameters of fully trained models as well as through the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm commonly used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants in the federated learning setting can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracy.
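The white-box attacks exploit quantities available only with access to the model's parameters, such as per-example loss gradients. A heavily simplified sketch of that idea follows: for a logistic-regression model, compute the gradient of one example's loss and guess "member" when its norm is small. This reflects a common heuristic that training examples tend to have smaller gradients than unseen ones. The fixed `threshold`, the model, and the examples are hypothetical, and the sketch is not the paper's attack, which designs dedicated inference algorithms for deep networks.

```python
import math


def per_example_gradient(w, b, x, y):
    """Gradient of the logistic log-loss for one example w.r.t. (w, b): (p - y) * [x, 1]."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x] + [p - y]


def guess_member(w, b, x, y, threshold):
    """Hypothetical white-box membership test: predict 'member' when the
    per-example gradient norm falls below a fixed threshold."""
    g = per_example_gradient(w, b, x, y)
    return math.sqrt(sum(v * v for v in g)) < threshold
```

For a model `w = [3.0]`, `b = 0.0`, the example `x = [2.0]` with label 1 is fitted confidently and gets a tiny gradient, so it is guessed to be a member. The same input with label 0 produces a large gradient and is guessed to be a non-member.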
Corpora used to learn open-domain Question-Answering (QA) models are typically collected from a wide variety of topics or domains. Since QA requires understanding natural language, open-domain QA models generally need very large training corpora. A simple way to reduce data demand is to restrict the domain covered by the QA model, thus leading to domain-specific QA models. Learning improved QA models for a specific domain is still challenging because of the lack of sufficient training data on the topic of interest, but additional training data can be obtained from related topic domains. Thus, instead of learning a single open-domain QA model, we investigate domain adaptation approaches to create multiple improved domain-specific QA models. We demonstrate that this can be achieved by stratifying the source dataset, without the need to search for complementary data, unlike many other domain adaptation approaches. We propose a deep architecture that jointly exploits convolutional and recurrent networks to learn domain-specific features while transferring domain-shared features. That is, we use transferable features to enable model adaptation from multiple source domains. We consider different transfer approaches designed to learn span-level and sentence-level QA models. We find that domain adaptation greatly improves sentence-level QA performance, and that span-level QA benefits from sentence information. Finally, we also show that a simple clustering algorithm may be employed when the topic domains are unknown, and that the resulting loss in accuracy is negligible.
In the big data era, machine learning is one of the fundamental techniques in intrusion detection systems (IDSs). A poisoning attack, one of the most recognized security threats to machine-learning-based IDSs, injects adversarial samples during the training phase, inducing drift in the training data and a significant performance decrease of the target IDS on testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attacks several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate points that lie near abnormal data but are classified as normal by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limit on the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called the chronic poisoning attack. Extensive experiments on a synthetic data set and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.
We present an effective machine learning method for malicious activity detection in enterprise security logs. Our method involves feature engineering, that is, generating new features by applying operators to features of the raw data. We generate DNF formulas from raw features, extract Boolean functions from them, and leverage Fourier analysis to generate new parity features and rank them by the magnitude of their Fourier coefficients. We demonstrate on real enterprise data sets that the engineered features enhance the performance of a wide range of classifiers and clustering algorithms. Compared with classification on raw data features, the engineered features achieve up to 50.6% improvement in malicious recall while sacrificing no more than 0.47% in accuracy. We also observe better isolation of malicious clusters when clustering on engineered features. In general, a small number of engineered features achieve higher performance than raw data features according to our metrics of interest. Our feature engineering method also retains interpretability, an important consideration in cyber security applications.
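The Fourier step can be illustrated on a toy Boolean function. For f: {0,1}ⁿ → {0,1}, map the output to F(x) = (-1)^f(x). The Fourier coefficient of a variable subset S is then the average of F(x)·χ_S(x), where χ_S(x) = (-1)^(Σ_{i∈S} xᵢ) is the parity (XOR) feature over S. Subsets with large coefficients make good parity features. The sketch evaluates f by brute force over all 2ⁿ inputs, so it is only practical for small n. How the authors derive Boolean functions from DNF formulas is not specified, and the example function is hypothetical.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Fourier (Walsh-Hadamard) coefficients of a Boolean function f: {0,1}^n -> {0,1}.

    The coefficient of subset S is the average over all inputs of
    (-1)^f(x) * (-1)^(sum of x_i for i in S).
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = sum((-1) ** (f(x) + sum(x[i] for i in S)) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


def top_parity_features(f, n, k):
    """Rank non-empty parity subsets by |coefficient| and keep the k largest."""
    coeffs = fourier_coefficients(f, n)
    ranked = sorted((S for S in coeffs if S), key=lambda S: abs(coeffs[S]), reverse=True)
    return ranked[:k]
```

For `f = lambda x: x[0] ^ x[1]` over three variables, all of the Fourier weight sits on the subset `(0, 1)`. The top-ranked parity feature is therefore x0 XOR x1, which stays interpretable as a combination of two named raw features.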
In this paper, we propose a new regularization scheme for the well-known Support Vector Machine (SVM) classifier that operates at the training sample level. The proposed approach is motivated by the fact that maximum-margin classification defines the decision function as a linear combination of selected training samples; thus, variations in training sample selection directly affect generalization performance. We show that the proposed regularization scheme is well motivated and intuitive. Experimental results show that the proposed regularization scheme outperforms the standard SVM in human action recognition tasks as well as in classical recognition problems.
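The motivating fact can be written out directly. In its dual form, a kernel SVM decides with f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b, and only training samples with αᵢ > 0 (the support vectors) contribute. The sketch below evaluates this sum with a Gaussian (RBF) kernel. The support vectors, α values, bias, and `gamma` are hypothetical, and the paper's regularization scheme itself is not shown.

```python
import math


def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support, alphas, labels, b, kernel=rbf):
    """SVM decision value: sum_i alpha_i * y_i * K(x_i, x) + b.

    Only samples with alpha_i > 0 (the support vectors) contribute, so the
    selected training samples directly shape the decision function.
    """
    return sum(a * y * kernel(s, x) for s, a, y in zip(support, alphas, labels)) + b


# Hypothetical model: one support vector per class.
support = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
```

With this model, `decision([2.0, 2.0], support, alphas, labels, 0.0)` is positive, `decision([0.0, 0.0], ...)` is negative, and the midpoint `[1.0, 1.0]` lies exactly on the boundary. Moving or removing either support vector moves the boundary, which is the sensitivity that a sample-level regularizer targets.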
Network traffic identification has been a hot topic in the network security area. Identifying abnormal traffic makes it possible to detect attack traffic and helps network managers enforce the corresponding security policies to prevent attacks. Support Vector Machines (SVMs) are one of the most promising supervised machine learning (ML) algorithms for identifying traffic in IP networks and detecting abnormal traffic. SVMs perform well because they avoid the local optimization problems that exist in many supervised learning algorithms. However, because the SVM is a binary classification approach, its use for multiclass classification needs more research. In this paper, we propose an abnormal traffic identification system (ATIS) that can classify and identify multiple attack traffic applications. Each component of ATIS is introduced in detail, and experiments are carried out based on ATIS. In tests on the KDD CUP dataset, the SVM shows good performance. Furthermore, comparative experiments reveal that scaling and parameter selection have a vital impact on SVM training results.
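Two points from the abstract, multiclass classification with binary SVMs and the importance of scaling, can be sketched as follows. The abstract does not say which multiclass scheme or scaling ATIS uses. The sketch assumes one-vs-rest (predict the class whose "this class vs. the rest" decision value is largest) and per-feature min-max scaling to [-1, 1], with bounds learned from training data only. The class names and scoring functions are hypothetical stand-ins for trained binary SVMs.

```python
def fit_minmax(X):
    """Per-feature (min, max) bounds learned from training data only."""
    return [(min(col), max(col)) for col in zip(*X)]


def scale(x, bounds, lo=-1.0, hi=1.0):
    """Linearly map each feature to [lo, hi] using training bounds (constant features -> lo)."""
    out = []
    for v, (mn, mx) in zip(x, bounds):
        span = (mx - mn) or 1.0
        out.append(lo + (hi - lo) * (v - mn) / span)
    return out


def one_vs_rest_predict(x, scorers):
    """Pick the class whose binary 'this class vs. all others' decision value is largest."""
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical binary scorers standing in for trained one-vs-rest SVMs.
scorers = {
    "normal": lambda x: -x[0],
    "dos": lambda x: x[0] - 0.5,
    "probe": lambda x: x[1],
}
bounds = fit_minmax([[0, 10], [10, 30]])
```

Scaling matters for SVMs because features with large numeric ranges would otherwise dominate kernel distances. Applying the training bounds to test traffic keeps both sets on the same scale.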
Explosive naval mines pose a threat to seafaring vessels, both military and civilian. This work applies deep neural network (DNN) methods to the problem of detecting mine-like objects (MLO) on the seafloor in side-scan sonar imagery. We explored how DNN depth, memory requirements, computational requirements, and training data distribution affect detection efficacy. A visualization technique (class activation map) was incorporated to help users interpret the model's behavior. We found that modest DNN model sizes yielded better accuracy (98%) than very simple DNN models (93%) and a support vector machine (78%). The largest DNN models achieved an efficacy increase of less than 1% at the cost of a 17x increase in trainable parameter count and computation requirements. In contrast to DNNs popularized for many-class image recognition tasks, the models for this task require far fewer computational resources (0.3% of the parameters) and are suitable for embedded use within an autonomous unmanned underwater vehicle.
Hierarchical Graph Neuron (HGN) is an extension of a network-centric algorithm called Graph Neuron (GN), which is used to perform parallel distributed pattern recognition. In this research, the HGN scheme is used to classify intrusion attacks in computer networks. Intrusion attack patterns are preprocessed in three steps: selecting attributes using information gain attribute evaluation, discretizing the selected attributes using supervised entropy-based discretization, and selecting the training data using the K-Means clustering algorithm. After preprocessing, the HGN scheme is deployed to classify intrusion attacks using the KDD Cup 99 dataset. The classification results are measured in terms of accuracy rate, detection rate, false positive rate, and true negative rate. The test results show that the HGN scheme is promising and stable in classifying intrusion attack patterns, with an accuracy rate of 96.27%, a detection rate of 99.20%, a true negative rate below 15.73%, and a false positive rate as low as 0.80%.
Deep learning techniques have demonstrated the ability to perform a variety of object recognition tasks using visible imager data; however, deep learning has not been implemented as a means to autonomously detect and assess targets of interest in a physical security system. We demonstrate the use of transfer learning on a convolutional neural network (CNN) to significantly reduce training time while keeping detection accuracy high for targets relevant to physical security. Unlike many detection algorithms employed by video analytics within physical security systems, this method does not rely on temporal data to construct a background scene; targets of interest can halt motion indefinitely and still be detected by the implemented CNN. A key advantage of deep learning is the ability of a network to improve over time: periodic retraining can lead to better detection and higher confidence rates. We investigate how CNN test accuracy varies with training data size using physical security video data. Because of the large number of visible imagers, the significant volume of data collected daily, and the human-in-the-loop ground truth data already being produced, physical security systems present a unique environment that is well suited for analysis via CNNs. This could lead to an algorithmic element that reduces the burden on human operators and decreases the number of nuisance alarms they must analyze.