Biblio
In this paper, RBF-based multistage auto-encoders are used to detect IDS attacks. RBF has numerous applications in various actual life settings. The planned technique involves a two-part multistage auto-encoder and RBF. The multistage auto-encoder is applied to select top and sensitive features from input data. The selected features from the multistage auto-encoder is wired as input to the RBF and the RBF is trained to categorize the input data into two labels: attack or no attack. The experiment was realized using MATLAB2018 on a dataset comprising 175,341 case, each of which involves 42 features and is authenticated using 82,332 case. The developed approach here has been applied for the first time, to the knowledge of the authors, to detect IDS attacks with 98.80% accuracy when validated using UNSW-NB15 dataset. The experimental results show the proposed method presents satisfactory results when compared with those obtained in this field.
The main challenge for malware researchers is the large amount of data and files that need to be evaluated for potential threats. Researchers analyze a large number of new malware daily and classify them in order to extract common features. Therefore, a system that can ensure and improve the efficiency and accuracy of the classification is of great significance for the study of malware characteristics. A high-performance, high-efficiency automatic classification system based on multi-feature selection fusion of machine learning is proposed in this paper. Its performance and efficiency, according to our experiments, have been greatly improved compared to single-featured systems.
In this paper, we explore the use of machine learning technique for wormhole attack detection in ad hoc network. This work has categorized into three major tasks. One of our tasks is a simulation of wormhole attack in an ad hoc network environment with multiple wormhole tunnels. A next task is the characterization of packet attributes that lead to feature selection. Consequently, we perform data generation and data collection operation that provide large volume dataset. The final task is applied to machine learning technique for wormhole attack detection. Prior to this, a wormhole attack has detected using traditional approaches. In those, a Multirate-DelPHI is shown best results as detection rate is 90%, and the false alarm rate is 20%. We conduct experiments and illustrate that our method performs better resulting in all statistical parameters such as detection rate is 93.12% and false alarm rate is 5.3%. Furthermore, we have also shown results on various statistical parameters such as Precision, F-measure, MCC, and Accuracy.
In this paper, a general content adaptive image steganography detector in the spatial domain is proposed. We assemble conventional Haar and LBP features to construct local co-occurrence features, then the boosted classifiers are used to assemble the features as well as the final detector, and each weak classifier of the boosted classifiers corresponds to the co-occurrence feature of a local image region. Moreover, the classification ability and the generalization power of the candidate features are both evaluated for decision in the feature selection procedure of boosting training, which makes the final detector more accuracy. The experimental results on standard dataset show that the proposed framework can detect two primary content adaptive stego algorithms in the spatial domain with higher accuracy than the state-of-the-art steganalysis method.
Phishing as one of the most well-known cybercrime activities is a deception of online users to steal their personal or confidential information by impersonating a legitimate website. Several machine learning-based strategies have been proposed to detect phishing websites. These techniques are dependent on the features extracted from the website samples. However, few studies have actually considered efficient feature selection for detecting phishing attacks. In this work, we investigate an agreement on the definitive features which should be used in phishing detection. We apply Fuzzy Rough Set (FRS) theory as a tool to select most effective features from three benchmarked data sets. The selected features are fed into three often used classifiers for phishing detection. To evaluate the FRS feature selection in developing a generalizable phishing detection, the classifiers are trained by a separate out-of-sample data set of 14,000 website samples. The maximum F-measure gained by FRS feature selection is 95% using Random Forest classification. Also, there are 9 universal features selected by FRS over all the three data sets. The F-measure value using this universal feature set is approximately 93% which is a comparable result in contrast to the FRS performance. Since the universal feature set contains no features from third-part services, this finding implies that with no inquiry from external sources, we can gain a faster phishing detection which is also robust toward zero-day attacks.
Although Stylometry has been effectively used for Authorship Attribution, there is a growing number of methods being developed that allow authors to mask their identity [2, 13]. In this paper, we investigate the usage of non-traditional feature sets for Authorship Attribution. By using non-traditional feature sets, one may be able to reveal the identity of adversarial authors who are attempting to evade detection from Authorship Attribution systems that are based on more traditional feature sets. In addition, we demonstrate how GEFeS (Genetic & Evolutionary Feature Selection) can be used to evolve high-performance hybrid feature sets composed of two non-traditional feature sets for Authorship Attribution: LIWC (Linguistic Inquiry & Word Count) and Sentiment Analysis. These hybrids were able to reduce the Adversarial Effectiveness on a test set presented in [2] by approximately 33.4%.
It is well known that distributed cyber attacks simultaneously launched from many hosts have caused the most serious problems in recent years including problems of privacy leakage and denial of services. Thus, how to detect those attacks at early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication is in the preparation phase of distributed attacks. Although attack detection based on signature has been practically applied since long ago, it is well-known that it cannot efficiently deal with new kinds of attacks. In recent years, ML(Machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question that "Are all of those features really necessary?" We mainly investigate how the detection performance moves as the features are removed from those having lowest importance and we try to make it clear that what features should be payed attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM(Support Vector Machine) and PCA(Principal Component Analysis) are utilized for feature selection and SVM and RF(Random Forest) are for building the classifier. We find that the detection performance is generally getting better if more features are utilized. However, after the number of features has reached around 40, the detection performance will not change much even more features are used. It is also verified that, in some specific cases, more features do not always means a better detection performance. We also discuss 10 important features which have the biggest influence on classification.
Feature extraction and feature selection are the first tasks in pre-processing of input logs in order to detect cybersecurity threats and attacks by utilizing data mining techniques in the field of Artificial Intelligence. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are found to be time-consuming and difficult to be managed efficiently. In this paper, we present an approach for handling feature extraction and feature selection utilizing machine learning algorithms for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its python API, named pyspark.
Machine learning (ML) algorithms provide a good solution for many security sensitive applications, they themselves, however, face the threats of adversary attacks. As a key problem in machine learning, how to design robust feature selection algorithms against these attacks becomes a hot issue. The current researches on defending evasion attacks mainly focus on wrapped adversarial feature selection algorithm, i.e., WAFS, which is dependent on the classification algorithms, and time cost is very high for large-scale data. Since mRMR (minimum Redundancy and Maximum Relevance) algorithm is one of the most popular filter algorithms for feature selection without considering any classifier during feature selection process. In this paper, we propose a novel adversary-aware feature selection algorithm under filter model based on mRMR, named FAFS. The algorithm, on the one hand, takes the correlation between a single feature and a label, and the redundancy between features into account; on the other hand, when selecting features, it not only considers the generalization ability in the absence of attack, but also the robustness under attack. The performance of four algorithms, i.e., mRMR, TWFS (Traditional Wrapped Feature Selection algorithm), WAFS, and FAFS is evaluated on spam filtering and PDF malicious detection in the Perfect Knowledge attack scenarios. The experiment results show that FAFS has a better performance under evasion attacks with less time complexity, and comparable classification accuracy.
We propose a method for transferring an arbitrary style to only a specific object in an image. Style transfer is the process of combining the content of an image and the style of another image into a new image. Our results show that the proposed method can realize style transfer to specific object.
The continuous advance in recent cloud-based computer networks has generated a number of security challenges associated with intrusions in network systems. With the exponential increase in the volume of network traffic data, involvement of humans in such detection systems is time consuming and a non-trivial problem. Secondly, network traffic data tends to be highly dimensional, comprising of numerous features and attributes, making classification challenging and thus susceptible to the curse of dimensionality problem. Given such scenarios, the need arises for dimensional reduction, feature selection, combined with machine-learning techniques in the classification of such data. Therefore, as a contribution, this paper seeks to employ data mining techniques in a cloud-based environment, by selecting appropriate attributes and features with the least importance in terms of weight for the classification. Often the standard is to select features with better weights while ignoring those with least weights. In this study, we seek to find out if we can make prediction using those features with least weights. The motivation is that adversaries use stealth to hide their activities from the obvious. The question then is, can we predict any stealth activity of an adversary using the least observed attributes? In this particular study, we employ information gain to select attributes with the lowest weights and then apply machine learning to classify if a combination, in this case, of both source and destination ports are attacked or not. The motivation of this investigation is if attributes that are of least importance can be used to predict if an attack could occur. Our preliminary results show that even when the source and destination port attributes are used in combination with features with the least weights, it is possible to classify such network traffic data and predict if an attack will occur or not.
Due to the user-interface limitations of wearable devices, voice-based interfaces are becoming more common; speaker recognition may then address the authentication requirements of wearable applications. Wearable devices have small form factor, limited energy budget and limited computational capacity. In this paper, we examine the challenge of computing speaker recognition on small wearable platforms, and specifically, reducing resource use (energy use, response time) by trimming the input through careful feature selections. For our experiments, we analyze four different feature-selection algorithms and three different feature sets for speaker identification and speaker verification. Our results show that Principal Component Analysis (PCA) with frequency-domain features had the highest accuracy, Pearson Correlation (PC) with time-domain features had the lowest energy use, and recursive feature elimination (RFE) with frequency-domain features had the least latency. Our results can guide developers to choose feature sets and configurations for speaker-authentication algorithms on wearable platforms.
Supervisory Control and Data Acquisition (SCADA) systems complexity and interconnectivity increase in recent years have exposed the SCADA networks to numerous potential vulnerabilities. Several studies have shown that anomaly-based Intrusion Detection Systems (IDS) achieves improved performance to identify unknown or zero-day attacks. In this paper, we propose a hybrid model for anomaly-based intrusion detection in SCADA networks using machine learning approach. In the first part, we present a robust hybrid model for anomaly-based intrusion detection in SCADA networks. Finally, we present a feature selection model for anomaly-based intrusion detection in SCADA networks by removing redundant and irrelevant features. Irrelevant features in the dataset can affect modeling power and reduce predictive accuracy. These models were evaluated using an industrial control system dataset developed at the Distributed Analytics and Security Institute Mississippi State University Starkville, MS, USA. The experimental results show that our proposed model has a key effect in reducing the time and computational complexity and achieved improved accuracy and detection rate. The accuracy of our proposed model was measured as 99.5 % for specific-attack-labeled.
Currently, mobile botnet attacks have shifted from computers to smartphones due to its functionality, ease to exploit, and based on financial intention. Mostly, it attacks Android due to its popularity and high usage among end users. Every day, more and more malicious mobile applications (apps) with the botnet capability have been developed to exploit end users' smartphones. Therefore, this paper presents a new mobile botnet classification based on permission and Application Programming Interface (API) calls in the smartphone. This classification is developed using static analysis in a controlled lab environment and the Drebin dataset is used as the training dataset. 800 apps from the Google Play Store have been chosen randomly to test the proposed classification. As a result, 16 permissions and 31 API calls that are most related with mobile botnet have been extracted using feature selection and later classified and tested using machine learning algorithms. The experimental result shows that the Random Forest Algorithm has achieved the highest detection accuracy of 99.4% with the lowest false positive rate of 16.1% as compared to other machine learning algorithms. This new classification can be used as the input for mobile botnet detection for future work, especially for financial matters.