Biblio
Software insecurity is being identified as one of the leading causes of security breaches. In this paper, we revisited one of the strategies in solving software insecurity, which is the use of software quality metrics. We utilized a multilayer deep feedforward network in examining whether there is a combination of metrics that can predict the appearance of security-related bugs. We also applied the traditional machine learning algorithms such as decision tree, random forest, naïve bayes, and support vector machines and compared the results with that of the Deep Learning technique. The results have successfully demonstrated that it was possible to develop an effective predictive model to forecast software insecurity based on the software metrics and using Deep Learning. All the models generated have shown an accuracy of more than sixty percent with Deep Learning leading the list. This finding proved that utilizing Deep Learning methods and a combination of software metrics can be tapped to create a better forecasting model thereby aiding software developers in predicting security bugs.
Critical Infrastructures (CIs) use Supervisory Control And Data Acquisition (SCADA) systems for remote control and monitoring. Sophisticated security measures are needed to address malicious intrusions, which are steadily increasing in number and variety due to the massive spread of connectivity and standardisation of open SCADA protocols. Traditional Intrusion Detection Systems (IDSs) cannot detect attacks that are not already present in their databases. Therefore, in this paper, we assess Machine Learning (ML) for intrusion detection in SCADA systems using a real data set collected from a gas pipeline system and provided by the Mississippi State University (MSU). The contribution of this paper is two-fold: 1) The evaluation of four techniques for missing data estimation and two techniques for data normalization, 2) The performances of Support Vector Machine (SVM), and Random Forest (RF) are assessed in terms of accuracy, precision, recall and F1score for intrusion detection. Two cases are differentiated: binary and categorical classifications. Our experiments reveal that RF detect intrusions effectively, with an F1score of respectively \textbackslashtextgreater 99%.
With the increase in the popularity of computerized online applications, the analysis, and detection of a growing number of newly discovered stealthy malware poses a significant challenge to the security community. Signature-based and behavior-based detection techniques are becoming inefficient in detecting new unknown malware. Machine learning solutions are employed to counter such intelligent malware and allow performing more comprehensive malware detection. This capability leads to an automatic analysis of malware behavior. The proposed oblique random forest ensemble learning technique is efficient for malware classification. The effectiveness of the proposed method is demonstrated with three malware classification datasets from various sources. The results are compared with other variants of decision tree learning models. The proposed system performs better than the existing system in terms of classification accuracy and false positive rate.
Lately, we are facing the Malware crisis due to various types of malware or malicious programs or scripts available in the huge virtual world - the Internet. But, what is malware? Malware can be a malicious software or a program or a script which can be harmful to the user's computer. These malicious programs can perform a variety of functions, including stealing, encrypting or deleting sensitive data, altering or hijacking core computing functions and monitoring users' computer activity without their permission. There are various entry points for these programs and scripts in the user environment, but only one way to remove them is to find them and kick them out of the system which isn't an easy job as these small piece of script or code can be anywhere in the user system. This paper involves the understanding of different types of malware and how we will use Machine Learning to detect these malwares.
In this paper, we propose a deep learning framework for malware classification. There has been a huge increase in the volume of malware in recent years which poses a serious security threat to financial institutions, businesses and individuals. In order to combat the proliferation of malware, new strategies are essential to quickly identify and classify malware samples so that their behavior can be analyzed. Machine learning approaches are becoming popular for classifying malware, however, most of the existing machine learning methods for malware classification use shallow learning algorithms (e.g. SVM). Recently, Convolutional Neural Networks (CNN), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Motivated by this success, we propose a CNN-based architecture to classify malware samples. We convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, Malimg and Microsoft malware, demonstrate that our method achieves better than the state-of-the-art performance. The proposed method achieves 98.52% and 99.97% accuracy on the Malimg and Microsoft datasets respectively.
In recent years, deep convolution neural networks (DCNNs) have won many contests in machine learning, object detection, and pattern recognition. Furthermore, deep learning techniques achieved exceptional performance in image classification, reaching accuracy levels beyond human capability. Malware variants from similar categories often contain similarities due to code reuse. Converting malware samples into images can cause these patterns to manifest as image features, which can be exploited for DCNN classification. Techniques for converting malware binaries into images for visualization and classification have been reported in the literature, and while these methods do reach a high level of classification accuracy on training datasets, they tend to be vulnerable to overfitting and perform poorly on previously unseen samples. In this paper, we explore and document a variety of techniques for representing malware binaries as images with the goal of discovering a format best suited for deep learning. We implement a database for malware binaries from several families, stored in hexadecimal format. These malware samples are converted into images using various approaches and are used to train a neural network to recognize visual patterns in the input and classify malware based on the feature vectors. Each image type is assessed using a variety of learning models, such as transfer learning with existing DCNN architectures and feature extraction for support vector machine classifier training. Each technique is evaluated in terms of classification accuracy, result consistency, and time per trial. Our preliminary results indicate that improved image representation has the potential to enable more effective classification of new malware.
Modern industrial control systems (ICS) act as victims of cyber attacks more often in last years. These attacks are hard to detect and their consequences can be catastrophic. Cyber attacks can cause anomalies in the work of the ICS and its technological equipment. The presence of mutual interference and noises in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection, which used in corporate solutions, require updating with each change in the structure of the industrial process. An approach based on the machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as example of industrial process. In the course of the research, correlations between the signals of the sensors were detected and preliminary data processing was carried out. Algorithms from the most common techniques of machine learning (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for industrial process anomaly detection task. It's shown that linear algorithms are least demanding on computational resources, but they don't achieve an acceptable result and allow a significant number of errors. Decision tree-based algorithms provided an acceptable accuracy, but the amount of RAM, required for their operations, relates polynomially with the training sample volume. The deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.
The Machine Type Communication Devices (MTCDs) are usually based on Internet Protocol (IP), which can cause billions of connected objects to be part of the Internet. The enormous amount of data coming from these devices are quite heterogeneous in nature, which can lead to security issues, such as injection attacks, ballot stuffing, and bad mouthing. Consequently, this work considers machine learning trust evaluation as an effective and accurate option for solving the issues associate with security threats. In this paper, a comparative analysis is carried out with five different machine learning approaches: Naive Bayes (NB), Decision Tree (DT), Linear and Radial Support Vector Machine (SVM), KNearest Neighbor (KNN), and Random Forest (RF). As a critical element of the research, the recommendations consider different Machine-to-Machine (M2M) communication nodes with regard to their ability to identify malicious and honest information. To validate the performances of these models, two trust computation measures were used: Receiver Operating Characteristics (ROCs), Precision and Recall. The malicious data was formulated in Matlab. A scenario was created where 50% of the information were modified to be malicious. The malicious nodes were varied in the ranges of 10%, 20%, 30%, 40%, and the results were carefully analyzed.
For the security of mobile ad-hoc networks (MANETs), a group of wireless mobile nodes needs to cooperate by forwarding packets, to implement an intrusion detection system (IDS). Some of the current IDS implementations in a clustered MANET have designed mobile nodes to wait until the cluster head is elected before scanning the network and thus nodes may be, unfortunately, exposed to several control packet attacks by which nodes identify falsified routes to reach other nodes. In order to detect control packet attacks such as route falsification, we design a route cache sharing mechanism for a non-clustered network where all one-hop routing data are collected by each node for a cooperative host-based detection. The cooperative host-based detection system uses a Support Vector Machine classifier and achieves a detection rate of around 95%. By successfully detecting the route falsification attacks, nodes are given the capability to avoid other attacks such as black-hole and gray-hole, which are in many cases a result of a successful route falsification attack.
The rapid growth of population and industrialization has given rise to the way for the use of technologies like the Internet of Things (IoT). Innovations in Information and Communication Technologies (ICT) carries with it many challenges to our privacy's expectations and security. In Smart environments there are uses of security devices and smart appliances, sensors and energy meters. New requirements in security and privacy are driven by the massive growth of devices numbers that are connected to IoT which increases concerns in security and privacy. The most ubiquitous threats to the security of the smart grids (SG) ascended from infrastructural physical damages, destroying data, malwares, DoS, and intrusions. Intrusion detection comprehends illegitimate access to information and attacks which creates physical disruption in the availability of servers. This work proposes an intrusion detection system using data mining techniques for intrusion detection in smart grid environment. The results showed that the proposed random forest method with a total classification accuracy of 98.94 %, F-measure of 0.989, area under the ROC curve (AUC) of 0.999, and kappa value of 0.9865 outperforms over other classification methods. In addition, the feasibility of our method has been successfully demonstrated by comparing other classification techniques such as ANN, k-NN, SVM and Rotation Forest.
The Internet of Things (IoT) is the network where physical devices, sensors, appliances and other different objects can communicate with each other without the need for human intervention. Wireless Sensor Networks (WSNs) are main building blocks of the IoT. Both the IoT and WSNs have many critical and non-critical applications that touch almost every aspect of our modern life. Unfortunately, these networks are prone to various types of security threats. Therefore, the security of IoT and WSNs became crucial. Furthermore, the resource limitations of the devices used in these networks complicate the problem. One of the most recent and effective approaches to address such challenges is machine learning. Machine learning inspires many solutions to secure the IoT and WSNs. In this paper, we survey the different threats that can attack both IoT and WSNs and the machine learning techniques developed to counter them.
With the frequent use of Wi-Fi and hotspots that provide a wireless Internet environment, awareness and threats to wireless AP (Access Point) security are steadily increasing. Especially when using unauthorized APs in company, government and military facilities, there is a high possibility of being subjected to various viruses and hacking attacks. It is necessary to detect unauthorized Aps for protection of information. In this paper, we use RTT (Round Trip Time) value data set to detect authorized and unauthorized APs in wired / wireless integrated environment, analyze them using machine learning algorithms including SVM (Support Vector Machine), C4.5, KNN (K Nearest Neighbors) and MLP (Multilayer Perceptron). Overall, KNN shows the highest accuracy.
In practice, Defenders need a more efficient network detection approach which has the advantages of quick-responding learning capability of new network behavioural features for network intrusion detection purpose. In many applications the capability of Deep Learning techniques has been confirmed to outperform classic approaches. Accordingly, this study focused on network intrusion detection using convolutional neural networks (CNNs) based on LeNet-5 to classify the network threats. The experiment results show that the prediction accuracy of intrusion detection goes up to 99.65% with samples more than 10,000. The overall accuracy rate is 97.53%.
Traditional security controls, such as firewalls, anti-virus and IDS, are ill-equipped to help IT security and response teams keep pace with the rapid evolution of the cyber threat landscape. Cyber Threat Intelligence (CTI) can help remediate this problem by exploiting non-traditional information sources, such as hacker forums and "dark-web" social platforms. Security and response teams can use the collected intelligence to identify emerging threats. Unfortunately, when manual analysis is used to extract CTI from non-traditional sources, it is a time consuming, error-prone and resource intensive process. We address these issues by using a hybrid Machine Learning model that automatically searches through hacker forum posts, identifies the posts that are most relevant to cyber security and then clusters the relevant posts into estimations of the topics that the hackers are discussing. The first (identification) stage uses Support Vector Machines and the second (clustering) stage uses Latent Dirichlet Allocation. We tested our model, using data from an actual hacker forum, to automatically extract information about various threats such as leaked credentials, malicious proxy servers, malware that evades AV detection, etc. The results demonstrate our method is an effective means for quickly extracting relevant and actionable intelligence that can be integrated with traditional security controls to increase their effectiveness.
The monitoring circuit is widely applied in radiation environment and it is of significance to study the circuit reliability with the radiation effects. In this paper, an intelligent analysis method based on Deep Belief Network (DBN) and Support Vector Method is proposed according to the radiation experiments analysis of the monitoring circuit. The Total Ionizing Dose (TID) of the monitoring circuit is used to identify the circuit degradation trend. Firstly, the output waveforms of the monitoring circuit are obtained by radiating with the different TID. Subsequently, the Deep Belief Network Model is trained to extract the features of the circuit signal. Finally, the Support Vector Machine (SVM) and Support Vector Regression (SVR) are applied to classify and predict the remaining useful life (RUL) of the monitoring circuit. According to the experimental results, the performance of DBN-SVM exceeds DBN method for feature extraction and classification, and SVR is effective for predicting the degradation.
In this paper, we discuss challenges when we try to automatically classify privacy policies using machine learning with words as the features. Since it is difficult for general public to understand privacy policies, it is necessary to support them to do that. To this end, the authors believe that machine learning is one of the promising ways because users can grasp the meaning of policies through outputs by a machine learning algorithm. Our final goal is to develop a system which automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify sentences in privacy policies with category labels, using popular machine learning algorithms, such as a naive Bayes classifier.We choose these algorithms because we could use trained classifiers to evaluate keywords appropriate for privacy labels. Therefore, we adopt words as the features of those algorithms. Experimental results show about 85% accuracy. We think that much higher accuracy is necessary to achieve our final goal. By changing learning settings, we identified one reason of low accuracies such that privacy policies include many sentences which are not direct description of information about categories. It seems that such sentences are redundant but maybe they are essential in case of legal documents in order to prevent misinterpreting. Thus, it is important for machine learning algorithms to handle these redundant sentences appropriately.