Biblio
Nowadays, phishing is one of the most usual web threats with regards to the significant growth of the World Wide Web in volume over time. Phishing attackers always use new (zero-day) and sophisticated techniques to deceive online customers. Hence, it is necessary that the anti-phishing system be real-time and fast and also leverages from an intelligent phishing detection solution. Here, we develop a reliable detection system which can adaptively match the changing environment and phishing websites. Our method is an online and feature-rich machine learning technique to discriminate the phishing and legitimate websites. Since the proposed approach extracts different types of discriminative features from URLs and webpages source code, it is an entirely client-side solution and does not require any service from the third-party. The experimental results highlight the robustness and competitiveness of our anti-phishing system to distinguish the phishing and legitimate websites.
The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.
Cyber Physical Systems (CPS)-Internet of Things (IoT) enabled healthcare services and infrastructures improve human life, but are vulnerable to a variety of emerging cyber-attacks. Cybersecurity specialists are finding it hard to keep pace of the increasingly sophisticated attack methods. There is a critical need for innovative cognitive cybersecurity for CPS-IoT enabled healthcare ecosystem. This paper presents a cognitive cybersecurity framework for simulating the human cognitive behaviour to anticipate and respond to new and emerging cybersecurity and privacy threats to CPS-IoT and critical infrastructure systems. It includes the conceptualisation and description of a layered architecture which combines Artificial Intelligence, cognitive methods and innovative security mechanisms.
In today's society, even though the technology is so developed, the coloring of computer images has remained at the manual stage. As a carrier of human culture and art, film has existed in our history for hundred years. With the development of science and technology, movies have developed from the simple black-and-white film era to the current digital age. There is a very complicated process for coloring old movies. Aside from the traditional hand-painting techniques, the most common method is to use post-processing software for coloring movie frames. This kind of operation requires extraordinary skills, patience and aesthetics, which is a great test for the operator. In recent years, the extensive use of machine learning and neural networks has made it possible for computers to intelligently process images. Since 2016, various types of generative adversarial networks models have been proposed to make deep learning shine in the fields of image style transfer, image coloring, and image style change. In this case, the experiment uses the generative adversarial networks principle to process pictures and videos to realize the automatic rendering of old documentary movies.
Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
This work takes a novel approach to classifying the behavior of devices by exploiting the single-purpose nature of IoT devices and analyzing the complexity and variance of their network traffic. We develop a formalized measurement of complexity for IoT devices, and use this measurement to precisely tune an anomaly detection algorithm for each device. We postulate that IoT devices with low complexity lead to a high confidence in their behavioral model and have a correspondingly more precise decision boundary on their predicted behavior. Conversely, complex general purpose devices have lower confidence and a more generalized decision boundary. We show that there is a positive correlation to our complexity measure and the number of outliers found by an anomaly detection algorithm. By tuning this decision boundary based on device complexity we are able to build a behavioral framework for each device that reduces false positive outliers. Finally, we propose an architecture that can use this tuned behavioral model to rank each flow on the network and calculate a trust score ranking of all traffic to and from a device which allows the network to autonomously make access control decisions on a per-flow basis.
The aim of this paper is to show the importance of Computational Stylometry (CS) and Machine Learning (ML) support in author's gender and age detection in cyberbullying texts. We developed a cyberbullying detection platform and we show the results of performances in terms of Precision, Recall and F -Measure for gender and age detection in cyberbullying texts we collected.
Load balancing and IP anycast are traffic routing algorithms used to speed up delivery of the Domain Name System. In case of a DDoS attack or an overload condition, the value of these protocols is critical, as they can provide intrinsic DDoS mitigation with the failover alternatives. In this paper, we present a methodology for predicting the next DNS response in the light of a potential redirection to less busy servers, in order to mitigate the size of the attack. Our experiments were conducted using data from the Nov. 2015 attack of the Root DNS servers and Logistic Regression, k-Nearest Neighbors, Support Vector Machines and Random Forest as our primary classifiers. The models were able to successfully predict up to 83% of responses for Root Letters that operated on a small number of sites and consequently suffered the most during the attacks. On the other hand, regarding DNS requests coming from more distributed Root servers, the models demonstrated lower accuracy. Our analysis showed a correlation between the True Positive Rate metric and the number of sites, as well as a clear need for intelligent management of traffic in load balancing practices.
We consider the possibility of detecting malicious behaviors of the advanced persistent threat (APT) at endpoints during incident response or forensics investigations. Specifically, we study the case where third-party sensors are not available; our observables are obtained solely from inherent digital artifacts of Windows operating systems. What is of particular interest is an artifact called the Application Compatibility Cache (Shimcache). As it is not apparent from the Shimcache when a file has been executed, we propose an algorithm of estimating the time of file execution up to an interval. We also show guarantees of the proposed algorithm's performance and various possible extensions that can improve the estimation. Finally, combining this approach with methods of machine learning, as well as information from other digital artifacts, we design a prototype system called XTEC and demonstrate that it can help hunt for the APT in a real-world case study.
Collaboration between humans and machines does not necessarily lead to better outcomes.
An adaptable agent-based IDS (AAIDS) inspired by the danger theory of artificial immune system is proposed. The learning mechanism of AAIDS is designed by emulating how dendritic cells (DC) in immune systems detect and classify danger signals. AG agent, DC agent and TC agent coordinate together and respond to system calls directly rather than analyze network packets. Simulations show AAIDS can determine several critical scenarios of the system behaviors where packet analysis is impractical.
In today's time Software Defined Network (SDN) gives the complete control to get the data flow in the network. SDN works as a central point to which data is administered centrally and traffic is also managed. SDN being open source product is more prone to security threats. The security policies are also to be enforced as it would otherwise let the controller be attacked the most. The attacks like DDOS and DOS attacks are more commonly found in SDN controller. DDOS is destructive attack that normally diverts the normal flow of traffic and starts the over flow of flooded packets halting the system. Machine Learning techniques helps to identify the hidden and unexpected pattern of the network and hence helps in analyzing the network flow. All the classified and unclassified techniques can help detect the malicious flow based on certain parameters like packet flow, time duration, accuracy and precision rate. Researchers have used Bayesian Network, Wavelets, Support Vector Machine and KNN to detect DDOS attacks. As per the review it's been analyzed that KNN produces better result as per the higher precision and giving a lower falser rate for detection. This paper produces better approach of hybrid Machine Learning techniques rather than existing KNN on the same data set giving more accuracy of detecting DDOS attacks on higher precision rate. The result of the traffic with both normal and abnormal behavior is shown and as per the result the proposed algorithm is designed which is suited for giving better approach than KNN and will be implemented later on for future.
Intrusion detection is one essential tool towards building secure and trustworthy Cloud computing environment, given the ubiquitous presence of cyber attacks that proliferate rapidly and morph dynamically. In our current working paradigm of resource, platform and service consolidations, Cloud Computing provides a significant improvement in the cost metrics via dynamic provisioning of IT services. Since almost all cloud computing networks lean on providing their services through Internet, they are prone to experience variety of security issues. Therefore, in cloud environments, it is necessary to deploy an Intrusion Detection System (IDS) to detect new and unknown attacks in addition to signature based known attacks, with high accuracy. In our deliberation we assume that a system or a network ``anomalous'' event is synonymous to an ``intrusion'' event when there is a significant departure in one or more underlying system or network activities. There are couple of recently proposed ideas that aim to develop a hybrid detection mechanism, combining advantages of signature-based detection schemes with the ability to detect unknown attacks based on anomalies. In this work, we propose a network based anomaly detection system at the Cloud Hypervisor level that utilizes a hybrid algorithm: a combination of K-means clustering algorithm and SVM classification algorithm, to improve the accuracy of the anomaly detection system. Dataset from UNSW-NB15 study is used to evaluate the proposed approach and results are compared with previous studies. The accuracy for our proposed K-means clustering model is slightly higher than others. However, the accuracy we obtained from the SVM model is still low for supervised techniques.
Network intrusion detection is an important component of network security. Currently, the popular detection technology used the traditional machine learning algorithms to train the intrusion samples, so as to obtain the intrusion detection model. However, these algorithms have the disadvantage of low detection rate. Deep learning is more advanced technology that automatically extracts features from samples. In view of the fact that the accuracy of intrusion detection is not high in traditional machine learning technology, this paper proposes a network intrusion detection model based on convolutional neural network algorithm. The model can automatically extract the effective features of intrusion samples, so that the intrusion samples can be accurately classified. Experimental results on KDD99 datasets show that the proposed model can greatly improve the accuracy of intrusion detection.
OS kernel is the core part of the operating system, and it plays an important role for OS resource management. A popular way to compromise OS kernel is through a kernel rootkit (i.e., malicious kernel module). Once a rootkit is loaded into the kernel space, it can carry out arbitrary malicious operations with high privilege. To defeat kernel rootkits, many approaches have been proposed in the past few years. However, existing methods suffer from some limitations: 1) most methods focus on user-mode rootkit detection; 2) some methods are limited to detect obfuscated kernel modules; and 3) some methods introduce significant performance overhead. To address these problems, we propose VKRD, a kernel rootkit detection system based on the hardware assisted virtualization technology. Compared with previous methods, VKRD can provide a transparent and an efficient execution environment for the target kernel module to reveal its run-time behavior. To select the important run-time features for training our detection models, we utilize the TF-IDF method. By combining the hardware assisted virtualization and machine learning techniques, our kernel rootkit detection solution could be potentially applied in the cloud environment. The experiments show that our system can detect windows kernel rootkits with high accuracy and moderate performance cost.
In the modern day and age, credential based authentication systems no longer provide the level of security that many organisations and their services require. The level of trust in passwords has plummeted in recent years, with waves of cyber attacks predicated on compromised and stolen credentials. This method of authentication is also heavily reliant on the individual user's choice of password. There is the potential to build levels of security on top of credential based authentication systems, using a risk based approach, which preserves the seamless authentication experience for the end user. One method of adding this security to a risk based authentication framework, is keystroke dynamics. Monitoring the behaviour of the users and how they type, produces a type of digital signature which is unique to that individual. Learning this behaviour allows dynamic flags to be applied to anomalous typing patterns that are produced by attackers using stolen credentials, as a potential risk of fraud. Methods from statistics and machine learning have been explored to try and implement such solutions. This paper will look at an Autoencoder model for learning the keystroke dynamics of specific users. The results from this paper show an improvement over the traditional tried and tested statistical approaches with an Equal Error Rate of 6.51%, with the additional benefits of relatively low training times and less reliance on feature engineering.
Phishing is the major problem of the internet era. In this era of internet the security of our data in web is gaining an increasing importance. Phishing is one of the most harmful ways to unknowingly access the credential information like username, password or account number from the users. Users are not aware of this type of attack and later they will also become a part of the phishing attacks. It may be the losses of financial found, personal information, reputation of brand name or trust of brand. So the detection of phishing site is necessary. In this paper we design a framework of phishing detection using URL.
In order to examine malicious activity that occurs in a network or a system, intrusion detection system is used. Intrusion Detection is software or a device that scans a system or a network for a distrustful activity. Due to the growing connectivity between computers, intrusion detection becomes vital to perform network security. Various machine learning techniques and statistical methodologies have been used to build different types of Intrusion Detection Systems to protect the networks. Performance of an Intrusion Detection is mainly depends on accuracy. Accuracy for Intrusion detection must be enhanced to reduce false alarms and to increase the detection rate. In order to improve the performance, different techniques have been used in recent works. Analyzing huge network traffic data is the main work of intrusion detection system. A well-organized classification methodology is required to overcome this issue. This issue is taken in proposed approach. Machine learning techniques like Support Vector Machine (SVM) and Naïve Bayes are applied. These techniques are well-known to solve the classification problems. For evaluation of intrusion detection system, NSL- KDD knowledge discovery Dataset is taken. The outcomes show that SVM works better than Naïve Bayes. To perform comparative analysis, effective classification methods like Support Vector Machine and Naive Bayes are taken, their accuracy and misclassification rate get calculated.
Malware is one of the threats to information security that continues to increase. In 2014 nearly six million new malware was recorded. The highest number of malware is in Trojan Horse malware while in Adware malware is the most significantly increased malware. Security system devices such as antivirus, firewall, and IDS signature-based are considered to fail to detect malware. This happens because of the very fast spread of computer malware and the increasing number of signatures. Besides signature-based security systems it is difficult to identify new methods, viruses or worms used by attackers. One other alternative in detecting malware is to use honeypot with machine learning. Honeypot can be used as a trap for packages that are suspected while machine learning can detect malware by classifying classes. Decision Tree and Support Vector Machine (SVM) are used as classification algorithms. In this paper, we propose architectural design as a solution to detect malware. We presented the architectural proposal and explained the experimental method to be used.
Phishing is a security attack to acquire personal information like passwords, credit card details or other account details of a user by means of websites or emails. Phishing websites look similar to the legitimate ones which make it difficult for a layman to differentiate between them. As per the reports of Anti Phishing Working Group (APWG) published in December 2018, phishing against banking services and payment processor was high. Almost all the phishy URLs use HTTPS and use redirects to avoid getting detected. This paper presents a focused literature survey of methods available to detect phishing websites. A comparative study of the in-use anti-phishing tools was accomplished and their limitations were acknowledged. We analyzed the URL-based features used in the past to improve their definitions as per the current scenario which is our major contribution. Also, a step wise procedure of designing an anti-phishing model is discussed to construct an efficient framework which adds to our contribution. Observations made out of this study are stated along with recommendations on existing systems.
Cross-Site Request Forgery (CSRF) is one of the oldest and simplest attacks on the Web, yet it is still effective on many websites and it can lead to severe consequences, such as economic losses and account takeovers. Unfortunately, tools and techniques proposed so far to identify CSRF vulnerabilities either need manual reviewing by human experts or assume the availability of the source code of the web application. In this paper we present Mitch, the first machine learning solution for the black-box detection of CSRF vulnerabilities. At the core of Mitch there is an automated detector of sensitive HTTP requests, i.e., requests which require protection against CSRF for security reasons. We trained the detector using supervised learning techniques on a dataset of 5,828 HTTP requests collected on popular websites, which we make available to other security researchers. Our solution outperforms existing detection heuristics proposed in the literature, allowing us to identify 35 new CSRF vulnerabilities on 20 major websites and 3 previously undetected CSRF vulnerabilities on production software already analyzed using a state-of-the-art tool.
Building natural and conversational virtual humans is a task of formidable complexity. We believe that, especially when building agents that affectively interact with biological humans in real-time, a cognitive science-based, multilayered sensing and artificial intelligence (AI) systems approach is needed. For this demo, we show a working version (through human interaction with it) our modular system of natural, conversation 3D virtual human using AI or sensing layers. These including sensing the human user via facial emotion recognition, voice stress, semantic meaning of the words, eye gaze, heart rate, and galvanic skin response. These inputs are combined with AI sensing and recognition of the environment using deep learning natural language captioning or dense captioning. These are all processed by our AI avatar system allowing for an affective and empathetic conversation using an NLP topic-based dialogue capable of using facial expressions, gestures, breath, eye gaze and voice language-based two-way back and forth conversations with a sensed human. Our lab has been building these systems in stages over the years.
We recently see a real digital revolution where all companies prefer to use cloud computing because of its capability to offer a simplest way to deploy the needed services. However, this digital transformation has generated different security challenges as the privacy vulnerability against cyber-attacks. In this work we will present a new architecture of a hybrid Intrusion detection System, IDS for virtual private clouds, this architecture combines both network-based and host-based intrusion detection system to overcome the limitation of each other, in case the intruder bypassed the Network-based IDS and gained access to a host, in intend to enhance security in private cloud environments. We propose to use a non-traditional mechanism in the conception of the IDS (the detection engine). Machine learning, ML algorithms will can be used to build the IDS in both parts, to detect malicious traffic in the Network-based part as an additional layer for network security, and also detect anomalies in the Host-based part to provide more privacy and confidentiality in the virtual machine. It's not in our scope to train an Artificial Neural Network ”ANN”, but just to propose a new scheme for IDS based ANN, In our future work we will present all the details related to the architecture and parameters of the ANN, as well as the results of some real experiments.