Biblio
The aim of this paper is to show the importance of Computational Stylometry (CS) and Machine Learning (ML) support in author's gender and age detection in cyberbullying texts. We developed a cyberbullying detection platform and we show the results of performances in terms of Precision, Recall and F -Measure for gender and age detection in cyberbullying texts we collected.
The greatest threat towards securing the organization and its assets are no longer the attackers attacking beyond the network walls of the organization but the insiders present within the organization with malicious intent. Existing approaches helps to monitor, detect and prevent any malicious activities within an organization's network while ignoring the human behavior impact on security. In this paper we have focused on user behavior profiling approach to monitor and analyze user behavior action sequence to detect insider threats. We present an ensemble hybrid machine learning approach using Multi State Long Short Term Memory (MSLSTM) and Convolution Neural Networks (CNN) based time series anomaly detection to detect the additive outliers in the behavior patterns based on their spatial-temporal behavior features. We find that using Multistate LSTM is better than basic single state LSTM. The proposed method with Multistate LSTM can successfully detect the insider threats providing the AUC of 0.9042 on train data and AUC of 0.9047 on test data when trained with publically available dataset for insider threats.
Recently, malicious insider attacks represent one of the most damaging threats to companies and government agencies. This paper proposes a new framework in constructing a user-centered machine learning based insider threat detection system on multiple data granularity levels. System evaluations and analysis are performed not only on individual data instances but also on normal and malicious insiders, where insider scenario specific results and delay in detection are reported and discussed. Our results show that the machine learning based detection system can learn from limited ground truth and detect new malicious insiders with a high accuracy.
With the rapidly increasing connectivity in cyberspace, Insider Threat is becoming a huge concern. Insider threat detection from system logs poses a tremendous challenge for human analysts. Analyzing log files of an organization is a key component of an insider threat detection and mitigation program. Emerging machine learning approaches show tremendous potential for performing complex and challenging data analysis tasks that would benefit the next generation of insider threat detection systems. However, with huge sets of heterogeneous data to analyze, applying machine learning techniques effectively and efficiently to such a complex problem is not straightforward. In this paper, we extract a concise set of features from the system logs while trying to prevent loss of meaningful information and providing accurate and actionable intelligence. We investigate two unsupervised anomaly detection algorithms for insider threat detection and draw a comparison between different structures of the system logs including daily dataset and periodically aggregated one. We use the generated anomaly score from the previous cycle as the trust score of each user fed to the next period's model and show its importance and impact in detecting insiders. Furthermore, we consider the psychometric score of users in our model and check its effectiveness in predicting insiders. As far as we know, our model is the first one to take the psychometric score of users into consideration for insider threat detection. Finally, we evaluate our proposed approach on CERT insider threat dataset (v4.2) and show how it outperforms previous approaches.
Intrusion detection is one essential tool towards building secure and trustworthy Cloud computing environment, given the ubiquitous presence of cyber attacks that proliferate rapidly and morph dynamically. In our current working paradigm of resource, platform and service consolidations, Cloud Computing provides a significant improvement in the cost metrics via dynamic provisioning of IT services. Since almost all cloud computing networks lean on providing their services through Internet, they are prone to experience variety of security issues. Therefore, in cloud environments, it is necessary to deploy an Intrusion Detection System (IDS) to detect new and unknown attacks in addition to signature based known attacks, with high accuracy. In our deliberation we assume that a system or a network ``anomalous'' event is synonymous to an ``intrusion'' event when there is a significant departure in one or more underlying system or network activities. There are couple of recently proposed ideas that aim to develop a hybrid detection mechanism, combining advantages of signature-based detection schemes with the ability to detect unknown attacks based on anomalies. In this work, we propose a network based anomaly detection system at the Cloud Hypervisor level that utilizes a hybrid algorithm: a combination of K-means clustering algorithm and SVM classification algorithm, to improve the accuracy of the anomaly detection system. Dataset from UNSW-NB15 study is used to evaluate the proposed approach and results are compared with previous studies. The accuracy for our proposed K-means clustering model is slightly higher than others. However, the accuracy we obtained from the SVM model is still low for supervised techniques.
The enormous growth of Internet-based traffic exposes corporate networks with a wide variety of vulnerabilities. Intrusive traffics are affecting the normal functionality of network's operation by consuming corporate resources and time. Efficient ways of identifying, protecting, and mitigating from intrusive incidents enhance productivity. As Intrusion Detection System (IDS) is hosted in the network and at the user machine level to oversee the malicious traffic in the network and at the individual computer, it is one of the critical components of a network and host security. Unsupervised anomaly traffic detection techniques are improving over time. This research aims to find an efficient classifier that detects anomaly traffic from NSL-KDD dataset with high accuracy level and minimal error rate by experimenting with five machine learning techniques. Five binary classifiers: Stochastic Gradient Decent, Random Forests, Logistic Regression, Support Vector Machine, and Sequential Model are tested and validated to produce the result. The outcome demonstrates that Random Forest Classifier outperforms the other four classifiers with and without applying the normalization process to the dataset.
In order to examine malicious activity that occurs in a network or a system, intrusion detection system is used. Intrusion Detection is software or a device that scans a system or a network for a distrustful activity. Due to the growing connectivity between computers, intrusion detection becomes vital to perform network security. Various machine learning techniques and statistical methodologies have been used to build different types of Intrusion Detection Systems to protect the networks. Performance of an Intrusion Detection is mainly depends on accuracy. Accuracy for Intrusion detection must be enhanced to reduce false alarms and to increase the detection rate. In order to improve the performance, different techniques have been used in recent works. Analyzing huge network traffic data is the main work of intrusion detection system. A well-organized classification methodology is required to overcome this issue. This issue is taken in proposed approach. Machine learning techniques like Support Vector Machine (SVM) and Naïve Bayes are applied. These techniques are well-known to solve the classification problems. For evaluation of intrusion detection system, NSL- KDD knowledge discovery Dataset is taken. The outcomes show that SVM works better than Naïve Bayes. To perform comparative analysis, effective classification methods like Support Vector Machine and Naive Bayes are taken, their accuracy and misclassification rate get calculated.
An adaptable agent-based IDS (AAIDS) inspired by the danger theory of artificial immune system is proposed. The learning mechanism of AAIDS is designed by emulating how dendritic cells (DC) in immune systems detect and classify danger signals. AG agent, DC agent and TC agent coordinate together and respond to system calls directly rather than analyze network packets. Simulations show AAIDS can determine several critical scenarios of the system behaviors where packet analysis is impractical.
Everyday., the DoS/DDoS attacks are increasing all over the world and the ways attackers are using changing continuously. This increase and variety on the attacks are affecting the governments, institutions, organizations and corporations in a bad way. Every successful attack is causing them to lose money and lose reputation in return. This paper presents an introduction to a method which can show what the attack and where the attack based on. This is tried to be achieved with using clustering algorithm DBSCAN on network traffic because of the change and variety in attack vectors.
Building natural and conversational virtual humans is a task of formidable complexity. We believe that, especially when building agents that affectively interact with biological humans in real-time, a cognitive science-based, multilayered sensing and artificial intelligence (AI) systems approach is needed. For this demo, we show a working version (through human interaction with it) our modular system of natural, conversation 3D virtual human using AI or sensing layers. These including sensing the human user via facial emotion recognition, voice stress, semantic meaning of the words, eye gaze, heart rate, and galvanic skin response. These inputs are combined with AI sensing and recognition of the environment using deep learning natural language captioning or dense captioning. These are all processed by our AI avatar system allowing for an affective and empathetic conversation using an NLP topic-based dialogue capable of using facial expressions, gestures, breath, eye gaze and voice language-based two-way back and forth conversations with a sensed human. Our lab has been building these systems in stages over the years.
An important ingredient for a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from a large unstructured and semi-structured data like programming code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework for which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data comprised of web page and script based data whereas the other data was taken from email data for spam filtering. The usefulness of the tool was also assessed on the aspect of ease of programming.
Computational Intelligence (CI) algorithms/techniques are packaged in a variety of disparate frameworks/applications that all vary with respect to specific supported functionality and implementation decisions that drastically change performance. Developers looking to employ different CI techniques are faced with a series of trade-offs in selecting the appropriate library/framework. These include resource consumption, features, portability, interface complexity, ease of parallelization, etc. Considerations such as language compatibility and familiarity with a particular library make the choice of libraries even more difficult. The paper introduces MeetCI, an open source software framework for computational intelligence software design automation that facilitates the application design decisions and their software implementation process. MeetCI abstracts away specific framework details of CI techniques designed within a variety of libraries. This allows CI users to benefit from a variety of current frameworks without investigating the nuances of each library/framework. Using an XML file, developed in accordance with the specifications, the user can design a CI application generically, and utilize various CI software without having to redesign their entire technology stack. Switching between libraries in MeetCI is trivial and accessing the right library to satisfy a user's goals can be done easily and effectively. The paper discusses the framework's use in design of various applications. The design process is illustrated with four different examples from expert systems and machine learning domains, including the development of an expert system for security evaluation, two classification problems and a prediction problem with recurrent neural networks.
This paper investigates the use of deep reinforcement learning (DRL) in the design of a "universal" MAC protocol referred to as Deep-reinforcement Learning Multiple Access (DLMA). The design framework is partially inspired by the vision of DARPA SC2, a 3-year competition whereby competitors are to come up with a clean-slate design that "best share spectrum with any network(s), in any environment, without prior knowledge, leveraging on machine-learning technique". While the scope of DARPA SC2 is broad and involves the redesign of PHY, MAC, and Network layers, this paper's focus is narrower and only involves the MAC design. In particular, we consider the problem of sharing time slots among a multiple of time-slotted networks that adopt different MAC protocols. One of the MAC protocols is DLMA. The other two are TDMA and ALOHA. The DRL agents of DLMA do not know that the other two MAC protocols are TDMA and ALOHA. Yet, by a series of observations of the environment, its own actions, and the rewards - in accordance with the DRL algorithmic framework - a DRL agent can learn the optimal MAC strategy for harmonious co-existence with TDMA and ALOHA nodes. In particular, the use of neural networks in DRL (as opposed to traditional reinforcement learning) allows for fast convergence to optimal solutions and robustness against perturbation in hyper- parameter settings, two essential properties for practical deployment of DLMA in real wireless networks.
In last decades, the web and online services have revolutionized the modern world. However, by increasing our dependence on online services, as a result, online security threats are also increasing rapidly. One of the most common online security threats is a so-called Phishing attack, the purpose of which is to mimic a legitimate website such as online banking, e-commerce or social networking website in order to obtain sensitive data such as user-names, passwords, financial and health-related information from potential victims. The problem of detecting phishing websites has been addressed many times using various methodologies from conventional classifiers to more complex hybrid methods. Recent advancements in deep learning approaches suggested that the classification of phishing websites using deep learning neural networks should outperform the traditional machine learning algorithms. However, the results of utilizing deep neural networks heavily depend on the setting of different learning parameters. In this paper, we propose a swarm intelligence based approach to parameter setting of deep learning neural network. By applying the proposed approach to the classification of phishing websites, we were able to improve their detection when compared to existing algorithms.
Phishing e-mails are considered as spam e-mails, which aim to collect sensitive personal information about the users via network. Since the main purpose of this behavior is mostly to harm users financially, it is vital to detect these phishing or spam e-mails immediately to prevent unauthorized access to users' vital information. To detect phishing e-mails, using a quicker and robust classification method is important. Considering the billions of e-mails on the Internet, this classification process is supposed to be done in a limited time to analyze the results. In this work, we present some of the early results on the classification of spam email using deep learning and machine methods. We utilize word2vec to represent emails instead of using the popular keyword or other rule-based methods. Vector representations are then fed into a neural network to create a learning model. We have tested our method on an open dataset and found over 96% accuracy levels with the deep learning classification methods in comparison to the standard machine learning algorithms.