Visible to the public Biblio

Filters: Keyword is supervised learning  [Clear All Filters]
2020-01-27
Farag, Nadine, El-Seoud, Samir Abou, McKee, Gerard, Hassan, Ghada.  2019.  Bullying Hurts: A Survey on Non-Supervised Techniques for Cyber-Bullying Detection. Proceedings of the 2019 8th International Conference on Software and Information Engineering. :85–90.
The contemporary period is scarred by the predominant place of social media in everyday life. Despite social media being a useful tool for communication and social gathering it also offers opportunities for harmful criminal activities. One of these activities is cyber-bullying enabled through the abuse and mistreatment of the internet as a means of bullying others virtually. As a way of minimising this occurrence, research into computer-based researched is carried out to detect cyber-bullying by the scientific research community. An extensive literature search shows that supervised learning techniques are the most commonly used methods for cyber-bullying detection. However, some non-supervised techniques and other approaches have proven to be effective towards cyber-bullying detection. This paper, therefore, surveys recent research on non-supervised techniques and offers some suggestions for future research in textual-based cyber-bullying detection including detecting roles, detecting emotional state, automated annotation and stylometric methods.
2019-12-30
Heydari, Mohammad, Mylonas, Alexios, Katos, Vasilios, Balaguer-Ballester, Emili, Tafreshi, Vahid Heydari Fami, Benkhelifa, Elhadj.  2019.  Uncertainty-Aware Authentication Model for Fog Computing in IoT. 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC). :52–59.

Since the term “Fog Computing” has been coined by Cisco Systems in 2012, security and privacy issues of this promising paradigm are still open challenges. Among various security challenges, Access Control is a crucial concern for all cloud computing-like systems (e.g. Fog computing, Mobile edge computing) in the IoT era. Therefore, assigning the precise level of access in such an inherently scalable, heterogeneous and dynamic environment is not easy to perform. This work defines the uncertainty challenge for authentication phase of the access control in fog computing because on one hand fog has a number of characteristics that amplify uncertainty in authentication and on the other hand applying traditional access control models does not result in a flexible and resilient solution. Therefore, we have proposed a novel prediction model based on the extension of Attribute Based Access Control (ABAC) model. Our data-driven model is able to handle uncertainty in authentication. It is also able to consider the mobility of mobile edge devices in order to handle authentication. In doing so, we have built our model using and comparing four supervised classification algorithms namely as Decision Tree, Naïve Bayes, Logistic Regression and Support Vector Machine. Our model can achieve authentication performance with 88.14% accuracy using Logistic Regression.

2019-11-26
Hassanpour, Reza, Dogdu, Erdogan, Choupani, Roya, Goker, Onur, Nazli, Nazli.  2018.  Phishing E-Mail Detection by Using Deep Learning Algorithms. Proceedings of the ACMSE 2018 Conference. :45:1-45:1.

Phishing e-mails are considered as spam e-mails, which aim to collect sensitive personal information about the users via network. Since the main purpose of this behavior is mostly to harm users financially, it is vital to detect these phishing or spam e-mails immediately to prevent unauthorized access to users' vital information. To detect phishing e-mails, using a quicker and robust classification method is important. Considering the billions of e-mails on the Internet, this classification process is supposed to be done in a limited time to analyze the results. In this work, we present some of the early results on the classification of spam email using deep learning and machine methods. We utilize word2vec to represent emails instead of using the popular keyword or other rule-based methods. Vector representations are then fed into a neural network to create a learning model. We have tested our method on an open dataset and found over 96% accuracy levels with the deep learning classification methods in comparison to the standard machine learning algorithms.

2018-11-19
Zhao, Zhi-Lin, Wang, Chang-Dong, Lin, Kun-Yu, Lai, Jian-Huang.  2017.  Missing Value Learning. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. :2427–2430.

Missing value is common in many machine learning problems and much effort has been made to handle missing data to improve the performance of the learned model. Sometimes, our task is not to train a model using those unlabeled/labeled data with missing value but process examples according to the values of some specified features. So, there is an urgent need of developing a method to predict those missing values. In this paper, we focus on learning from the known values to learn missing value as close as possible to the true one. It's difficult for us to predict missing value because we do not know the structure of the data matrix and some missing values may relate to some other missing values. We solve the problem by recovering the complete data matrix under the three reasonable constraints: feature relationship, upper recovery error bound and class relationship. The proposed algorithm can deal with both unlabeled and labeled data and generative adversarial idea will be used in labeled data to transfer knowledge. Extensive experiments have been conducted to show the effectiveness of the proposed algorithms.

2018-08-23
Halawa, Hassan, Ripeanu, Matei, Beznosov, Konstantin, Coskun, Baris, Liu, Meizhu.  2017.  An Early Warning System for Suspicious Accounts. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. :51–52.
In the face of large-scale automated cyber-attacks to large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning to enable large-scale online service providers to quickly identify potentially compromised accounts. We develop an early warning system for the detection of suspicious account activity with the goal of quick identification and remediation of compromised accounts. We demonstrate the feasibility and applicability of our proposed system in a four month experiment at a large-scale online service provider using real-world production data encompassing hundreds of millions of users. We show that - even using only login data, features with low computational cost, and a basic model selection approach - around one out of five accounts later flagged as suspicious are correctly predicted a month in advance based on one week's worth of their login activity.
2018-06-07
Uwagbole, S. O., Buchanan, W. J., Fan, L..  2017.  An applied pattern-driven corpus to predictive analytics in mitigating SQL injection attack. 2017 Seventh International Conference on Emerging Security Technologies (EST). :12–17.

Emerging computing relies heavily on secure backend storage for the massive size of big data originating from the Internet of Things (IoT) smart devices to the Cloud-hosted web applications. Structured Query Language (SQL) Injection Attack (SQLIA) remains an intruder's exploit of choice to pilfer confidential data from the back-end database with damaging ramifications. The existing approaches were all before the new emerging computing in the context of the Internet big data mining and as such will lack the ability to cope with new signatures concealed in a large volume of web requests over time. Also, these existing approaches were strings lookup approaches aimed at on-premise application domain boundary, not applicable to roaming Cloud-hosted services' edge Software-Defined Network (SDN) to application endpoints with large web request hits. Using a Machine Learning (ML) approach provides scalable big data mining for SQLIA detection and prevention. Unfortunately, the absence of corpus to train a classifier is an issue well known in SQLIA research in applying Artificial Intelligence (AI) techniques. This paper presents an application context pattern-driven corpus to train a supervised learning model. The model is trained with ML algorithms of Two-Class Support Vector Machine (TC SVM) and Two-Class Logistic Regression (TC LR) implemented on Microsoft Azure Machine Learning (MAML) studio to mitigate SQLIA. This scheme presented here, then forms the subject of the empirical evaluation in Receiver Operating Characteristic (ROC) curve.

Araújo, D. R. B., Barros, G. H. P. S. de, Bastos-Filho, C. J. A., Martins-Filho, J. F..  2017.  Surrogate models assisted by neural networks to assess the resilience of networks. 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI). :1–6.

The assessment of networks is frequently accomplished by using time-consuming analysis tools based on simulations. For example, the blocking probability of networks can be estimated by Monte Carlo simulations and the network resilience can be assessed by link or node failure simulations. We propose in this paper to use Artificial Neural Networks (ANN) to predict the robustness of networks based on simple topological metrics to avoid time-consuming failure simulations. We accomplish the training process using supervised learning based on a historical database of networks. We compare the results of our proposal with the outcome provided by targeted and random failures simulations. We show that our approach is faster than failure simulators and the ANN can mimic the same robustness evaluation provide by these simulators. We obtained an average speedup of 300 times.

2018-03-19
Ditzler, G., Prater, A..  2017.  Fine Tuning Lasso in an Adversarial Environment against Gradient Attacks. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.

Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this violation is often violated in practice. The field of domain adaptation addresses the situation where this assumption of a fixed probability between the two domains is violated; however, the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaption to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages have been called into question with the recent findings of adversaries in a learning setting. Adversarial training (and testing) data pose a serious threat to scenarios where an attacker has the opportunity to ``poison'' the training or ``evade'' on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on aspects related to preprocessing of data (i.e., dimensionality reduction or feature selection) has widely been ignored in the revamp of adversarial learning research. Furthermore, variable selection, which is a vital component to any data analysis, has been shown to be particularly susceptible under an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial learning setting by considering the effects of adversarial data and how to mitigate its effects through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and known- weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.

2018-03-05
Bhattacharjee, Shameek, Thakur, Aditya, Silvestri, Simone, Das, Sajal K..  2017.  Statistical Security Incident Forensics Against Data Falsification in Smart Grid Advanced Metering Infrastructure. Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. :35–45.

Compromised smart meters reporting false power consumption data in Advanced Metering Infrastructure (AMI) may have drastic consequences on a smart grid's operations. Most existing works only deal with electricity theft from customers. However, several other types of data falsification attacks are possible, when meters are compromised by organized rivals. In this paper, we first propose a taxonomy of possible data falsification strategies such as additive, deductive, camouflage and conflict, in AMI micro-grids. Then, we devise a statistical anomaly detection technique to identify the incidence of proposed attack types, by studying their impact on the observed data. Subsequently, a trust model based on Kullback-Leibler divergence is proposed to identify compromised smart meters for additive and deductive attacks. The resultant detection rates and false alarms are minimized through a robust aggregate measure that is calculated based on the detected attack type and successfully discriminating legitimate changes from malicious ones. For conflict and camouflage attacks, a generalized linear model and Weibull function based kernel trick is used over the trust score to facilitate more accurate classification. Using real data sets collected from AMI, we investigate several trade-offs that occur between attacker's revenue and costs, as well as the margin of false data and fraction of compromised nodes. Experimental results show that our model has a high true positive detection rate, while the average false alarm rate is just 8%, for most practical attack strategies, without depending on the expensive hardware based monitoring.

Yin, H. Sun, Vatrapu, R..  2017.  A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem Using Supervised Machine Learning. 2017 IEEE International Conference on Big Data (Big Data). :3690–3699.

Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cyber-criminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.

2018-01-23
Deb, Supratim, Ge, Zihui, Isukapalli, Sastry, Puthenpura, Sarat, Venkataraman, Shobha, Yan, He, Yates, Jennifer.  2017.  AESOP: Automatic Policy Learning for Predicting and Mitigating Network Service Impairments. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1783–1792.

Efficient management and control of modern and next-gen networks is of paramount importance as networks have to maintain highly reliable service quality whilst supporting rapid growth in traffic demand and new application services. Rapid mitigation of network service degradations is a key factor in delivering high service quality. Automation is vital to achieving rapid mitigation of issues, particularly at the network edge where the scale and diversity is the greatest. This automation involves the rapid detection, localization and (where possible) repair of service-impacting faults and performance impairments. However, the most significant challenge here is knowing what events to detect, how to correlate events to localize an issue and what mitigation actions should be performed in response to the identified issues. These are defined as policies to systems such as ECOMP. In this paper, we present AESOP, a data-driven intelligent system to facilitate automatic learning of policies and rules for triggering remedial actions in networks. AESOP combines best operational practices (domain knowledge) with a variety of measurement data to learn and validate operational policies to mitigate service issues in networks. AESOP's design addresses the following key challenges: (i) learning from high-dimensional noisy data, (ii) capturing multiple fault models, (iii) modeling the high service-cost of false positives, and (iv) accounting for the evolving network infrastructure. We present the design of our system and show results from our ongoing experiments to show the effectiveness of our policy leaning framework.