Biblio

List
Filter

Found 245 results

Filters: Keyword is pattern classification [Clear All Filters]

2018-06-20

Lee, Y., Choi, S. S., Choi, J., Song, J.. 2017. A Lightweight Malware Classification Method Based on Detection Results of Anti-Virus Software. 2017 12th Asia Joint Conference on Information Security (AsiaJCIS). :5–9.

With the development of cyber threats on the Internet, the number of malware, especially unknown malware, is also dramatically increasing. Since all of malware cannot be analyzed by analysts, it is very important to find out new malware that should be analyzed by them. In order to cope with this issue, the existing approaches focused on malware classification using static or dynamic analysis results of malware. However, the static and the dynamic analyses themselves are also too costly and not easy to build the isolated, secure and Internet-like analysis environments such as sandbox. In this paper, we propose a lightweight malware classification method based on detection results of anti-virus software. Since the proposed method can reduce the volume of malware that should be analyzed by analysts, it can be used as a preprocess for in-depth analysis of malware. The experimental showed that the proposed method succeeded in classification of 1,000 malware samples into 187 unique groups. This means that 81% of the original malware samples do not need to analyze by analysts.

Chakraborty, S., Stokes, J. W., Xiao, L., Zhou, D., Marinescu, M., Thomas, A.. 2017. Hierarchical learning for automated malware classification. MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM). :23–28.

Despite widespread use of commercial anti-virus products, the number of malicious files detected on home and corporate computers continues to increase at a significant rate. Recently, anti-virus companies have started investing in machine learning solutions to augment signatures manually designed by analysts. A malicious file's determination is often represented as a hierarchical structure consisting of a type (e.g. Worm, Backdoor), a platform (e.g. Win32, Win64), a family (e.g. Rbot, Rugrat) and a family variant (e.g. A, B). While there has been substantial research in automated malware classification, the aforementioned hierarchical structure, which can provide additional information to the classification models, has been ignored. In this paper, we propose the novel idea and study the performance of employing hierarchical learning algorithms for automated classification of malicious files. To the best of our knowledge, this is the first research effort which incorporates the hierarchical structure of the malware label in its automated classification and in the security domain, in general. It is important to note that our method does not require any additional effort by analysts because they typically assign these hierarchical labels today. Our empirical results on a real world, industrial-scale malware dataset of 3.6 million files demonstrate that incorporation of the label hierarchy achieves a significant reduction of 33.1% in the binary error rate as compared to a non-hierarchical classifier which is traditionally used in such problems.

Luo, J. S., Lo, D. C. T.. 2017. Binary malware image classification using machine learning with local binary pattern. 2017 IEEE International Conference on Big Data (Big Data). :4664–4667.

Malware classification is a critical part in the cyber-security. Traditional methodologies for the malware classification typically use static analysis and dynamic analysis to identify malware. In this paper, a malware classification methodology based on its binary image and extracting local binary pattern (LBP) features is proposed. First, malware images are reorganized into 3 by 3 grids which is mainly used to extract LBP feature. Second, the LBP is implemented on the malware images to extract features in that it is useful in pattern or texture classification. Finally, Tensorflow, a library for machine learning, is applied to classify malware images with the LBP feature. Performance comparison results among different classifiers with different image descriptors such as GIST, a spatial envelop, and the LBP demonstrate that our proposed approach outperforms others.

Kebede, T. M., Djaneye-Boundjou, O., Narayanan, B. N., Ralescu, A., Kapp, D.. 2017. Classification of Malware programs using autoencoders based deep learning architecture and its application to the microsoft malware Classification challenge (BIG 2015) dataset. 2017 IEEE National Aerospace and Electronics Conference (NAECON). :70–75.

Distinguishing and classifying different types of malware is important to better understanding how they can infect computers and devices, the threat level they pose and how to protect against them. In this paper, a system for classifying malware programs is presented. The paper describes the architecture of the system and assesses its performance on a publicly available database (provided by Microsoft for the Microsoft Malware Classification Challenge BIG2015) to serve as a benchmark for future research efforts. First, the malicious programs are preprocessed such that they are visualized as gray scale images. We then make use of an architecture comprised of multiple layers (multiple levels of encoding) to carry out the classification process of those images/programs. We compare the performance of this approach against traditional machine learning and pattern recognition algorithms. Our experimental results show that the deep learning architecture yields a boost in performance over those conventional/standard algorithms. A hold-out validation analysis using the superior architecture shows an accuracy in the order of 99.15%.

Hassen, M., Carvalho, M. M., Chan, P. K.. 2017. Malware classification using static analysis based features. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.

Anti-virus vendors receive hundreds of thousands of malware to be analysed each day. Some are new malware while others are variations or evolutions of existing malware. Because analyzing each malware sample by hand is impossible, automated techniques to analyse and categorize incoming samples are needed. In this work, we explore various machine learning features extracted from malware samples through static analysis for classification of malware binaries into already known malware families. We present a new feature based on control statement shingling that has a comparable accuracy to ordinary opcode n-gram based features while requiring smaller dimensions. This, in turn, results in a shorter training time.

Pranamulia, R., Asnar, Y., Perdana, R. S.. 2017. Profile hidden Markov model for malware classification \#x2014; usage of system call sequence for malware classification. 2017 International Conference on Data and Software Engineering (ICoDSE). :1–5.

Malware technology makes it difficult for malware analyst to detect same malware files with different obfuscation technique. In this paper we are trying to tackle that problem by analyzing the sequence of system call from an executable file. Malware files which actually are the same should have almost identical or at least a similar sequence of system calls. In this paper, we are going to create a model for each malware class consists of malwares from different families based on its sequence of system calls. Method/algorithm that's used in this paper is profile hidden markov model which is a very well-known tool in the biological informatics field for comparing DNA and protein sequences. Malware classes that we are going to build are trojan and worm class. Accuracy for these classes are pretty high, it's above 90% with also a high false positive rate around 37%.

2018-06-11

Kwon, H., Harris, W., Esmaeilzadeh, H.. 2017. Proving Flow Security of Sequential Logic via Automatically-Synthesized Relational Invariants. 2017 IEEE 30th Computer Security Foundations Symposium (CSF). :420–435.

Due to the proliferation of reprogrammable hardware, core designs built from modules drawn from a variety of sources execute with direct access to critical system resources. Expressing guarantees that such modules satisfy, in particular the dynamic conditions under which they release information about their unbounded streams of inputs, and automatically proving that they satisfy such guarantees, is an open and critical problem.,,To address these challenges, we propose a domain-specific language, named STREAMS, for expressing information-flow policies with declassification over unbounded input streams. We also introduce a novel algorithm, named SIMAREL, that given a core design C and STREAMS policy P, automatically proves or falsifies that C satisfies P. The key technical insight behind the design of SIMAREL is a novel algorithm for efficiently synthesizing relational invariants over pairs of circuit executions.,,We expressed expected behavior of cores designed independently for research and production as STREAMS policies and used SIMAREL to check if each core satisfies its policy. SIMAREL proved that half of the cores satisfied expected behavior, but found unexpected information leaks in six open-source designs: an Ethernet controller, a flash memory controller, an SD-card storage manager, a robotics controller, a digital-signal processing (DSP) module, and a debugging interface.

2018-06-07

Uwagbole, S. O., Buchanan, W. J., Fan, L.. 2017. An applied pattern-driven corpus to predictive analytics in mitigating SQL injection attack. 2017 Seventh International Conference on Emerging Security Technologies (EST). :12–17.

Emerging computing relies heavily on secure backend storage for the massive size of big data originating from the Internet of Things (IoT) smart devices to the Cloud-hosted web applications. Structured Query Language (SQL) Injection Attack (SQLIA) remains an intruder's exploit of choice to pilfer confidential data from the back-end database with damaging ramifications. The existing approaches were all before the new emerging computing in the context of the Internet big data mining and as such will lack the ability to cope with new signatures concealed in a large volume of web requests over time. Also, these existing approaches were strings lookup approaches aimed at on-premise application domain boundary, not applicable to roaming Cloud-hosted services' edge Software-Defined Network (SDN) to application endpoints with large web request hits. Using a Machine Learning (ML) approach provides scalable big data mining for SQLIA detection and prevention. Unfortunately, the absence of corpus to train a classifier is an issue well known in SQLIA research in applying Artificial Intelligence (AI) techniques. This paper presents an application context pattern-driven corpus to train a supervised learning model. The model is trained with ML algorithms of Two-Class Support Vector Machine (TC SVM) and Two-Class Logistic Regression (TC LR) implemented on Microsoft Azure Machine Learning (MAML) studio to mitigate SQLIA. This scheme presented here, then forms the subject of the empirical evaluation in Receiver Operating Characteristic (ROC) curve.

Dikhit, A. S., Karodiya, K.. 2017. Result evaluation of field authentication based SQL injection and XSS attack exposure. 2017 International Conference on Information, Communication, Instrumentation and Control (ICICIC). :1–6.

Figuring innovations and development of web diminishes the exertion required for different procedures. Among them the most profited businesses are electronic frameworks, managing an account, showcasing, web based business and so on. This framework mostly includes the data trades ceaselessly starting with one host then onto the next. Amid this move there are such a variety of spots where the secrecy of the information and client gets loosed. Ordinarily the zone where there is greater likelihood of assault event is known as defenceless zones. Electronic framework association is one of such place where numerous clients performs there undertaking as indicated by the benefits allotted to them by the director. Here the aggressor makes the utilization of open ranges, for example, login or some different spots from where the noxious script is embedded into the framework. This scripts points towards trading off the security imperatives intended for the framework. Few of them identified with clients embedded scripts towards web communications are SQL infusion and cross webpage scripting (XSS). Such assaults must be distinguished and evacuated before they have an effect on the security and classification of the information. Amid the most recent couple of years different arrangements have been incorporated to the framework for making such security issues settled on time. Input approvals is one of the notable fields however experiences the issue of execution drops and constrained coordinating. Some other component, for example, disinfection and polluting will create high false report demonstrating the misclassified designs. At the center, both include string assessment and change investigation towards un-trusted hotspots for totally deciphering the effect and profundity of the assault. This work proposes an enhanced lead based assault discovery with specifically message fields for viably identifying the malevolent scripts. The work obstructs the ordinary access for malignant so- rce utilizing and hearty manage coordinating through unified vault which routinely gets refreshed. At the underlying level of assessment, the work appears to give a solid base to further research.

2018-05-09

Dali, L., Mivule, K., El-Sayed, H.. 2017. A heuristic attack detection approach using the \#x201C;least weighted \#x201D; attributes for cyber security data. 2017 Intelligent Systems Conference (IntelliSys). :1067–1073.

The continuous advance in recent cloud-based computer networks has generated a number of security challenges associated with intrusions in network systems. With the exponential increase in the volume of network traffic data, involvement of humans in such detection systems is time consuming and a non-trivial problem. Secondly, network traffic data tends to be highly dimensional, comprising of numerous features and attributes, making classification challenging and thus susceptible to the curse of dimensionality problem. Given such scenarios, the need arises for dimensional reduction, feature selection, combined with machine-learning techniques in the classification of such data. Therefore, as a contribution, this paper seeks to employ data mining techniques in a cloud-based environment, by selecting appropriate attributes and features with the least importance in terms of weight for the classification. Often the standard is to select features with better weights while ignoring those with least weights. In this study, we seek to find out if we can make prediction using those features with least weights. The motivation is that adversaries use stealth to hide their activities from the obvious. The question then is, can we predict any stealth activity of an adversary using the least observed attributes? In this particular study, we employ information gain to select attributes with the lowest weights and then apply machine learning to classify if a combination, in this case, of both source and destination ports are attacked or not. The motivation of this investigation is if attributes that are of least importance can be used to predict if an attack could occur. Our preliminary results show that even when the source and destination port attributes are used in combination with features with the least weights, it is possible to classify such network traffic data and predict if an attack will occur or not.

Barenghi, A., Mainardi, N., Pelosi, G.. 2017. A Security Audit of the OpenPGP Format. 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks 2017 11th International Conference on Frontier of Computer Science and Technology 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC). :336–343.

For over two decades the OpenPGP format has provided the mainstay of email confidentiality and authenticity, and is currently being relied upon to provide authenticated package distributions in open source Unix systems. In this work, we provide the first language theoretical analysis of the OpenPGP format, classifying it as a deterministic context free language and establishing that an automatically generated parser can in principle be defined. However, we show that the number of rules required to describe it with a deterministic context free grammar is prohibitively high, and we identify security vulnerabilities in the OpenPGP format specification. We identify possible attacks aimed at tampering with messages and certificates while retaining their syntactical and semantical validity. We evaluate the effectiveness of these attacks against the two OpenPGP implementations covering the overwhelming majority of uses, i.e., the GNU Privacy Guard (GPG) and Symantec PGP. The results of the evaluation show that both implementations turn out not to be vulnerable due to conser- vative choices in dealing with malicious input data. Finally, we provide guidelines to improve the OpenPGP specification

2018-05-02

Gu, P., Khatoun, R., Begriche, Y., Serhrouchni, A.. 2017. k-Nearest Neighbours classification based Sybil attack detection in Vehicular networks. 2017 Third International Conference on Mobile and Secure Services (MobiSecServ). :1–6.

In Vehicular networks, privacy, especially the vehicles' location privacy is highly concerned. Several pseudonymous based privacy protection mechanisms have been established and standardized in the past few years by IEEE and ETSI. However, vehicular networks are still vulnerable to Sybil attack. In this paper, a Sybil attack detection method based on k-Nearest Neighbours (kNN) classification algorithm is proposed. In this method, vehicles are classified based on the similarity in their driving patterns. Furthermore, the kNN methods' high runtime complexity issue is also optimized. The simulation results show that our detection method can reach a high detection rate while keeping error rate low.

2018-05-01

Xie, T., Zhou, Q., Hu, J., Shu, L., Jiang, P.. 2017. A Sequential Multi-Objective Robust Optimization Approach under Interval Uncertainty Based on Support Vector Machines. 2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM). :2088–2092.

Interval uncertainty can cause uncontrollable variations in the objective and constraint values, which could seriously deteriorate the performance or even change the feasibility of the optimal solutions. Robust optimization is to obtain solutions that are optimal and minimally sensitive to uncertainty. In this paper, a sequential multi-objective robust optimization (MORO) approach based on support vector machines (SVM) is proposed. Firstly, a sequential optimization structure is adopted to ease the computational burden. Secondly, SVM is used to construct a classification model to classify design alternatives into feasible or infeasible. The proposed approach is tested on a numerical example and an engineering case. Results illustrate that the proposed approach can reasonably approximate solutions obtained from the existing sequential MORO approach (SMORO), while the computational costs are significantly reduced compared with those of SMORO.

Wang, X., Zhou, S.. 2017. Accelerated Stochastic Gradient Method for Support Vector Machines Classification with Additive Kernel. 2017 First International Conference on Electronics Instrumentation Information Systems (EIIS). :1–6.

Support vector machines (SVMs) have been widely used for classification in machine learning and data mining. However, SVM faces a huge challenge in large scale classification tasks. Recent progresses have enabled additive kernel version of SVM efficiently solves such large scale problems nearly as fast as a linear classifier. This paper proposes a new accelerated mini-batch stochastic gradient descent algorithm for SVM classification with additive kernel (AK-ASGD). On the one hand, the gradient is approximated by the sum of a scalar polynomial function for each feature dimension; on the other hand, Nesterov's acceleration strategy is used. The experimental results on benchmark large scale classification data sets show that our proposed algorithm can achieve higher testing accuracies and has faster convergence rate.

2018-04-11

Deliu, I., Leichter, C., Franke, K.. 2017. Extracting Cyber Threat Intelligence from Hacker Forums: Support Vector Machines versus Convolutional Neural Networks. 2017 IEEE International Conference on Big Data (Big Data). :3648–3656.

Hacker forums and other social platforms may contain vital information about cyber security threats. But using manual analysis to extract relevant threat information from these sources is a time consuming and error-prone process that requires a significant allocation of resources. In this paper, we explore the potential of Machine Learning methods to rapidly sift through hacker forums for relevant threat intelligence. Utilizing text data from a real hacker forum, we compared the text classification performance of Convolutional Neural Network methods against more traditional Machine Learning approaches. We found that traditional machine learning methods, such as Support Vector Machines, can yield high levels of performance that are on par with Convolutional Neural Network algorithms.

2018-04-02

Al-Zewairi, M., Almajali, S., Awajan, A.. 2017. Experimental Evaluation of a Multi-Layer Feed-Forward Artificial Neural Network Classifier for Network Intrusion Detection System. 2017 International Conference on New Trends in Computing Sciences (ICTCS). :167–172.

Deep Learning has been proven more effective than conventional machine-learning algorithms in solving classification problem with high dimensionality and complex features, especially when trained with big data. In this paper, a deep learning binomial classifier for Network Intrusion Detection System is proposed and experimentally evaluated using the UNSW-NB15 dataset. Three different experiments were executed in order to determine the optimal activation function, then to select the most important features and finally to test the proposed model on unseen data. The evaluation results demonstrate that the proposed classifier outperforms other models in the literature with 98.99% accuracy and 0.56% false alarm rate on unseen data.

Yousefi-Azar, M., Varadharajan, V., Hamey, L., Tupakula, U.. 2017. Autoencoder-Based Feature Learning for Cyber Security Applications. 2017 International Joint Conference on Neural Networks (IJCNN). :3854–3861.

This paper presents a novel feature learning model for cyber security tasks. We propose to use Auto-encoders (AEs), as a generative model, to learn latent representation of different feature sets. We show how well the AE is capable of automatically learning a reasonable notion of semantic similarity among input features. Specifically, the AE accepts a feature vector, obtained from cyber security phenomena, and extracts a code vector that captures the semantic similarity between the feature vectors. This similarity is embedded in an abstract latent representation. Because the AE is trained in an unsupervised fashion, the main part of this success comes from appropriate original feature set that is used in this paper. It can also provide more discriminative features in contrast to other feature engineering approaches. Furthermore, the scheme can reduce the dimensionality of the features thereby signicantly minimising the memory requirements. We selected two different cyber security tasks: networkbased anomaly intrusion detection and Malware classication. We have analysed the proposed scheme with various classifiers using publicly available datasets for network anomaly intrusion detection and malware classifications. Several appropriate evaluation metrics show improvement compared to prior results.

Gao, F.. 2017. Application of Generalized Regression Neural Network in Cloud Security Intrusion Detection. 2017 International Conference on Robots Intelligent System (ICRIS). :54–57.

By using generalized regression neural network clustering analysis, effective clustering of five kinds of network intrusion behavior modes is carried out. First of all, intrusion data is divided into five categories by making use of fuzzy C means clustering algorithm. Then, the samples that are closet to the center of each class in the clustering results are taken as the clustering training samples of generalized neural network for the data training, and the results output by the training are the individual owned invasion category. The experimental results showed that the new algorithm has higher classification accuracy of network intrusion ways, which can provide more reliable data support for the prevention of the network intrusion.

Yusof, M., Saudi, M. M., Ridzuan, F.. 2017. A New Mobile Botnet Classification Based on Permission and API Calls. 2017 Seventh International Conference on Emerging Security Technologies (EST). :122–127.

Currently, mobile botnet attacks have shifted from computers to smartphones due to its functionality, ease to exploit, and based on financial intention. Mostly, it attacks Android due to its popularity and high usage among end users. Every day, more and more malicious mobile applications (apps) with the botnet capability have been developed to exploit end users' smartphones. Therefore, this paper presents a new mobile botnet classification based on permission and Application Programming Interface (API) calls in the smartphone. This classification is developed using static analysis in a controlled lab environment and the Drebin dataset is used as the training dataset. 800 apps from the Google Play Store have been chosen randomly to test the proposed classification. As a result, 16 permissions and 31 API calls that are most related with mobile botnet have been extracted using feature selection and later classified and tested using machine learning algorithms. The experimental result shows that the Random Forest Algorithm has achieved the highest detection accuracy of 99.4% with the lowest false positive rate of 16.1% as compared to other machine learning algorithms. This new classification can be used as the input for mobile botnet detection for future work, especially for financial matters.

2018-03-26

Pallaprolu, S. C., Sankineni, R., Thevar, M., Karabatis, G., Wang, J.. 2017. Zero-Day Attack Identification in Streaming Data Using Semantics and Spark. 2017 IEEE International Congress on Big Data (BigData Congress). :121–128.

Intrusion Detection Systems (IDS) have been in existence for many years now, but they fall short in efficiently detecting zero-day attacks. This paper presents an organic combination of Semantic Link Networks (SLN) and dynamic semantic graph generation for the on the fly discovery of zero-day attacks using the Spark Streaming platform for parallel detection. In addition, a minimum redundancy maximum relevance (MRMR) feature selection algorithm is deployed to determine the most discriminating features of the dataset. Compared to previous studies on zero-day attack identification, the described method yields better results due to the semantic learning and reasoning on top of the training data and due to the use of collaborative classification methods. We also verified the scalability of our method in a distributed environment.

2018-03-19

Ditzler, G., Prater, A.. 2017. Fine Tuning Lasso in an Adversarial Environment against Gradient Attacks. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.

Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this violation is often violated in practice. The field of domain adaptation addresses the situation where this assumption of a fixed probability between the two domains is violated; however, the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaption to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages have been called into question with the recent findings of adversaries in a learning setting. Adversarial training (and testing) data pose a serious threat to scenarios where an attacker has the opportunity to ``poison'' the training or ``evade'' on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on aspects related to preprocessing of data (i.e., dimensionality reduction or feature selection) has widely been ignored in the revamp of adversarial learning research. Furthermore, variable selection, which is a vital component to any data analysis, has been shown to be particularly susceptible under an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial learning setting by considering the effects of adversarial data and how to mitigate its effects through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and known- weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.

McLaren, P., Russell, G., Buchanan, B.. 2017. Mining Malware Command and Control Traces. 2017 Computing Conference. :788–794.

Detecting botnets and advanced persistent threats is a major challenge for network administrators. An important component of such malware is the command and control channel, which enables the malware to respond to controller commands. The detection of malware command and control channels could help prevent further malicious activity by cyber criminals using the malware. Detection of malware in network traffic is traditionally carried out by identifying specific patterns in packet payloads. Now bot writers encrypt the command and control payloads, making pattern recognition a less effective form of detection. This paper focuses instead on an effective anomaly based detection technique for bot and advanced persistent threats using a data mining approach combined with applied classification algorithms. After additional tuning, the final test on an unseen dataset, false positive rates of 0% with malware detection rates of 100% were achieved on two examined malware threats, with promising results on a number of other threats.

Baron, G.. 2017. On Sequential Selection of Attributes to Be Discretized for Authorship Attribution. 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA). :229–234.

Different data mining techniques are employed in stylometry domain for performing authorship attribution tasks. Sometimes to improve the decision system the discretization of input data can be applied. In many cases such approach allows to obtain better classification results. On the other hand, there were situations in which discretization decreased overall performance of the system. Therefore, the question arose what would be the result if only some selected attributes were discretized. The paper presents the results of the research performed for forward sequential selection of attributes to be discretized. The influence of such approach on the performance of the decision system, based on Naive Bayes classifier in authorship attribution domain, is presented. Some basic discretization methods and different approaches to discretization of the test datasets are taken into consideration.

2018-03-05

Mfula, H., Nurminen, J. K.. 2017. Adaptive Root Cause Analysis for Self-Healing in 5G Networks. 2017 International Conference on High Performance Computing Simulation (HPCS). :136–143.

Root cause analysis (RCA) is a common and recurring task performed by operators of cellular networks. It is done mainly to keep customers satisfied with the quality of offered services and to maximize return on investment (ROI) by minimizing and where possible eliminating the root causes of faults in cellular networks. Currently, the actual detection and diagnosis of faults or potential faults is still a manual and slow process often carried out by network experts who manually analyze and correlate various pieces of network data such as, alarms, call traces, configuration management (CM) and key performance indicator (KPI) data in order to come up with the most probable root cause of a given network fault. In this paper, we propose an automated fault detection and diagnosis solution called adaptive root cause analysis (ARCA). The solution uses measurements and other network data together with Bayesian network theory to perform automated evidence based RCA. Compared to the current common practice, our solution is faster due to automation of the entire RCA process. The solution is also cheaper because it needs fewer or no personnel in order to operate and it improves efficiency through domain knowledge reuse during adaptive learning. As it uses a probabilistic Bayesian classifier, it can work with incomplete data and it can handle large datasets with complex probability combinations. Experimental results from stratified synthesized data affirmatively validate the feasibility of using such a solution as a key part of self-healing (SH) especially in emerging self-organizing network (SON) based solutions in LTE Advanced (LTE-A) and 5G.

Yin, H. Sun, Vatrapu, R.. 2017. A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem Using Supervised Machine Learning. 2017 IEEE International Conference on Big Data (Big Data). :3690–3699.

Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cyber-criminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.