Biblio
The increasing complexity and ubiquity in user connectivity, computing environments, information content, and software, mobile, and web applications transfers the responsibility of privacy management to the individuals. Hence, making it extremely difficult for users to maintain the intelligent and targeted level of privacy protection that they need and desire, while simultaneously maintaining their ability to optimally function. Thus, there is a critical need to develop intelligent, automated, and adaptable privacy management systems that can assist users in managing and protecting their sensitive data in the increasingly complex situations and environments that they find themselves in. This work is a first step in exploring the development of such a system, specifically how user personality traits and other characteristics can be used to help automate determination of user sharing preferences for a variety of user data and situations. The Big-Five personality traits of openness, conscientiousness, extroversion, agreeableness, and neuroticism are examined and used as inputs into several popular machine learning algorithms in order to assess their ability to elicit and predict user privacy preferences. Our results show that the Big-Five personality traits can be used to significantly improve the prediction of user privacy preferences in a number of contexts and situations, and so using machine learning approaches to automate the setting of user privacy preferences has the potential to greatly reduce the burden on users while simultaneously improving the accuracy of their privacy preferences and security.
Machine learning has been widely used and achieved considerable results in various research areas. On the other hand, machine learning becomes a big threat when malicious attackers make use it for the wrong purpose. As such a threat, self-evolving botnets have been considered in the past. The self-evolving botnets autonomously predict vulnerabilities by implementing machine learning with computing resources of zombie computers. Furthermore, they evolve based on the vulnerability, and thus have high infectivity. In this paper, we consider several models of Markov chains to counter the spreading of the self-evolving botnets. Through simulation experiments, this paper shows the behaviors of these models.
Nowadays data is always stored in a computer in the hyper-connected world and, a company or an organization or a person can come across financial loss, reputation loss, business disruption and intellectual property loss because of data leakage or data disclosure. Remote Access Trojans are used to invade a victim's PC and collect information from it. There have been signatures for these that have already emerged and defined as malwares, but there is no available signature yet if a malware or a remote access Trojan is a zero-day threat. In this circumstance network behavioral analysis is more useful than signature-based anti-virus scanners in order to detect the different behavior of malware. When the traffic will be cut or stoppedis important in capturing network traffic. In this paper, effective features for detecting RATs are proposed. These features are extracted from the first twenty packets. Our approach achieves 98% accuracy and 10% false negative rate by random forest algorithm.
In the paper, we demonstrate novel approach for network Intrusion Detection System (IDS) for cyber security using unsupervised Deep Learning (DL) techniques. Very often, the supervised learning and rules based approach like SNORT fetch problem to identify new type of attacks. In this implementation, the input samples are numerical encoded and applied un-supervised deep learning techniques called Auto Encoder (AE) and Restricted Boltzmann Machine (RBM) for feature extraction and dimensionality reduction. Then iterative k-means clustering is applied for clustering on lower dimension space with only 3 features. In addition, Unsupervised Extreme Learning Machine (UELM) is used for network intrusion detection in this implementation. We have experimented on KDD-99 dataset, the experimental results show around 91.86% and 92.12% detection accuracy using unsupervised deep learning technique AE and RBM with K-means respectively. The experimental results also demonstrate, the proposed approach shows around 4.4% and 2.95% improvement of detection accuracy using RBM with K-means against only K-mean clustering and Unsupervised Extreme Learning Machine (USELM) respectively.
Currently, mobile botnet attacks have shifted from computers to smartphones due to its functionality, ease to exploit, and based on financial intention. Mostly, it attacks Android due to its popularity and high usage among end users. Every day, more and more malicious mobile applications (apps) with the botnet capability have been developed to exploit end users' smartphones. Therefore, this paper presents a new mobile botnet classification based on permission and Application Programming Interface (API) calls in the smartphone. This classification is developed using static analysis in a controlled lab environment and the Drebin dataset is used as the training dataset. 800 apps from the Google Play Store have been chosen randomly to test the proposed classification. As a result, 16 permissions and 31 API calls that are most related with mobile botnet have been extracted using feature selection and later classified and tested using machine learning algorithms. The experimental result shows that the Random Forest Algorithm has achieved the highest detection accuracy of 99.4% with the lowest false positive rate of 16.1% as compared to other machine learning algorithms. This new classification can be used as the input for mobile botnet detection for future work, especially for financial matters.
In this paper, we propose a method for normalization of rich feature sets to improve detection accuracy of simple classifiers in steganalysis. It consists of two steps: 1) replacing random subsets of empirical joint probability mass functions (co-occurrences) by their conditional probabilities and 2) applying a non-linear normalization to each element of the feature vector by forcing its marginal distribution over covers to be uniform. We call the first step random conditioning and the second step feature uniformization. When applied to maxSRMd2 features in combination with simple classifiers, we observe a gain in detection accuracy across all tested stego algorithms and payloads. For better insight, we investigate the gain for two image formats. The proposed normalization has a very low computational complexity and does not require any feedback from the stego class.
Although connecting a microgrid to modern power systems can alleviate issues arising from a large penetration of distributed generation, it can also cause severe voltage instability problems. This paper presents an online method to analyze voltage security in a microgrid using convolutional neural networks. To transform the traditional voltage stability problem into a classification problem, three steps are considered: 1) creating data sets using offline simulation results; 2) training the model with dimensional reduction and convolutional neural networks; 3) testing the online data set and evaluating performance. A case study in the modified IEEE 14-bus system shows the accuracy of the proposed analysis method increases by 6% compared to back-propagation neural network and has better performance than decision tree and support vector machine. The proposed algorithm has great potential in future applications.
Given social media users' plethora of interactions, appropriately controlling access to such information becomes a challenging task for users. Selecting the appropriate audience, even from within their own friend network, can be fraught with difficulties. PACMAN is a potential solution for this dilemma problem. It's a personal assistant agent that recommends personalized access control decisions based on the social context of any information disclosure by incorporating communities generated from the user's network structure and utilizing information in the user's profile. PACMAN provides accurate recommendations while minimizing intrusiveness.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.
In the last decade, numerous fake websites have been developed on the World Wide Web to mimic trusted websites, with the aim of stealing financial assets from users and organizations. This form of online attack is called phishing, and it has cost the online community and the various stakeholders hundreds of million Dollars. Therefore, effective counter measures that can accurately detect phishing are needed. Machine learning (ML) is a popular tool for data analysis and recently has shown promising results in combating phishing when contrasted with classic anti-phishing approaches, including awareness workshops, visualization and legal solutions. This article investigates ML techniques applicability to detect phishing attacks and describes their pros and cons. In particular, different types of ML techniques have been investigated to reveal the suitable options that can serve as anti-phishing tools. More importantly, we experimentally compare large numbers of ML techniques on real phishing datasets and with respect to different metrics. The purpose of the comparison is to reveal the advantages and disadvantages of ML predictive models and to show their actual performance when it comes to phishing attacks. The experimental results show that Covering approach models are more appropriate as anti-phishing solutions, especially for novice users, because of their simple yet effective knowledge bases in addition to their good phishing detection rate.
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
Machine learning algorithms based on deep neural networks (NN) have achieved remarkable results and are being extensively used in different domains. On the other hand, with increasing growth of cloud services, several Machine Learning as a Service (MLaaS) are offered where training and deploying machine learning models are performed on cloud providers' infrastructure. However, machine learning algorithms require access to raw data which is often privacy sensitive and can create potential security and privacy risks. To address this issue, we develop new techniques to provide solutions for applying deep neural network algorithms to the encrypted data. In this paper, we show that it is feasible and practical to train neural networks using encrypted data and to make encrypted predictions, and also return the predictions in an encrypted form. We demonstrate applicability of the proposed techniques and evaluate its performance. The empirical results show that it provides accurate privacy-preserving training and classification.
Ransomware is one of the most increasing malwares used by cyber-criminals in recent days. This type of malware uses cryptographic technology that encrypts a user's important files, folders makes the computer systems unusable, holds the decryption key and asks for the ransom from the victims for recovery. The recent ransomware families are very sophisticated and difficult to analyze & detect using static features only. On the other hand, latest crypto-ransomwares having sandboxing and IDS evading capabilities. So obviously, static or dynamic analysis of the ransomware alone cannot provide better solution. In this paper, we will present a Machine Learning based approach which will use integrated method, a combination of static and dynamic analysis to detect ransomware. The experimental test samples were taken from almost all ransomware families including the most recent ``WannaCry''. The results also suggest that combined analysis can detect ransomware with better accuracy compared to individual analysis approach. Since ransomware samples show some ``run-time'' and ``static code'' features, it also helps for the early detection of new and similar ransomware variants.
To improve the resilience of state estimation strategy against cyber attacks, the Compressive Sensing (CS) is applied in reconstruction of incomplete measurements for cyber physical systems. First, observability analysis is used to decide the time to run the reconstruction and the damage level from attacks. In particular, the dictionary learning is proposed to form the over-completed dictionary by K-Singular Value Decomposition (K-SVD). Besides, due to the irregularity of incomplete measurements, sampling matrix is designed as the measurement matrix. Finally, the simulation experiments on 6-bus power system illustrate that the proposed method achieves the incomplete measurements reconstruction perfectly, which is better than the joint dictionary. When only 29% available measurements are left, the proposed method has generality for four kinds of recovery algorithms.
Generative policies enable devices to generate their own policies that are validated, consistent and conflict free. This autonomy is required for security policy generation to deal with the large number of smart devices per person that will soon become reality. In this paper, we discuss the research issues that have to be addressed in order for devices involved in security enforcement to automatically generate their security policies - enabling policy-based autonomous security management. We discuss the challenges involved in the task of automatic security policy generation, and outline some approaches based om machine learning that may potentially provide a solution to the same.
Many enterprises are transitioning towards data-driven business processes. There are numerous situations where multiple parties would like to share data towards a common goal if it were possible to simultaneously protect the privacy and security of the individuals and organizations described in the data. Existing solutions for multi-party analytics that follow the so called Data Lake paradigm have parties transfer their raw data to a trusted third-party (i.e., mediator), which then performs the desired analysis on the global data, and shares the results with the parties. However, such a solution does not fit many applications such as Healthcare, Finance, and the Internet-of-Things, where privacy is a strong concern. Motivated by the increasing demands for data privacy, we study the problem of privacy-preserving multi-party data analytics, where the goal is to enable analytics on multi-party data without compromising the data privacy of each individual party. In this paper, we first propose a secure sum protocol with strong security guarantees. The proposed secure sum protocol is resistant to collusion attacks even with N-2 parties colluding, where N denotes the total number of collaborating parties. We then use this protocol to propose two secure gradient descent algorithms, one for horizontally partitioned data, and the other for vertically partitioned data. The proposed framework is generic and applies to a wide class of machine learning problems. We demonstrate our solution for two popular use-cases, regression and classification, and evaluate the performance of the proposed solution in terms of the obtained model accuracy, latency and communication cost. In addition, we perform a scalability analysis to evaluate the performance of the proposed solution as the data size and the number of parties increase.
With smart phones being indispensable in people's everyday life, Android malware has posed serious threats to their security, making its detection of utmost concern. To protect legitimate users from the evolving Android malware attacks, machine learning-based systems have been successfully deployed and offer unparalleled flexibility in automatic Android malware detection. In these systems, based on different feature representations, various kinds of classifiers are constructed to detect Android malware. Unfortunately, as classifiers become more widely deployed, the incentive for defeating them increases. In this paper, we explore the security of machine learning in Android malware detection on the basis of a learning-based classifier with the input of a set of features extracted from the Android applications (apps). We consider different importances of the features associated with their contributions to the classification problem as well as their manipulation costs, and present a novel feature selection method (named SecCLS) to make the classifier harder to be evaded. To improve the system security while not compromising the detection accuracy, we further propose an ensemble learning approach (named SecENS) by aggregating the individual classifiers that are constructed using our proposed feature selection method SecCLS. Accordingly, we develop a system called SecureDroid which integrates our proposed methods (i.e., SecCLS and SecENS) to enhance security of machine learning-based Android malware detection. Comprehensive experiments on the real sample collections from Comodo Cloud Security Center are conducted to validate the effectiveness of SecureDroid against adversarial Android malware attacks by comparisons with other alternative defense methods. Our proposed secure-learning paradigm can also be readily applied to other malware detection tasks.
Cloud Computing has many significant benefits like the provision of computing resources and virtual networks on demand. However, there is the problem to assure the security of these networks against Distributed Denial-of-Service (DDoS) attack. Over the past few decades, the development of protection method based on data mining has attracted many researchers because of its effectiveness and practical significance. Most commonly these detection methods use prelearned models or models based on rules. Because of this the proposed DDoS detection methods often failure in dynamically changing cloud virtual networks. In this paper, we purposed self-learning method allows to adapt a detection model to network changes. This is minimized the false detection and reduce the possibility to mark legitimate users as malicious and vice versa. The developed method consists of two steps: collecting data about the network traffic by Netflow protocol and relearning the detection model with the new data. During the data collection we separate the traffic on legitimate and malicious. The separated traffic is labeled and sent to the relearning pool. The detection model is relearned by a data from the pool of current traffic. The experiment results show that proposed method could increase efficiency of DDoS detection systems is using data mining.
This paper is based on the previous research that selects the proper surrogate nodes for fast recovery mechanism in industrial IoT (Internet of Things) Environment which uses a variety of sensors to collect the data and exchange the collected data in real-time for creating added value. We are going to suggest the way that how to decide the number of surrogate node automatically in different deployed industrial IoT Environment so that minimize the system recovery time when the central server likes IoT gateway is in failure. We are going to use the network simulator to measure the recovery time depending on the number of the selected surrogate nodes according to the sub-devices which are connected to the IoT gateway.
Recently, due to the increase of outsourcing in IC design, it has been reported that malicious third-party vendors often insert hardware Trojans into their ICs. How to detect them is a strong concern in IC design process. The features of hardware-Trojan infected nets (or Trojan nets) in ICs often differ from those of normal nets. To classify all the nets in netlists designed by third-party vendors into Trojan ones and normal ones, we have to extract effective Trojan features from Trojan nets. In this paper, we first propose 51 Trojan features which describe Trojan nets from netlists. Based on the importance values obtained from the random forest classifier, we extract the best set of 11 Trojan features out of the 51 features which can effectively detect Trojan nets, maximizing the F-measures. By using the 11 Trojan features extracted, the machine-learning based hardware Trojan classifier has achieved at most 100% true positive rate as well as 100% true negative rate in several TrustHUB benchmarks and obtained the average F-measure of 74.6%, which realizes the best values among existing machine-learning-based hardware-Trojan detection methods.
A Distributed Denial of Service (DDoS) attack is an attempt to make an online service, a network, or even an entire organization, unavailable by saturating it with traffic from multiple sources. DDoS attacks are among the most common and most devastating threats that network defenders have to watch out for. DDoS attacks are becoming bigger, more frequent, and more sophisticated. Volumetric attacks are the most common types of DDoS attacks. A DDoS attack is considered volumetric, or high-rate, when within a short period of time it generates a large amount of packets or a high volume of traffic. High-rate attacks are well-known and have received much attention in the past decade; however, despite several detection and mitigation strategies have been designed and implemented, high-rate attacks are still halting the normal operation of information technology infrastructures across the Internet when the protection mechanisms are not able to cope with the aggregated capacity that the perpetrators have put together. With this in mind, the present paper aims to propose and test a distributed and collaborative architecture for online high-rate DDoS attack detection and mitigation based on an in-memory distributed graph data structure and unsupervised machine learning algorithms that leverage real-time streaming data and analytics. We have successfully tested our proposed mechanism using a real-world DDoS attack dataset at its original rate in pursuance of reproducing the conditions of an actual large scale attack.
In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. We have applied stylometry to difficult domains such as underground hacker forums, open source projects (code), and tweets. I will discuss our Doppelgnger Finder algorithm, which enables us to group Sybil accounts on underground forums and detect blogs from Twitter feeds and reddit comments. In addition, I will discuss our work attributing unknown source code and binaries.
When a person gets to a door and wants to get in, what do they do? They knock. In our system, the user's specific knock pattern authenticates their identity, and opens the door for them. The system empowers people's intuitive actions and responses to affect the world around them in a new way. We leverage IOT, and physical computing to make more technology feel like less. From there, the system of a knock based entrance creates affordances in social interaction for shared spaces wherein ownership fluidity and accessibility needs to be balanced with security
Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cyber-criminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.
Supervisory Control and Data Acquisition (SCADA) systems complexity and interconnectivity increase in recent years have exposed the SCADA networks to numerous potential vulnerabilities. Several studies have shown that anomaly-based Intrusion Detection Systems (IDS) achieves improved performance to identify unknown or zero-day attacks. In this paper, we propose a hybrid model for anomaly-based intrusion detection in SCADA networks using machine learning approach. In the first part, we present a robust hybrid model for anomaly-based intrusion detection in SCADA networks. Finally, we present a feature selection model for anomaly-based intrusion detection in SCADA networks by removing redundant and irrelevant features. Irrelevant features in the dataset can affect modeling power and reduce predictive accuracy. These models were evaluated using an industrial control system dataset developed at the Distributed Analytics and Security Institute Mississippi State University Starkville, MS, USA. The experimental results show that our proposed model has a key effect in reducing the time and computational complexity and achieved improved accuracy and detection rate. The accuracy of our proposed model was measured as 99.5 % for specific-attack-labeled.