Biblio
Emerging computing relies heavily on secure backend storage for the massive size of big data originating from the Internet of Things (IoT) smart devices to the Cloud-hosted web applications. Structured Query Language (SQL) Injection Attack (SQLIA) remains an intruder's exploit of choice to pilfer confidential data from the back-end database with damaging ramifications. The existing approaches were all before the new emerging computing in the context of the Internet big data mining and as such will lack the ability to cope with new signatures concealed in a large volume of web requests over time. Also, these existing approaches were strings lookup approaches aimed at on-premise application domain boundary, not applicable to roaming Cloud-hosted services' edge Software-Defined Network (SDN) to application endpoints with large web request hits. Using a Machine Learning (ML) approach provides scalable big data mining for SQLIA detection and prevention. Unfortunately, the absence of corpus to train a classifier is an issue well known in SQLIA research in applying Artificial Intelligence (AI) techniques. This paper presents an application context pattern-driven corpus to train a supervised learning model. The model is trained with ML algorithms of Two-Class Support Vector Machine (TC SVM) and Two-Class Logistic Regression (TC LR) implemented on Microsoft Azure Machine Learning (MAML) studio to mitigate SQLIA. This scheme presented here, then forms the subject of the empirical evaluation in Receiver Operating Characteristic (ROC) curve.
SQL injection attack (SQLIA) pose a serious security threat to the database driven web applications. This kind of attack gives attackers easily access to the application's underlying database and to the potentially sensitive information these databases contain. A hacker through specifically designed input, can access content of the database that cannot otherwise be able to do so. This is usually done by altering SQL statements that are used within web applications. Due to importance of security of web applications, researchers have studied SQLIA detection and prevention extensively and have developed various methods. In this research, after reviewing the existing research in this field, we present a new hybrid method to reduce the vulnerability of the web applications. Our method is specifically designed to detect and prevent SQLIA. Our proposed method is consists of three phases namely, the database design, implementation, and at the common gateway interface (CGI). Details of our approach along with its pros and cons are discussed in detail.
With the repaid growth of social tagging users, it becomes very important for social tagging systems how the required resources are recommended to users rapidly and accurately. Firstly, the architecture of an agent-based intelligent social tagging system is constructed using agent technology. Secondly, the design and implementation of user interest mining, personalized recommendation and common preference group recommendation are presented. Finally, a self-adaptive recommendation strategy for social tagging and its implementation are proposed based on the analysis to the shortcoming of the personalized recommendation strategy and the common preference group recommendation strategy. The self-adaptive recommendation strategy achieves equilibrium selection between efficiency and accuracy, so that it solves the contradiction between efficiency and accuracy in the personalized recommendation model and the common preference recommendation model.
The blockchain technology has emerged as an attractive solution to address performance and security issues in distributed systems. Blockchain's public and distributed peer-to-peer ledger capability benefits cloud computing services which require functions such as, assured data provenance, auditing, management of digital assets, and distributed consensus. Blockchain's underlying consensus mechanism allows to build a tamper-proof environment, where transactions on any digital assets are verified by set of authentic participants or miners. With use of strong cryptographic methods, blocks of transactions are chained together to enable immutability on the records. However, achieving consensus demands computational power from the miners in exchange of handsome reward. Therefore, greedy miners always try to exploit the system by augmenting their mining power. In this paper, we first discuss blockchain's capability in providing assured data provenance in cloud and present vulnerabilities in blockchain cloud. We model the block withholding (BWH) attack in a blockchain cloud considering distinct pool reward mechanisms. BWH attack provides rogue miner ample resources in the blockchain cloud for disrupting honest miners' mining efforts, which was verified through simulations.
Volumetric DDoS attacks continue to inflict serious damage. Many proposed defenses for mitigating such attacks assume that a monitoring system has already detected the attack. However, many proposed DDoS monitoring systems do not focus on efficiently analyzing high volume network traffic to provide important characterizations of the attack in real-time to downstream traffic filtering systems. We propose a scalable real-time framework for an effective volumetric DDoS monitoring system that leverages modern big data technologies for streaming analytics of high volume network traffic to accurately detect and characterize attacks.
In cloud storage systems, users can upload their data along with associated tags (authentication information) to cloud storage servers. To ensure the availability and integrity of the outsourced data, provable data possession (PDP) schemes convince verifiers (users or third parties) that the outsourced data stored in the cloud storage server is correct and unchanged. Recently, several PDP schemes with designated verifier (DV-PDP) were proposed to provide the flexibility of arbitrary designated verifier. A designated verifier (private verifier) is trustable and designated by a user to check the integrity of the outsourced data. However, these DV-PDP schemes are either inefficient or insecure under some circumstances. In this paper, we propose the first non-repudiable PDP scheme with designated verifier (DV-NRPDP) to address the non-repudiation issue and resolve possible disputations between users and cloud storage servers. We define the system model, framework and adversary model of DV-NRPDP schemes. Afterward, a concrete DV-NRPDP scheme is presented. Based on the computing discrete logarithm assumption, we formally prove that the proposed DV-NRPDP scheme is secure against several forgery attacks in the random oracle model. Comparisons with the previously proposed schemes are given to demonstrate the advantages of our scheme.
Classifying users according to their behaviors is a complex problem due to the high-volume of data and the unclear association between distinct data points. Although over the past years behavioral researches has mainly focused on Massive Multiplayer Online Role Playing Games (MMORPG), such as World of Warcraft (WoW), which has predefined player classes, there has been little applied to Open World Sandbox Games (OWSG). Some OWSG do not have player classes or structured linear gameplay mechanics, as freedom is given to the player to freely wander and interact with the virtual world. This research focuses on identifying different play styles that exist within the non-structured gameplay sessions of OWSG. This paper uses the OWSG TUG as a case study and over a period of forty-five days, a database stored selected gameplay events happening on the research server. The study applied k-means clustering to this dataset and evaluated the resulting distinct behavioral profiles to classify player sessions on an open world sandbox game.
Fog computing provides a new architecture for the implementation of the Internet of Things (IoT), which can connect sensor nodes to the cloud using the edge of the network. This structure has improved the latency and energy consumption in the cloud. In this heterogeneous and distributed environment, resource allocation is very important. Hence, scheduling will be a challenge to increase productivity and allocate resources appropriately to the tasks. Programs that run in this environment should be protected from intruders. We consider three parameters as authentication, integrity, and confidentiality to maintain security in fog devices. These parameters have time and computational overhead. In the proposed approach, we schedule the modules for the run in fog devices by heuristic algorithms based on data mining technique. The objective function is included CPU utilization, bandwidth, and security overhead. We compare the proposed algorithm with several heuristic algorithms. The results show that our proposed algorithm improved the average energy consumption of 63.27%, cost 44.71% relative to the PSO, ACO, SA algorithms.
The continuous advance in recent cloud-based computer networks has generated a number of security challenges associated with intrusions in network systems. With the exponential increase in the volume of network traffic data, involvement of humans in such detection systems is time consuming and a non-trivial problem. Secondly, network traffic data tends to be highly dimensional, comprising of numerous features and attributes, making classification challenging and thus susceptible to the curse of dimensionality problem. Given such scenarios, the need arises for dimensional reduction, feature selection, combined with machine-learning techniques in the classification of such data. Therefore, as a contribution, this paper seeks to employ data mining techniques in a cloud-based environment, by selecting appropriate attributes and features with the least importance in terms of weight for the classification. Often the standard is to select features with better weights while ignoring those with least weights. In this study, we seek to find out if we can make prediction using those features with least weights. The motivation is that adversaries use stealth to hide their activities from the obvious. The question then is, can we predict any stealth activity of an adversary using the least observed attributes? In this particular study, we employ information gain to select attributes with the lowest weights and then apply machine learning to classify if a combination, in this case, of both source and destination ports are attacked or not. The motivation of this investigation is if attributes that are of least importance can be used to predict if an attack could occur. Our preliminary results show that even when the source and destination port attributes are used in combination with features with the least weights, it is possible to classify such network traffic data and predict if an attack will occur or not.
Support vector machines (SVMs) have been widely used for classification in machine learning and data mining. However, SVM faces a huge challenge in large scale classification tasks. Recent progresses have enabled additive kernel version of SVM efficiently solves such large scale problems nearly as fast as a linear classifier. This paper proposes a new accelerated mini-batch stochastic gradient descent algorithm for SVM classification with additive kernel (AK-ASGD). On the one hand, the gradient is approximated by the sum of a scalar polynomial function for each feature dimension; on the other hand, Nesterov's acceleration strategy is used. The experimental results on benchmark large scale classification data sets show that our proposed algorithm can achieve higher testing accuracies and has faster convergence rate.
Detection and prevention of data breaches in corporate networks is one of the most important security problems of today's world. The techniques and applications proposed for solution are not successful when attackers attempt to steal data using steganography. Steganography is the art of storing data in a file called cover, such as picture, sound and video. The concealed data cannot be directly recognized in the cover. Steganalysis is the process of revealing the presence of embedded messages in these files. There are many statistical and signature based steganalysis algorithms. In this work, the detection of steganographic images with steganalysis techniques is reviewed and a system has been developed which automatically detects steganographic images in network traffic by using open source tools.
3D steganography is used in order to embed or hide information into 3D objects without causing visible or machine detectable modifications. In this paper we rethink about a high capacity 3D steganography based on the Hamiltonian path quantization, and increase its resistance to steganalysis. We analyze the parameters that may influence the distortion of a 3D shape as well as the resistance of the steganography to 3D steganalysis. According to the experimental results, the proposed high capacity 3D steganographic method has an increased resistance to steganalysis.
Hacker forums and other social platforms may contain vital information about cyber security threats. But using manual analysis to extract relevant threat information from these sources is a time consuming and error-prone process that requires a significant allocation of resources. In this paper, we explore the potential of Machine Learning methods to rapidly sift through hacker forums for relevant threat intelligence. Utilizing text data from a real hacker forum, we compared the text classification performance of Convolutional Neural Network methods against more traditional Machine Learning approaches. We found that traditional machine learning methods, such as Support Vector Machines, can yield high levels of performance that are on par with Convolutional Neural Network algorithms.
Hackers create different types of Malware such as Trojans which they use to steal user-confidential information (e.g. credit card details) with a few simple commands, recent malware however has been created intelligently and in an uncontrolled size, which puts malware analysis as one of the top important subjects of information security. This paper proposes an efficient dynamic malware-detection method based on API similarity. This proposed method outperform the traditional signature-based detection method. The experiment evaluated 197 malware samples and the proposed method showed promising results of correctly identified malware.
Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this violation is often violated in practice. The field of domain adaptation addresses the situation where this assumption of a fixed probability between the two domains is violated; however, the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaption to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages have been called into question with the recent findings of adversaries in a learning setting. Adversarial training (and testing) data pose a serious threat to scenarios where an attacker has the opportunity to ``poison'' the training or ``evade'' on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on aspects related to preprocessing of data (i.e., dimensionality reduction or feature selection) has widely been ignored in the revamp of adversarial learning research. Furthermore, variable selection, which is a vital component to any data analysis, has been shown to be particularly susceptible under an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial learning setting by considering the effects of adversarial data and how to mitigate its effects through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and known- weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.
There are continuous hacking and social issues regarding APT (Advanced Persistent Threat - APT) attacks and a number of antivirus businesses and researchers are making efforts to analyze such APT attacks in order to prevent or cope with APT attacks, some host PC security technologies such as firewalls and intrusion detection systems are used. Therefore, in this study, malignant behavior patterns were extracted by using an API of PE files. Moreover, the FP-Growth Algorithm to extract behavior information generated in the host PC in order to overcome the limitation of the previous signature-based intrusion detection systems. We will utilize this study as fundamental research about a system that extracts malignant behavior patterns within networks and APIs in the future.
Detecting botnets and advanced persistent threats is a major challenge for network administrators. An important component of such malware is the command and control channel, which enables the malware to respond to controller commands. The detection of malware command and control channels could help prevent further malicious activity by cyber criminals using the malware. Detection of malware in network traffic is traditionally carried out by identifying specific patterns in packet payloads. Now bot writers encrypt the command and control payloads, making pattern recognition a less effective form of detection. This paper focuses instead on an effective anomaly based detection technique for bot and advanced persistent threats using a data mining approach combined with applied classification algorithms. After additional tuning, the final test on an unseen dataset, false positive rates of 0% with malware detection rates of 100% were achieved on two examined malware threats, with promising results on a number of other threats.
Different data mining techniques are employed in stylometry domain for performing authorship attribution tasks. Sometimes to improve the decision system the discretization of input data can be applied. In many cases such approach allows to obtain better classification results. On the other hand, there were situations in which discretization decreased overall performance of the system. Therefore, the question arose what would be the result if only some selected attributes were discretized. The paper presents the results of the research performed for forward sequential selection of attributes to be discretized. The influence of such approach on the performance of the decision system, based on Naive Bayes classifier in authorship attribution domain, is presented. Some basic discretization methods and different approaches to discretization of the test datasets are taken into consideration.
In recent cyber incidents, Ransom software (ransomware) causes a major threat to the security of computer systems. Consequently, ransomware detection has become a hot topic in computer security. Unfortunately, current signature-based and static detection model is often easily evadable by obfuscation, polymorphism, compress, and encryption. For overcoming the lack of signature-based and static ransomware detection approach, we have proposed the dynamic ransomware detection system using data mining techniques such as Random Forest (RF), Support Vector Machine (SVM), Simple Logistic (SL) and Naive Bayes (NB) algorithms for detecting known and unknown ransomware. We monitor the actual (dynamic) behaviors of software to generate API calls flow graphs (CFG) and transfer it in a feature space. Thereafter, data normalization and feature selection were applied to select informative features which are the best for discriminating between various categories of software and benign software. Finally, the data mining algorithms were used for building the detection model for judging whether the software is benign software or ransomware. Our experimental results show that our proposed system can be more effective to improve the performance for ransomware detection. Especially, the accuracy and detection rate of our proposed system with Simple Logistic (SL) algorithm can achieve to 98.2% and 97.6%, respectively. Meanwhile, the false positive rate also can be reduced to 1.2%.
Ransomware, a class of self-propagating malware that uses encryption to hold the victims' data ransom, has emerged in recent years as one of the most dangerous cyber threats, with widespread damage; e.g., zero-day ransomware WannaCry has caused world-wide catastrophe, from knocking U.K. National Health Service hospitals offline to shutting down a Honda Motor Company in Japan [1]. Our close collaboration with security operations of large enterprises reveals that defense against ransomware relies on tedious analysis from high-volume systems logs of the first few infections. Sandbox analysis of freshly captured malware is also commonplace in operation. We introduce a method to identify and rank the most discriminating ransomware features from a set of ambient (non-attack) system logs and at least one log stream containing both ambient and ransomware behavior. These ranked features reveal a set of malware actions that are produced automatically from system logs, and can help automate tedious manual analysis. We test our approach using WannaCry and two polymorphic samples by producing logs with Cuckoo Sandbox during both ambient, and ambient plus ransomware executions. Our goal is to extract the features of the malware from the logs with only knowledge that malware was present. We compare outputs with a detailed analysis of WannaCry allowing validation of the algorithm's feature extraction and provide analysis of the method's robustness to variations of input data—changing quality/quantity of ambient data and testing polymorphic ransomware. Most notably, our patterns are accurate and unwavering when generated from polymorphic WannaCry copies, on which 63 (of 63 tested) antivirus (AV) products fail.
As the most successful cryptocurrency to date, Bitcoin constitutes a target of choice for attackers. While many attack vectors have already been uncovered, one important vector has been left out though: attacking the currency via the Internet routing infrastructure itself. Indeed, by manipulating routing advertisements (BGP hijacks) or by naturally intercepting traffic, Autonomous Systems (ASes) can intercept and manipulate a large fraction of Bitcoin traffic. This paper presents the first taxonomy of routing attacks and their impact on Bitcoin, considering both small-scale attacks, targeting individual nodes, and large-scale attacks, targeting the network as a whole. While challenging, we show that two key properties make routing attacks practical: (i) the efficiency of routing manipulation; and (ii) the significant centralization of Bitcoin in terms of mining and routing. Specifically, we find that any network attacker can hijack few (\textbackslashtextless;100) BGP prefixes to isolate 50% of the mining power-even when considering that mining pools are heavily multi-homed. We also show that on-path network attackers can considerably slow down block propagation by interfering with few key Bitcoin messages. We demonstrate the feasibility of each attack against the deployed Bitcoin software. We also quantify their effectiveness on the current Bitcoin topology using data collected from a Bitcoin supernode combined with BGP routing data. The potential damage to Bitcoin is worrying. By isolating parts of the network or delaying block propagation, attackers can cause a significant amount of mining power to be wasted, leading to revenue losses and enabling a wide range of exploits such as double spending. To prevent such effects in practice, we provide both short and long-term countermeasures, some of which can be deployed immediately.
Bitcoin, one major virtual currency, attracts users' attention by its novel mode in recent years. With blockchain as its basic technique, Bitcoin possesses strong security features which anonymizes user's identity to protect their private information. However, some criminals utilize Bitcoin to do several illegal activities bringing in great security threat to the society. Therefore, it is necessary to get knowledge of the current trend of Bitcoin and make effort to de-anonymize. In this paper, we put forward and realize a system to analyze Bitcoin from two aspects: blockchain data and network traffic data. We resolve the blockchain data to analyze Bitcoin from the point of Bitcoin address while simulate Bitcoin P2P protocol to evaluate Bitcoin from the point of IP address. At last, with our system, we finish analyzing its current trends and tracing its transactions by putting some statistics on Bitcoin transactions and addresses, tracing the transaction flow and de-anonymizing some Bitcoin addresses to IPs.
In recent years, the usage of unmanned aircraft systems (UAS) for security-related purposes has increased, ranging from military applications to different areas of civil protection. The deployment of UAS can support security forces in achieving an enhanced situational awareness. However, in order to provide useful input to a situational picture, sensor data provided by UAS has to be integrated with information about the area and objects of interest from other sources. The aim of this study is to design a high-level data fusion component combining probabilistic information processing with logical and probabilistic reasoning, to support human operators in their situational awareness and improving their capabilities for making efficient and effective decisions. To this end, a fusion component based on the ISR (Intelligence, Surveillance and Reconnaissance) Analytics Architecture (ISR-AA) [1] is presented, incorporating an object-oriented world model (OOWM) for information integration, an expressive knowledge model and a reasoning component for detection of critical events. Approaches for translating the information contained in the OOWM into either an ontology for logical reasoning or a Markov logic network for probabilistic reasoning are presented.
The ability to discover patterns of interest in criminal networks can support and ease the investigation tasks by security and law enforcement agencies. By considering criminal networks as a special case of social networks, we can properly reuse most of the state-of-the-art techniques to discover patterns of interests, i.e., hidden and potential links. Nevertheless, in time-sensible scenarios, like the one involving criminal actions, the ability to discover patterns in a (near) real-time manner can be of primary importance.In this paper, we investigate the identification of patterns for link detection and prediction on an evolving criminal network. To extract valuable information as soon as data is generated, we exploit a stream processing approach. To this end, we also propose three new similarity social network metrics, specifically tailored for criminal link detection and prediction. Then, we develop a flexible data stream processing application relying on the Apache Flink framework; this solution allows us to deploy and evaluate the newly proposed metrics as well as the ones existing in literature. The experimental results show that the new metrics we propose can reach up to 83% accuracy in detection and 82% accuracy in prediction, resulting competitive with the state of the art metrics.
Technological advancement enables the need of internet everywhere. The power industry is not an exception in the technological advancement which makes everything smarter. Smart grid is the advanced version of the traditional grid, which makes the system more efficient and self-healing. Synchrophasor is a device used in smart grids to measure the values of electric waves, voltages and current. The phasor measurement unit produces immense volume of current and voltage data that is used to monitor and control the performance of the grid. These data are huge in size and vulnerable to attacks. Intrusion Detection is a common technique for finding the intrusions in the system. In this paper, a big data framework is designed using various machine learning techniques, and intrusions are detected based on the classifications applied on the synchrophasor dataset. In this approach various machine learning techniques like deep neural networks, support vector machines, random forest, decision trees and naive bayes classifications are done for the synchrophasor dataset and the results are compared using metrics of accuracy, recall, false rate, specificity, and prediction time. Feature selection and dimensionality reduction algorithms are used to reduce the prediction time taken by the proposed approach. This paper uses apache spark as a platform which is suitable for the implementation of Intrusion Detection system in smart grids using big data analytics.