Biblio
Critical information systems strongly rely on event logging techniques to collect data, such as housekeeping/error events, execution traces and dumps of variables, into unstructured text logs. Event logs are the primary source to gain actionable intelligence from production systems. In spite of the recognized importance, system/application logs remain quite underutilized in security analytics when compared to conventional and structured data sources, such as audit traces, network flows and intrusion detection logs. This paper proposes a method to measure the occurrence of interesting activity (i.e., entries that should be followed up by analysts) within textual and heterogeneous runtime log streams. We use an entropy-based approach, which makes no assumptions on the structure of underlying log entries. Measurements have been done in a real-world Air Traffic Control information system through a data analytics framework. Experiments suggest that our entropy-based method represents a valuable complement to security analytics solutions.
Hacker forums and other social platforms may contain vital information about cyber security threats. But using manual analysis to extract relevant threat information from these sources is a time consuming and error-prone process that requires a significant allocation of resources. In this paper, we explore the potential of Machine Learning methods to rapidly sift through hacker forums for relevant threat intelligence. Utilizing text data from a real hacker forum, we compared the text classification performance of Convolutional Neural Network methods against more traditional Machine Learning approaches. We found that traditional machine learning methods, such as Support Vector Machines, can yield high levels of performance that are on par with Convolutional Neural Network algorithms.
The following topics are dealt with: feature extraction; data mining; support vector machines; mobile computing; photovoltaic power systems; mean square error methods; fault diagnosis; natural language processing; control system synthesis; and Internet of Things.
The continuous advance in recent cloud-based computer networks has generated a number of security challenges associated with intrusions in network systems. With the exponential increase in the volume of network traffic data, involvement of humans in such detection systems is time consuming and a non-trivial problem. Secondly, network traffic data tends to be highly dimensional, comprising of numerous features and attributes, making classification challenging and thus susceptible to the curse of dimensionality problem. Given such scenarios, the need arises for dimensional reduction, feature selection, combined with machine-learning techniques in the classification of such data. Therefore, as a contribution, this paper seeks to employ data mining techniques in a cloud-based environment, by selecting appropriate attributes and features with the least importance in terms of weight for the classification. Often the standard is to select features with better weights while ignoring those with least weights. In this study, we seek to find out if we can make prediction using those features with least weights. The motivation is that adversaries use stealth to hide their activities from the obvious. The question then is, can we predict any stealth activity of an adversary using the least observed attributes? In this particular study, we employ information gain to select attributes with the lowest weights and then apply machine learning to classify if a combination, in this case, of both source and destination ports are attacked or not. The motivation of this investigation is if attributes that are of least importance can be used to predict if an attack could occur. Our preliminary results show that even when the source and destination port attributes are used in combination with features with the least weights, it is possible to classify such network traffic data and predict if an attack will occur or not.
As the most successful cryptocurrency to date, Bitcoin constitutes a target of choice for attackers. While many attack vectors have already been uncovered, one important vector has been left out though: attacking the currency via the Internet routing infrastructure itself. Indeed, by manipulating routing advertisements (BGP hijacks) or by naturally intercepting traffic, Autonomous Systems (ASes) can intercept and manipulate a large fraction of Bitcoin traffic. This paper presents the first taxonomy of routing attacks and their impact on Bitcoin, considering both small-scale attacks, targeting individual nodes, and large-scale attacks, targeting the network as a whole. While challenging, we show that two key properties make routing attacks practical: (i) the efficiency of routing manipulation; and (ii) the significant centralization of Bitcoin in terms of mining and routing. Specifically, we find that any network attacker can hijack few (\textbackslashtextless;100) BGP prefixes to isolate 50% of the mining power-even when considering that mining pools are heavily multi-homed. We also show that on-path network attackers can considerably slow down block propagation by interfering with few key Bitcoin messages. We demonstrate the feasibility of each attack against the deployed Bitcoin software. We also quantify their effectiveness on the current Bitcoin topology using data collected from a Bitcoin supernode combined with BGP routing data. The potential damage to Bitcoin is worrying. By isolating parts of the network or delaying block propagation, attackers can cause a significant amount of mining power to be wasted, leading to revenue losses and enabling a wide range of exploits such as double spending. To prevent such effects in practice, we provide both short and long-term countermeasures, some of which can be deployed immediately.
SQL injection attack (SQLIA) pose a serious security threat to the database driven web applications. This kind of attack gives attackers easily access to the application's underlying database and to the potentially sensitive information these databases contain. A hacker through specifically designed input, can access content of the database that cannot otherwise be able to do so. This is usually done by altering SQL statements that are used within web applications. Due to importance of security of web applications, researchers have studied SQLIA detection and prevention extensively and have developed various methods. In this research, after reviewing the existing research in this field, we present a new hybrid method to reduce the vulnerability of the web applications. Our method is specifically designed to detect and prevent SQLIA. Our proposed method is consists of three phases namely, the database design, implementation, and at the common gateway interface (CGI). Details of our approach along with its pros and cons are discussed in detail.
Recent studies have shown that adding explicit social trust information to social recommendation significantly improves the prediction accuracy of ratings, but it is difficult to obtain a clear trust data among users in real life. Scholars have studied and proposed some trust measure methods to calculate and predict the interaction and trust between users. In this article, a method of social trust relationship extraction based on hellinger distance is proposed, and user similarity is calculated by describing the f-divergence of one side node in user-item bipartite networks. Then, a new matrix factorization model based on implicit social relationship is proposed by adding the extracted implicit social relations into the improved matrix factorization. The experimental results support that the effect of using implicit social trust to recommend is almost the same as that of using actual explicit user trust ratings, and when the explicit trust data cannot be extracted, our method has a better effect than the other traditional algorithms.
In the present time, there has been a huge increase in large data repositories by corporations, governments, and healthcare organizations. These repositories provide opportunities to design/improve decision-making systems by mining trends and patterns from the data set (that can provide credible information) to improve customer service (e.g., in healthcare). As a result, while data sharing is essential, it is an obligation to maintaining the privacy of the data donors as data custodians have legal and ethical responsibilities to secure confidentiality. This research proposes a 2-layer privacy preserving (2-LPP) data sanitization algorithm that satisfies ε-differential privacy for publishing sanitized data. The proposed algorithm also reduces the re-identification risk of the sanitized data. The proposed algorithm has been implemented, and tested with two different data sets. Compared to other existing works, the results obtained from the proposed algorithm show promising performance.
This paper describes the various malware datasets that we have obtained permissions to host at the University of Arizona as part of a National Science Foundation funded project. It also describes some other malware datasets that we are in the process of obtaining permissions to host at the University of Arizona. We have also discussed some preliminary work we have carried out on malware analysis using big data platforms.
Bitcoin, one major virtual currency, attracts users' attention by its novel mode in recent years. With blockchain as its basic technique, Bitcoin possesses strong security features which anonymizes user's identity to protect their private information. However, some criminals utilize Bitcoin to do several illegal activities bringing in great security threat to the society. Therefore, it is necessary to get knowledge of the current trend of Bitcoin and make effort to de-anonymize. In this paper, we put forward and realize a system to analyze Bitcoin from two aspects: blockchain data and network traffic data. We resolve the blockchain data to analyze Bitcoin from the point of Bitcoin address while simulate Bitcoin P2P protocol to evaluate Bitcoin from the point of IP address. At last, with our system, we finish analyzing its current trends and tracing its transactions by putting some statistics on Bitcoin transactions and addresses, tracing the transaction flow and de-anonymizing some Bitcoin addresses to IPs.
Detecting botnets and advanced persistent threats is a major challenge for network administrators. An important component of such malware is the command and control channel, which enables the malware to respond to controller commands. The detection of malware command and control channels could help prevent further malicious activity by cyber criminals using the malware. Detection of malware in network traffic is traditionally carried out by identifying specific patterns in packet payloads. Now bot writers encrypt the command and control payloads, making pattern recognition a less effective form of detection. This paper focuses instead on an effective anomaly based detection technique for bot and advanced persistent threats using a data mining approach combined with applied classification algorithms. After additional tuning, the final test on an unseen dataset, false positive rates of 0% with malware detection rates of 100% were achieved on two examined malware threats, with promising results on a number of other threats.
In cloud storage systems, users can upload their data along with associated tags (authentication information) to cloud storage servers. To ensure the availability and integrity of the outsourced data, provable data possession (PDP) schemes convince verifiers (users or third parties) that the outsourced data stored in the cloud storage server is correct and unchanged. Recently, several PDP schemes with designated verifier (DV-PDP) were proposed to provide the flexibility of arbitrary designated verifier. A designated verifier (private verifier) is trustable and designated by a user to check the integrity of the outsourced data. However, these DV-PDP schemes are either inefficient or insecure under some circumstances. In this paper, we propose the first non-repudiable PDP scheme with designated verifier (DV-NRPDP) to address the non-repudiation issue and resolve possible disputations between users and cloud storage servers. We define the system model, framework and adversary model of DV-NRPDP schemes. Afterward, a concrete DV-NRPDP scheme is presented. Based on the computing discrete logarithm assumption, we formally prove that the proposed DV-NRPDP scheme is secure against several forgery attacks in the random oracle model. Comparisons with the previously proposed schemes are given to demonstrate the advantages of our scheme.
Phishers often exploit users' trust on the appearance of a site by using webpages that are visually similar to an authentic site. In the past, various research studies have tried to identify and classify the factors contributing towards the detection of phishing websites. The focus of this research is to establish a strong relationship between those identified heuristics (content-based) and the legitimacy of a website by analyzing training sets of websites (both phishing and legitimate websites) and in the process analyze new patterns and report findings. Many existing phishing detection tools are often not very accurate as they depend mostly on the old database of previously identified phishing websites. However, there are thousands of new phishing websites appearing every year targeting financial institutions, cloud storage/file hosting sites, government websites, and others. This paper presents a framework called Phishing-Detective that detects phishing websites based on existing and newly found heuristics. For this framework, a web crawler was developed to scrape the contents of phishing and legitimate websites. These contents were analyzed to rate the heuristics and their contribution scale factor towards the illegitimacy of a website. The data set collected from Web Scraper was then analyzed using a data mining tool to find patterns and report findings. A case study shows how this framework can be used to detect a phishing website. This research is still in progress but shows a new way of finding and using heuristics and the sum of their contributing weights to effectively and accurately detect phishing websites. Further development of this framework is discussed at the end of the paper.
Genetic Algorithms are group of mathematical models in computational science by exciting evolution in AI techniques nowadays. These algorithms preserve critical information by applying data structure with simple chromosome recombination operators by encoding solution to a specific problem. Genetic algorithms they are optimizer, in which range of problems applied to it are quite broad. Genetic Algorithms with its global search includes basic principles like selection, crossover and mutation. Data structures, algorithms and human brain inspiration are found for classification of data and for learning which works using Neural Networks. Artificial Intelligence (AI) it is a field, where so many tasks performed naturally by a human. When AI conventional methods are used in a computer it was proved as a complicated task. Applying Neural Networks techniques will create an internal structure of rules by which a program can learn by examples, to classify different inputs than mining techniques. This paper proposes a phishing websites classifier using improved polynomial neural networks in genetic algorithm.
Comparing with the traditional grid, energy internet will collect data widely and connect more broader. The analysis of electrical data use of Non-intrusive Load Monitoring (NILM) can infer user behavior privacy. Consideration both data security and availability is a problem must be addressed. Due to its rigid and provable privacy guarantee, Differential Privacy has proverbially reached and applied to privacy preserving data release and data mining. Because of its high sensitivity, increases the noise directly will led to data unavailable. In this paper, we propose a differentially private mechanism to protect energy internet privacy. Our focus is the aggregated data be released by data owner after added noise in disaggregated data. The theoretically proves and experiments show that our scheme can achieve the purpose of privacy-preserving and data availability.
With the repaid growth of social tagging users, it becomes very important for social tagging systems how the required resources are recommended to users rapidly and accurately. Firstly, the architecture of an agent-based intelligent social tagging system is constructed using agent technology. Secondly, the design and implementation of user interest mining, personalized recommendation and common preference group recommendation are presented. Finally, a self-adaptive recommendation strategy for social tagging and its implementation are proposed based on the analysis to the shortcoming of the personalized recommendation strategy and the common preference group recommendation strategy. The self-adaptive recommendation strategy achieves equilibrium selection between efficiency and accuracy, so that it solves the contradiction between efficiency and accuracy in the personalized recommendation model and the common preference recommendation model.
3D steganography is used in order to embed or hide information into 3D objects without causing visible or machine detectable modifications. In this paper we rethink about a high capacity 3D steganography based on the Hamiltonian path quantization, and increase its resistance to steganalysis. We analyze the parameters that may influence the distortion of a 3D shape as well as the resistance of the steganography to 3D steganalysis. According to the experimental results, the proposed high capacity 3D steganographic method has an increased resistance to steganalysis.
Steganography is the science of hiding information to send secret messages using the carrier object known as stego object. Steganographic technology is based on three principles including security, robustness and capacity. In this paper, we present a digital image hidden by using the compressive sensing technology to increase security of stego image based on human visual system features. The results represent which our proposed method provides higher security in comparison with the other presented methods. Bit Correction Rate between original secret message and extracted message is used to show the accuracy of this method.
Fog computing provides a new architecture for the implementation of the Internet of Things (IoT), which can connect sensor nodes to the cloud using the edge of the network. This structure has improved the latency and energy consumption in the cloud. In this heterogeneous and distributed environment, resource allocation is very important. Hence, scheduling will be a challenge to increase productivity and allocate resources appropriately to the tasks. Programs that run in this environment should be protected from intruders. We consider three parameters as authentication, integrity, and confidentiality to maintain security in fog devices. These parameters have time and computational overhead. In the proposed approach, we schedule the modules for the run in fog devices by heuristic algorithms based on data mining technique. The objective function is included CPU utilization, bandwidth, and security overhead. We compare the proposed algorithm with several heuristic algorithms. The results show that our proposed algorithm improved the average energy consumption of 63.27%, cost 44.71% relative to the PSO, ACO, SA algorithms.
Cloud Computing has many significant benefits like the provision of computing resources and virtual networks on demand. However, there is the problem to assure the security of these networks against Distributed Denial-of-Service (DDoS) attack. Over the past few decades, the development of protection method based on data mining has attracted many researchers because of its effectiveness and practical significance. Most commonly these detection methods use prelearned models or models based on rules. Because of this the proposed DDoS detection methods often failure in dynamically changing cloud virtual networks. In this paper, we purposed self-learning method allows to adapt a detection model to network changes. This is minimized the false detection and reduce the possibility to mark legitimate users as malicious and vice versa. The developed method consists of two steps: collecting data about the network traffic by Netflow protocol and relearning the detection model with the new data. During the data collection we separate the traffic on legitimate and malicious. The separated traffic is labeled and sent to the relearning pool. The detection model is relearned by a data from the pool of current traffic. The experiment results show that proposed method could increase efficiency of DDoS detection systems is using data mining.
In recent years, the usage of unmanned aircraft systems (UAS) for security-related purposes has increased, ranging from military applications to different areas of civil protection. The deployment of UAS can support security forces in achieving an enhanced situational awareness. However, in order to provide useful input to a situational picture, sensor data provided by UAS has to be integrated with information about the area and objects of interest from other sources. The aim of this study is to design a high-level data fusion component combining probabilistic information processing with logical and probabilistic reasoning, to support human operators in their situational awareness and improving their capabilities for making efficient and effective decisions. To this end, a fusion component based on the ISR (Intelligence, Surveillance and Reconnaissance) Analytics Architecture (ISR-AA) [1] is presented, incorporating an object-oriented world model (OOWM) for information integration, an expressive knowledge model and a reasoning component for detection of critical events. Approaches for translating the information contained in the OOWM into either an ontology for logical reasoning or a Markov logic network for probabilistic reasoning are presented.
Different data mining techniques are employed in stylometry domain for performing authorship attribution tasks. Sometimes to improve the decision system the discretization of input data can be applied. In many cases such approach allows to obtain better classification results. On the other hand, there were situations in which discretization decreased overall performance of the system. Therefore, the question arose what would be the result if only some selected attributes were discretized. The paper presents the results of the research performed for forward sequential selection of attributes to be discretized. The influence of such approach on the performance of the decision system, based on Naive Bayes classifier in authorship attribution domain, is presented. Some basic discretization methods and different approaches to discretization of the test datasets are taken into consideration.
Classifying users according to their behaviors is a complex problem due to the high-volume of data and the unclear association between distinct data points. Although over the past years behavioral researches has mainly focused on Massive Multiplayer Online Role Playing Games (MMORPG), such as World of Warcraft (WoW), which has predefined player classes, there has been little applied to Open World Sandbox Games (OWSG). Some OWSG do not have player classes or structured linear gameplay mechanics, as freedom is given to the player to freely wander and interact with the virtual world. This research focuses on identifying different play styles that exist within the non-structured gameplay sessions of OWSG. This paper uses the OWSG TUG as a case study and over a period of forty-five days, a database stored selected gameplay events happening on the research server. The study applied k-means clustering to this dataset and evaluated the resulting distinct behavioral profiles to classify player sessions on an open world sandbox game.
Cloud Computing represents one of the most significant shifts in information technology and it enables to provide cloud-based security service such as Security-as-a-service (SECaaS). Improving of the cloud computing technologies, the traditional SIEM paradigm is able to shift to cloud-based security services. In this paper, we propose the SIEM architecture that can be deployed to the SECaaS platform which we have been developing for analyzing and recognizing intelligent cyber-threat based on virtualization technologies.
The ability to discover patterns of interest in criminal networks can support and ease the investigation tasks by security and law enforcement agencies. By considering criminal networks as a special case of social networks, we can properly reuse most of the state-of-the-art techniques to discover patterns of interests, i.e., hidden and potential links. Nevertheless, in time-sensible scenarios, like the one involving criminal actions, the ability to discover patterns in a (near) real-time manner can be of primary importance.In this paper, we investigate the identification of patterns for link detection and prediction on an evolving criminal network. To extract valuable information as soon as data is generated, we exploit a stream processing approach. To this end, we also propose three new similarity social network metrics, specifically tailored for criminal link detection and prediction. Then, we develop a flexible data stream processing application relying on the Apache Flink framework; this solution allows us to deploy and evaluate the newly proposed metrics as well as the ones existing in literature. The experimental results show that the new metrics we propose can reach up to 83% accuracy in detection and 82% accuracy in prediction, resulting competitive with the state of the art metrics.