Biblio
In this work we propose a model for efficient and mutually beneficial information sharing between two competing entities, focusing specifically on software vulnerability sharing. First, we extend the two-stage game-theoretic model for bug sharing proposed by Khouzani et al. [18] in two key ways: we allow security information to be associated with different categories and severities, and we remove a large proportion of the player-homogeneity assumptions made in the previous work. We then analyse how these added degrees of realism affect the trading dynamics of the game. Second, we develop a new private set operation (PSO) protocol that removes the requirement for trusted mediation. The PSO functionality allows bilateral trading between the two entities up to a mutually agreed threshold on the value of the information shared, while keeping all other input information secret. The protocol scales linearly with set size, and we give an implementation that establishes the practicality of the design for varying input parameters. Together, the resulting model and protocol provide a framework for practical and secure information sharing between competing entities.
We propose a game-theoretic framework for task allocation in mobile cloud computing, in which compute tasks are offloaded to a group of nearby mobile devices. Specifically, in our framework, a distributor node holds a multidimensional auction to allocate the tasks of a job among nearby mobile nodes based on their computational capabilities and the cost of computation at those nodes, with the goal of reducing the overall job completion time. Our proposed auction also has the desired incentive-compatibility property, ensuring that mobile devices truthfully reveal their capabilities and costs and that those devices benefit from the task allocation. To deal with node mobility, we perform multiple auctions over adaptive time intervals. We develop a heuristic approach to dynamically find the best time intervals between auctions, minimizing unnecessary auctions and the accompanying overheads. We evaluate our framework and methods using both real-world and synthetic mobility traces. Our evaluation results show that our game-theoretic framework improves the job completion time by a factor of 2-5 compared to executing the job locally, while minimizing the number of auctions and the accompanying overheads. Our approach is also profitable for the nearby nodes that execute the distributor's tasks, with these nodes receiving compensation higher than their actual costs.
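As a concrete illustration of the incentive-compatibility property described above, the sketch below implements a single-attribute reverse (procurement) Vickrey auction for one task. The paper's auction is multidimensional, so this is a deliberate simplification, and the node names and costs are invented.

```python
# Minimal sketch of a truthful reverse (procurement) auction for one task.
# A single-attribute Vickrey auction, not the paper's full multidimensional
# mechanism; node ids and costs are illustrative only.

def reverse_vickrey(bids):
    """bids: node id -> reported cost of executing the task.
    The lowest bidder wins and is paid the second-lowest bid, so
    truthful cost reporting is a dominant strategy."""
    ranked = sorted(bids, key=bids.get)          # cheapest first
    winner, runner_up = ranked[0], ranked[1]
    return winner, bids[runner_up]

bids = {"node_a": 3.0, "node_b": 5.5, "node_c": 4.2}
winner, payment = reverse_vickrey(bids)
print(winner, payment)   # node_a wins and is paid 4.2, above its cost 3.0
```

Paying the winner the runner-up's bid is what makes truth-telling optimal: misreporting the cost never changes the payment when winning, only the chance of winning at a loss or losing a profitable task.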
As ever more cyber security incident data, ranging from system logs to vulnerability scan results, are collected, manually analyzing these data to detect important cyber security events becomes impossible. Hence, data mining techniques are becoming an essential tool for real-world cyber security applications. For example, a report from Gartner [gartner12] claims that "Information security is becoming a big data analytics problem, where massive amounts of data will be correlated, analyzed and mined for meaningful patterns". Of course, data mining/analytics is a means to an end, where the ultimate goal is to provide cyber security analysts with prioritized, actionable insights derived from big data. This raises the question: can we directly apply existing techniques to cyber security applications? One of the most important differences between data mining for cyber security and many other data mining applications is the existence of malicious adversaries that continuously adapt their behavior to hide their actions and to make data mining models ineffective. Unfortunately, traditional data mining techniques are insufficient to handle such adversarial problems directly. The adversaries adapt to the data miner's reactions, and data mining algorithms constructed from a training dataset degrade quickly. To address these concerns, over the last couple of years, novel data mining techniques that are more resilient to such adversarial behavior have been developed in the machine learning and data mining communities. We believe the lessons learned in this research direction will benefit cyber security researchers who are increasingly applying machine learning and data mining techniques in practice. To give an overview of recent developments in adversarial data mining, in this three-hour tutorial we introduce the foundations, the techniques, and the applications of adversarial data mining to cyber security. We first introduce various approaches proposed in the past to defend against active adversaries, such as a minimax approach that minimizes the worst-case error through a zero-sum game. We then discuss a game-theoretic framework that models the sequential actions of the adversary and the data miner, while both parties try to maximize their utilities. We also introduce a modified support vector machine method and a relevance vector machine method for defending against active adversaries. Intrusion detection and malware detection are two important application areas for adversarial data mining models and will be discussed in detail during the tutorial. Finally, we discuss practical guidelines on how to use adversarial data mining ideas in generic cyber security applications and how to leverage existing big data management tools when building data mining algorithms for cyber security.
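To make the minimax idea mentioned above concrete, here is a minimal sketch that solves a zero-sum game between the data miner and the adversary as a linear program with SciPy. The 2x2 payoff matrix is invented purely for illustration.

```python
# Minimal sketch: the miner's minimax (worst-case-optimal) mixed strategy
# in a zero-sum game, computed by linear programming. Payoffs are invented.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-0.5, 0.5]])   # miner's payoff: rows = miner, cols = adversary
m, n = A.shape

# Variables: mixed strategy x (m entries) and the game value v.
# Maximize v  <=>  minimize -v, subject to (A^T x)_j >= v and sum(x) = 1.
c = np.zeros(m + 1)
c[-1] = -1.0
A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - (A^T x)_j <= 0
b_ub = np.zeros(n)
A_eq = np.ones((1, m + 1))
A_eq[0, -1] = 0.0                             # v is excluded from sum(x) = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]     # probabilities >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("minimax strategy:", res.x[:m], "game value:", res.x[-1])
```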
Machine learning is widely used in security-sensitive settings like spam and malware detection, although it has been shown that malicious data can be carefully modified at test time to evade detection. To overcome this limitation, adversary-aware learning algorithms have been developed, exploiting robust optimization and game-theoretical models to incorporate knowledge of potential adversarial data manipulations into the learning algorithm. Although these techniques have been shown to be effective in some adversarial learning tasks, their adoption in practice is hindered by several factors, including the difficulty of meeting specific theoretical requirements, the complexity of implementation, and scalability issues in terms of the computational time and space required during training. In this work, we aim to develop secure kernel machines against evasion attacks that are not more computationally demanding than their non-secure counterparts. In particular, leveraging recent work on robustness and regularization, we show that the security of a linear classifier can be drastically improved by selecting a proper regularizer, depending on the kind of evasion attack, as well as by unbalancing the cost of classification errors. We then discuss the security of nonlinear kernel machines and show that a proper choice of the kernel function is crucial. We also show that unbalancing the cost of classification errors and varying some kernel parameters can further improve classifier security, yielding decision functions that better enclose the legitimate data. Our results on spam and PDF malware detection corroborate our analysis.
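A minimal sketch of the two levers highlighted above, choosing the regularizer and unbalancing the cost of classification errors, using scikit-learn; the dataset and class weights are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal sketch: regularizer choice (l1 vs l2) and unbalanced error costs
# for a linear classifier. Data and weights are illustrative only.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Sparse (l1) vs dense (l2) weights trade off robustness against different
# evasion attacks; class_weight makes malicious misclassifications costlier.
sparse_clf = LinearSVC(penalty="l1", dual=False,
                       class_weight={0: 1.0, 1: 5.0}).fit(X, y)
dense_clf = LinearSVC(penalty="l2",
                      class_weight={0: 1.0, 1: 5.0}).fit(X, y)
print("l1 weights:", sparse_clf.coef_[0][:5])
print("l2 weights:", dense_clf.coef_[0][:5])
```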
Bitcoin provides two incentives for miners: block rewards and transaction fees. The former accounts for the vast majority of miner revenues early in the system's life, but is expected to give way to the latter as block rewards dwindle. There has been an implicit belief that whether miners are paid by block rewards or transaction fees does not affect the security of the block chain. We show that this is not the case. Our key insight is that with only transaction fees, the variance of the block reward is very high due to the exponentially distributed block arrival time, and it becomes attractive to fork a "wealthy" block to "steal" the rewards therein. We show that this results in an equilibrium with undesirable properties for Bitcoin's security and performance, and even in non-equilibria in some circumstances. We also revisit selfish mining and show that it can be made profitable for a miner with an arbitrarily low hash power share who is arbitrarily poorly connected within the network. Our results are derived from theoretical analysis and confirmed by a new Bitcoin mining simulator that may be of independent interest. We discuss the troubling implications of our results for Bitcoin's future security and draw lessons for the design of new cryptocurrencies.
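The variance insight can be checked with a short Monte Carlo sketch: if fees accrue at a roughly constant rate, a block's fee reward is proportional to the exponentially distributed inter-block time, so its standard deviation is as large as its mean. The fee rate below is an invented parameter.

```python
# Minimal sketch of why fee-only rewards are highly variable: rewards are
# proportional to exponential inter-block times. Parameters are invented.
import random

FEE_RATE = 0.01          # BTC accruing in fees per second (assumed)
MEAN_INTERVAL = 600.0    # average block interval in seconds
N = 100_000

rewards = [FEE_RATE * random.expovariate(1 / MEAN_INTERVAL) for _ in range(N)]
mean = sum(rewards) / N
std = (sum((r - mean) ** 2 for r in rewards) / N) ** 0.5
print(f"mean reward = {mean:.2f} BTC, std/mean = {std / mean:.2f}")
# std/mean ~ 1.0, versus 0 for a fixed block subsidy: a freshly mined
# "wealthy" block can be worth forking.
```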
Decoy routing is a promising new approach to censorship circumvention that relies on traffic redirection by volunteer autonomous systems. Decoy routing is subject to a fundamental censorship attack, called routing around decoys (RAD), in which the censors re-route their clients' Internet traffic to evade decoy routing autonomous systems. Recently, there has been a heated debate in the community on the real-world feasibility of decoy routing in the presence of the RAD attack. Unfortunately, previous studies base their analyses on heuristic mechanisms for decoy placement as well as ad hoc strategies for the implementation of the RAD attack by the censors. In this paper, we perform the first systematic analysis of decoy routing in the presence of the RAD attack. We use game theory to model the interactions between decoy router deployers and the censors in various settings. Our game-theoretic analysis finds the optimal decoy placement strategies, as opposed to heuristic placements, in the presence of RAD censors who take their optimal censorship actions, as opposed to some ad hoc implementation of RAD. That is, we investigate the best decoy placement given the best RAD censorship. We consider two business models for the real-world deployment of decoy routers: a central deployment that resembles that of Tor, and a distributed deployment in which autonomous systems individually decide on decoy deployment based on their economic interests. Through extensive simulation of Internet routes, we derive the optimal strategies in the two models for various censoring countries and under different assumptions about the budget and preferences of the censors and decoy deployers. We believe that our study is a significant step forward in understanding the practicality of the decoy routing circumvention approach.
One of the main concerns for smartphone users is the quality of the apps they download. Before installing any app from the market, users first check its rating and reviews. However, these ratings are not computed by experts and often bear no relation to malicious behavior. In this work, we present an IDS/rating system based on a game-theoretic model with crowdsourcing. Our results show that, with minor control over the error in categorizing users and over the fraction of experts in the crowd, our system provides proper ratings while flagging all malicious apps.
The security game is a basic model for resource allocation in adversarial environments. Here there are two players, a defender and an attacker. The defender wants to allocate her limited resources to defend critical targets, and the attacker seeks his most favorable target to attack. In the past decade, there has been a surge of research interest in analyzing and solving security games motivated by applications from various domains. Remarkably, these models and their game-theoretic solutions have led to real-world deployments in use by major security agencies such as the LAX airport, the US Coast Guard, and the Federal Air Marshal Service, as well as by non-governmental organizations. Among all this research and these applications, equilibrium computation serves as a foundation. This paper examines security games from a theoretical perspective and provides a unified view of various security game models. In particular, each security game can be characterized by a set system E which consists of the defender's pure strategies; the defender's best response problem can be viewed as a combinatorial optimization problem over E. Our framework captures most of the basic security game models in the literature, including all the deployed systems; the set systems E arising from various domains encode standard combinatorial problems such as bipartite matching, maximum coverage, min-cost flow, and packing. Our main result shows that equilibrium computation in security games is essentially a combinatorial problem. In particular, we prove that, for any set system E, the following problems can be reduced to each other in polynomial time: (0) combinatorial optimization over E; (1) computing the minimax equilibrium for zero-sum security games over E; (2) computing the strong Stackelberg equilibrium for security games over E; (3) computing the best or worst (for the defender) Nash equilibrium for security games over E. Therefore, the hardness [polynomial solvability] of any of these problems implies the hardness [polynomial solvability] of all the others. Here, by "games over E" we mean the class of security games with arbitrary payoff structures but a fixed set E of defender pure strategies. This shows that the complexity of a security game is essentially determined by the set system E. We view drawing these connections as an important conceptual contribution of this paper.
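For concreteness, here is a hedged sketch of problem (1), the minimax LP for zero-sum security games over E; the notation (target set T, defender payoff U_d) is assumed here rather than taken from the paper.

```latex
% Minimax LP for a zero-sum security game over the set system E, with x the
% defender's mixed strategy over pure strategies S in E (notation assumed).
\begin{aligned}
\max_{x,\,v}\quad & v\\
\text{s.t.}\quad  & \sum_{S \in E} x_S\, U_d(S, t) \;\ge\; v,
                    \qquad \forall\, t \in T,\\
                  & \sum_{S \in E} x_S = 1, \qquad x_S \ge 0,
                    \qquad \forall\, S \in E.
\end{aligned}
```

Since E, and hence the number of variables, can be exponentially large, solving this LP directly is generally infeasible, which is why its polynomial-time equivalence with combinatorial optimization over E (problem (0)) is the crux of the result.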
Distributed denial-of-service attacks are an increasing problem facing web applications, for which many defense techniques have been proposed, including several moving-target strategies. These strategies typically work by relocating targeted services over time, increasing uncertainty for the attacker, while trying not to disrupt legitimate users or incur excessive costs. Prior work has not shown, however, whether and how a rational defender would choose a moving-target method against an adaptive attacker, and under what conditions. We formulate a denial-of-service scenario as a two-player game, and solve a restricted-strategy version of the game using the methods of empirical game-theoretic analysis. Using agent-based simulation, we evaluate the performance of strategies from prior literature under a variety of attacks and environmental conditions. We find evidence for the strategic stability of various proposed strategies, such as proactive server movement, delayed attack timing, and suspected insider blocking, along with guidelines for when each is likely to be most effective.
Here we model the indirect costs of deploying security controls in small-to-medium enterprises (SMEs) to manage cyber threats. SMEs may not have the in-house skills and collective capacity to operate controls efficiently, resulting in inadvertent data leakage and exposure to compromise. Aside from financial costs, attempts to maintain security can impact morale, system performance, and retraining requirements, which are modelled here. Managing the overall complexity and effectiveness of an SME's security controls has the potential to reduce unintended leakage. The UK Cyber Essentials Scheme informs basic control definitions, and Available Responsibility Budget (ARB) is modelled to understand how controls can be prioritised for both security and usability. Human factors of security and practical experience of security management for SMEs inform the modelling of deployment challenges across a set of SME archetypes differing in size, complexity, and use of IT. Simple combinations of controls are matched to archetypes, balancing capabilities to protect data assets with the effort demands placed upon employees. Experiments indicate that two-factor authentication can be readily adopted by many SMEs and their employees to protect core assets, followed by correct access privileges and anti-malware software. Service and technology providers emerge as playing an important role in improving access to usable security controls for SMEs.
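As a toy illustration of budget-constrained prioritisation in the spirit of the ARB idea, the sketch below greedily selects Cyber Essentials style controls by security benefit per unit of employee effort; the benefit and effort figures are invented, not the paper's calibrated values.

```python
# Minimal sketch of budgeted control prioritisation: each control demands
# employee effort, and an SME archetype can only "spend" so much.
# Benefit/effort numbers are illustrative assumptions.

CONTROLS = {  # control name: (security benefit, effort demanded)
    "two_factor_auth":   (8.0, 2.0),
    "access_privileges": (6.0, 2.0),
    "anti_malware":      (5.0, 1.8),
    "boundary_firewall": (4.0, 3.0),
    "patch_management":  (7.0, 4.0),
}

def prioritise(budget):
    """Greedy benefit-per-effort selection under an effort budget."""
    chosen = []
    ranked = sorted(CONTROLS.items(),
                    key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    for name, (benefit, effort) in ranked:
        if effort <= budget:
            chosen.append(name)
            budget -= effort
    return chosen

print(prioritise(budget=6.0))
# ['two_factor_auth', 'access_privileges', 'anti_malware'] -- the abstract's
# ordering, reproduced here by construction of the invented figures.
```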
Modern Industrial Control Systems (ICS) rely on enterprise to plant floor connectivity. Where the size, diversity, and therefore complexity of ICS increase, operational requirements, goals, and challenges defined by users across various sub-systems follow. Recent trends in Information Technology (IT) and Operational Technology (OT) convergence may cause operators to lose a comprehensive understanding of end-to-end data flow requirements. This presents a risk to system security and resilience. Sensors were once solely applied for operational process use, but now act as inputs supporting a diverse set of organisational requirements. If these are not fully understood, incomplete risk assessment, and inappropriate implementation of security controls could occur. In search of a solution, operators may turn to standards and guidelines. This paper reviews popular standards and guidelines, prior to the presentation of a case study and conceptual tool, highlighting the importance of data flows, critical data processing points, and system-to-user relationships. The proposed approach forms a basis for risk assessment and security control implementation, aiding the evolution of ICS security and resilience.
This article focuses on the design of safe and attack-resilient Cyber-Physical Systems (CPS) equipped with multiple sensors measuring the same physical variable. A malicious attacker may be able to disrupt system performance by compromising a subset of these sensors. Consequently, we develop a precise and resilient sensor fusion algorithm that combines the data received from all sensors by taking into account their specified precisions. In particular, we note that in the presence of a shared bus, in which messages are broadcast to all nodes in the network, the attacker's impact depends on which sensors' measurements he has seen before sending the corrupted ones. Therefore, we explore the effects of communication schedules on the performance of sensor fusion and provide theoretical and experimental results advocating the use of the Ascending schedule, which orders sensor transmissions by precision starting from the most precise. In addition, to improve the accuracy of the sensor fusion algorithm, we consider the dynamics of the system in order to incorporate past measurements at the current time. Possible ways of mapping sensor measurement history are investigated in the article and compared in terms of the confidence in the final output of the sensor fusion. We show that the precision of the algorithm using history is never worse than that of the no-history one, while the benefits may be significant. Furthermore, we exploit the complementary properties of the two methods and show that their combination results in a more precise and resilient algorithm. Finally, we validate our approach in simulation and in experiments on a real unmanned ground robot.
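A minimal sketch of precision-aware, attack-resilient interval fusion in the spirit of Marzullo-style algorithms follows; it is an illustrative simplification under assumed interval semantics, not necessarily the article's exact algorithm.

```python
# Minimal sketch: with n sensors of which at most f may be corrupted, keep
# the smallest interval containing every point covered by >= n - f of the
# reported intervals. An illustrative simplification of interval fusion.

def fuse(intervals, f):
    """intervals: list of (low, high) measurements; f: max faulty sensors."""
    n = len(intervals)
    events = sorted([(lo, +1) for lo, _ in intervals] +
                    [(hi, -1) for _, hi in intervals],
                    key=lambda e: (e[0], -e[1]))   # open before close at ties
    depth, lo, hi = 0, None, None
    for point, delta in events:
        if delta < 0 and depth >= n - f:
            hi = point                 # leaving a sufficiently covered region
        depth += delta
        if delta > 0 and depth >= n - f and lo is None:
            lo = point                 # entering the first such region
    return lo, hi

# Three sensors, at most one attacked: the outlier cannot drag the fusion.
print(fuse([(1.0, 3.0), (1.5, 3.5), (10.0, 12.0)], f=1))   # (1.5, 3.0)
```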
From pencils to commercial aircraft, every man-made object must be designed and manufactured. When it is cheaper or easier to steal a design or a manufacturing process specification than to invent one's own, the incentive for theft is present. As more and more manufacturing data comes online, incidents of such theft are increasing. In this paper, we present a side-channel attack on manufacturing equipment that reveals both the form of a product and its manufacturing process, i.e., exactly how it is made. In the attack, a human deliberately or accidentally places an attack-enabled phone close to the equipment or makes or receives a phone call on any phone nearby. The phone executing the attack records audio and, optionally, magnetometer data. We present a method of reconstructing the product's form and manufacturing process from the captured data, based on machine learning, signal processing, and human assistance. We demonstrate the attack on a 3D printer and a CNC mill, each with its own acoustic signature, and discuss the commonalities in the sensor data captured for these two different machines. We compare the quality of the data captured with a variety of smartphone models. Capturing data from the 3D printer, we reproduce the form and process information of objects previously unknown to the reconstructors. On average, our accuracy is within 1 mm in reconstructing the length of a line segment in a fabricated object's shape and within 1 degree in determining an angle in a fabricated object's shape. We conclude with recommendations for defending against these attacks.
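As an illustration of the kind of signal processing such an attack relies on, the sketch below extracts the dominant-frequency track from a synthetic audio signal; the paper's actual pipeline, recordings, and classifiers are more involved, and the tone frequencies here are invented.

```python
# Minimal sketch: turn recorded audio into a time-frequency representation
# and read off the dominant frequency per frame, the kind of feature from
# which machine movements can be classified. Signal is synthetic.
import numpy as np
from scipy.signal import spectrogram

fs = 44_100                                    # assumed microphone rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
# Fake "stepper motor" tone that jumps when the machine changes axis speed.
audio = np.where(t < 1.0,
                 np.sin(2 * np.pi * 900 * t),
                 np.sin(2 * np.pi * 1400 * t))

f, frames, Sxx = spectrogram(audio, fs=fs, nperseg=2048)
dominant = f[Sxx.argmax(axis=0)]               # strongest frequency per frame
print(dominant[:3], dominant[-3:])             # ~900 Hz early, ~1400 Hz late
```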
The coming decade will witness a surge in remote health-monitoring systems based on body-worn monitoring devices. These Medical Cyber-Physical Systems (MCPS) will be capable of transmitting the acquired data to a private or public cloud for storage and processing. Machine learning algorithms running in the cloud and processing this data can provide decision support to healthcare professionals. There is no doubt that the security and privacy of the medical data are among the most important concerns in designing an MCPS. In this paper, we depict the general architecture of an MCPS consisting of four layers: data acquisition, data aggregation, cloud processing, and action. Due to the differences in the hardware and communication capabilities of each layer, different encryption schemes must be used to guarantee data privacy within that layer. We survey conventional and emerging encryption schemes based on their ability to provide secure storage, data sharing, and secure computation. Our detailed experimental evaluation of each scheme shows that while the emerging encryption schemes enable exciting new features such as secure sharing and secure computation, they introduce several orders of magnitude of computational and storage overhead. We conclude by outlining future research directions to improve the usability of the emerging encryption schemes in an MCPS.
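For a taste of the secure-computation capability surveyed above, the sketch below uses additively homomorphic Paillier encryption via the third-party phe library (one possible choice; the paper evaluates several schemes): the cloud layer can aggregate encrypted readings without ever decrypting them, at a cost several orders of magnitude above symmetric encryption.

```python
# Minimal sketch of secure computation at the cloud layer: the cloud sums
# Paillier-encrypted vitals without seeing them. Requires the third-party
# `phe` library (pip install phe); readings are illustrative.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

readings = [72, 75, 71, 78]                    # heart-rate samples (bpm)
encrypted = [public_key.encrypt(r) for r in readings]

# The cloud adds ciphertexts without the private key...
encrypted_sum = sum(encrypted[1:], encrypted[0])
# ...and only the data owner can decrypt the aggregate.
print(private_key.decrypt(encrypted_sum) / len(readings))   # 74.0
```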
We examine the security of home smart locks: cyber-physical devices that replace traditional door locks with deadbolts that can be electronically controlled by mobile devices or the lock manufacturer's remote servers. We present two categories of attacks against smart locks and analyze the security of five commercially-available locks with respect to these attacks. Our security analysis reveals that flaws in the design, implementation, and interaction models of existing locks can be exploited by several classes of adversaries, allowing them to learn private information about users and gain unauthorized home access. To guide future development of smart locks and similar Internet of Things devices, we propose several defenses that mitigate the attacks we present. One of these defenses is a novel approach to securely and usably communicate a user's intended actions to smart locks, which we prototype and evaluate. Ultimately, our work takes a first step towards illuminating security challenges in the system design and novel functionality introduced by emerging IoT systems.
In this paper, we present a function-based methodology to evaluate the resilience of gas pipeline systems under two different cyber-physical attack scenarios. The first scenario is a pressure integrity attack on a natural gas high-pressure transmission pipeline. Through simulation, we analyze cyber attacks that propagate from the cyber domain to the physical gas pipeline domain, the time within which the SCADA system should respond to such attacks, and, finally, an attack that prevents the system from responding. We use the combined results of simulations of a wireless mesh network for remote terminal units and of a gas pipeline simulation to measure the shortest Time to Criticality (TTC), i.e., the time for an event to reach the failure state. The second scenario describes how the failure of a cyber node controlling power grid functionality propagates from the cyber system to the power system and on to the gas pipeline system. We formulate this problem using a graph-theoretic approach and quantify the resilience of the networks by the percentage of connected nodes and the length of the shortest path between them. The results show that parameters such as the TTC, the power distribution capacity of the power grid nodes, and the percentage and type of cyber nodes compromised regulate the efficiency and resilience of the power and gas networks. Analyzing such attack scenarios helps gas pipeline system administrators design attack remediation algorithms and improve the response of the system to an attack.
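A minimal sketch of the graph-theoretic resilience metrics described above, computed with networkx on an invented stand-in topology rather than the paper's coupled cyber-power-gas model:

```python
# Minimal sketch: resilience as the fraction of nodes still connected and
# the shortest path lengths after compromised nodes are removed.
# The topology and compromised set are illustrative. Requires networkx.
import networkx as nx

G = nx.barabasi_albert_graph(n=100, m=2, seed=1)   # stand-in infrastructure
compromised = [0, 1, 2, 3, 4]                      # attacked cyber nodes (hubs)
H = G.copy()
H.remove_nodes_from(compromised)

giant = max(nx.connected_components(H), key=len)
print("connected fraction:", len(giant) / G.number_of_nodes())
print("avg shortest path in giant component:",
      round(nx.average_shortest_path_length(H.subgraph(giant)), 2))
```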
Multipath TCP (MP-TCP) has the potential to greatly improve application performance by using multiple paths transparently. We propose a fluid model for a large class of MP-TCP algorithms and identify design criteria that guarantee the existence, uniqueness, and stability of system equilibrium. We clarify how algorithm parameters impact TCP-friendliness, responsiveness, and window oscillation and demonstrate an inevitable tradeoff among these properties. We discuss the implications of these properties on the behavior of existing algorithms and motivate our algorithm Balia (balanced linked adaptation), which generalizes existing algorithms and strikes a good balance among TCP-friendliness, responsiveness, and window oscillation. We have implemented Balia in the Linux kernel. We use our prototype to compare the new algorithm to existing MP-TCP algorithms.
The metal-insulator transition (MIT) in strongly correlated oxides such as NbO2 has exhibited oscillatory behavior in recent experiments. In this work, an MIT-based two-terminal device is proposed as a compact oscillation neuron for parallel read operations from a resistive synaptic array. The weighted sum is represented by the frequency of the oscillation neuron. Compared to a complex CMOS integrate-and-fire neuron with tens of transistors, the oscillation neuron achieves a significant area reduction, thereby alleviating the column pitch-matching problem of the peripheral circuitry in resistive memories. First, the impact of MIT device characteristics on the weighted-sum accuracy is investigated when the oscillation neuron is connected to a single resistive synaptic device. Second, the array-level performance is explored when the oscillation neurons are connected to the resistive synaptic array. To address the interference of oscillation between columns in simple cross-point arrays, a 2-transistor-1-resistor (2T1R) array architecture is proposed with a negligible increase in array area. Finally, a circuit-level benchmark of the proposed oscillation neuron against the CMOS neuron is performed. At the single-neuron level, the oscillation neuron shows a >12.5X reduction in area. At the 128×128 array level, the oscillation neuron shows reductions of ~4% in total area, >30% in latency, ~5X in energy, and ~40X in leakage power, demonstrating its advantage for integration into resistive synaptic arrays for neuro-inspired computing.
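A toy model of the oscillation-neuron readout may help: treating the neuron as a relaxation oscillator charged through the selected synaptic conductances, its firing frequency is proportional to the weighted sum. The device parameters below are invented, not calibrated NbO2 measurements.

```python
# Toy relaxation-oscillator model: the summed synaptic conductance sets the
# charging current, which sets the oscillation frequency (the weighted sum).
# All device values are illustrative assumptions.

V_READ = 1.0      # read voltage across the column (V)
C_NODE = 1e-12    # lumped column capacitance (F)
DV_SWING = 0.3    # swing between MIT threshold and hold voltages (V)

def oscillation_frequency(conductances):
    """Weighted sum of conductances -> charging current -> frequency (Hz)."""
    current = sum(conductances) * V_READ
    return current / (C_NODE * DV_SWING)    # charging time sets the period

print(oscillation_frequency([1e-6, 2e-6, 0.5e-6]))   # larger weighted sum
print(oscillation_frequency([0.2e-6, 0.1e-6]))       # smaller weighted sum
```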
We introduce a simplified island model with behavior similar to that of λ (1+1) islands optimizing the Maze fitness function, and investigate the effects of the migration topology on the ability of the simplified island model to track the optimum of a dynamic fitness function. More specifically, we prove that there exist choices of model parameters for which using a unidirectional ring as the migration topology allows the model to track the oscillating optimum through n Maze-like phases with high probability, while using a complete graph as the migration topology results in the island model losing track of the optimum with overwhelming probability. Additionally, we prove that if migration occurs only rarely, denser migration topologies may be advantageous. This illustrates that while a less dense migration topology may be useful when optimizing dynamic functions with oscillating behavior, and requires less problem-specific knowledge to determine when migration should be allowed to occur, care must be taken to ensure that a sufficient amount of migration occurs during the optimization process.
The fabrication process introduces inherent variability into the attributes of transistors (in particular, length, width, and oxide thickness). As a result, every chip is physically unique. This physical uniqueness of microelectronic components can be used for multiple security applications. Physically Unclonable Functions (PUFs) are built to extract the physical uniqueness of microelectronic components and make it usable for secure applications. However, the microelectronic components used in PUF designs suffer from external, environmental variations that impact PUF behavior. Variations of temperature gradients during manufacturing can bias the PUF responses. Variations of temperature or thermal noise during PUF operation change the behavior of the circuit and can introduce errors in PUF responses. Detailed knowledge of the behavior of PUFs operating under various environmental factors is needed to reliably extract and demonstrate the uniqueness of the chips. In this work, we present a detailed and exhaustive analysis of the behavior of two PUF designs, a ring oscillator PUF and a timing path violation PUF. We have implemented both PUFs on FPGAs fabricated by Xilinx and analyzed their behavior while varying temperature and supply voltage. Our experiments quantify the robustness of each design, demonstrate their sensitivity to temperature, and show the impact that supply voltage has on the uniqueness of the analyzed PUFs.
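To make the ring-oscillator mechanism concrete, the sketch below simulates per-chip oscillator frequencies, derives response bits by pairwise comparison, and re-reads them under added noise to mimic environmental sensitivity; all values are simulated assumptions, not measurements from the Xilinx FPGAs.

```python
# Minimal sketch of a ring-oscillator PUF: process variation fixes per-chip
# frequencies (the secret), pairwise comparisons yield response bits, and
# environmental noise can flip bits. Values are simulated assumptions.
import random

def ro_frequencies(chip_seed, n=16, noise=0.0):
    """Per-chip RO frequencies: fixed process variation plus read noise."""
    rng = random.Random(chip_seed)
    base = [200e6 + rng.gauss(0, 2e6) for _ in range(n)]  # process variation
    return [f + random.gauss(0, noise) for f in base]     # environmental noise

def response(freqs):
    """One bit per RO pair: which oscillator of the pair is faster."""
    return [int(freqs[i] > freqs[i + 1]) for i in range(0, len(freqs), 2)]

nominal = response(ro_frequencies(chip_seed=42))
reread = response(ro_frequencies(chip_seed=42, noise=1e6))  # e.g. hot chip
flips = sum(a != b for a, b in zip(nominal, reread))
print(f"{flips} bit flip(s) out of {len(nominal)} response bits")
```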
Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. Obviously, this approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. To address these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners, which provides the means and methods for the model to engage in Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, making it possible to automate the typical work method described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report initial results concerning its expressive power.
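As a loose analogy for the document-spanner primitive, the sketch below uses a regular expression with capture variables to map raw text into a relation of (value, span) tuples; Spannerlog itself is a Datalog-style language, so the syntax here is illustrative only.

```python
# Minimal sketch of the spanner idea: a regex with capture variables turns
# text into a relation whose cells carry both values and spans.
import re

text = "Alice was born in 1984. Bob was born in 1979."
spanner = re.compile(r"(?P<name>[A-Z][a-z]+) was born in (?P<year>\d{4})")

# Each match contributes one tuple (row) to the extracted relation.
relation = [
    {var: (m.group(var), m.span(var)) for var in ("name", "year")}
    for m in spanner.finditer(text)
]
for row in relation:
    print(row)
# {'name': ('Alice', (0, 5)), 'year': ('1984', (18, 22))} ...
```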
In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc. Information extraction (IE) is the default technique for turning unstructured text into structured fact databases, but systematically applying IE techniques to web input requires highly complex systems, ranging from focused crawlers, over quality-assurance methods to cope with the HTML input, to long pipelines of natural language processing and IE algorithms. Although a number of tools exist for each of these steps, their seamless, flexible, and scalable combination into a web-scale end-to-end text analytics system remains a true challenge. In this paper, we report our experiences from building such a system for comparing the "web view" on health-related topics with that derived from a controlled scientific corpus, i.e., Medline. The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytics engine inside the Big Data processing system Stratosphere. We describe a practical approach to seed generation which led us to crawl a corpus of ~1 TB of web pages highly enriched for the biomedical domain. Pages were run through a complex pipeline of best-of-breed tools for a multitude of necessary tasks, such as HTML repair, boilerplate detection, sentence detection, linguistic annotation, parsing, and eventually named entity recognition for several types of entities. Results are compared with those from running the same pipeline (without the web-related tasks) on a corpus of 24 million scientific abstracts and a third corpus of ~250K scientific full texts. We evaluate the scalability, quality, and robustness of the employed methods and tools. The focus of this paper is to provide a large, real-life use case to inspire future research into robust, easy-to-use, and scalable methods for domain-specific IE at web scale.
Text mining has emerged as an essential tool for revealing the hidden value in data. It is an increasingly important technique for companies around the world, suitable for both large ongoing analyses and discrete investigations, since there is a need to track disruptive technologies, explore internal knowledge bases, and review enormous data sets. Most of the information produced in conversation transcripts is in an unstructured format; such data contain ambiguity, redundancy, duplication, typographical errors, and more, making their processing and analysis a difficult task. However, several text mining techniques are available to extract keywords from these unstructured conversation transcripts. Keyword extraction is the process of identifying the most significant words in a context, which helps decisions to be made much faster. The main objective of the proposed work is to extract keywords from meeting transcripts using Swarm Intelligence (SI) techniques. Here, the Stochastic Diffusion Search (SDS) algorithm is used for keyword extraction and the Firefly algorithm for clustering. These techniques can be applied to a wide range of optimization problems and, in our evaluation, produced better results than the existing technique.
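A didactic sketch of Stochastic Diffusion Search applied to keyword extraction follows; it is a simplified toy, not the paper's tuned SDS/Firefly pipeline, and the transcript snippets are invented.

```python
# Toy Stochastic Diffusion Search for keywords: each agent holds a candidate
# word, passes a partial test (word occurs in a randomly sampled sentence),
# and inactive agents copy active peers. The population concentrates on
# words that pass the test often, i.e. frequent candidate keywords.
import random
from collections import Counter

sentences = [                       # invented meeting-transcript snippets
    "the budget meeting moved to friday",
    "the budget review needs the new forecast",
    "forecast numbers arrive before the meeting",
    "please circulate the budget forecast",
]
words = sorted({w for s in sentences for w in s.split() if len(w) > 3})

n_agents = 50
hypotheses = [random.choice(words) for _ in range(n_agents)]
for _ in range(200):
    # Test phase: an agent is active if its word occurs in a random sentence.
    active = [h in random.choice(sentences).split() for h in hypotheses]
    # Diffusion phase: inactive agents copy a random peer if that peer is
    # active, otherwise re-seed with a fresh random hypothesis.
    for i in range(n_agents):
        if not active[i]:
            j = random.randrange(n_agents)
            hypotheses[i] = hypotheses[j] if active[j] else random.choice(words)

print(Counter(hypotheses).most_common(3))   # clusters on frequent words
```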
Many books and papers describe how to do data science. While those texts are useful, it can also be important to reflect on anti-patterns, i.e., common classes of errors seen when large communities of researchers and commercial software engineers use, and misuse, data mining tools. This technical briefing will present those errors and show how to avoid them.
The web is flooded with multimedia sources such as images, videos, animations, and audio, which has in turn led computer vision researchers to focus on extracting content from these sources. Scene text recognition involves two major steps, namely text localization and text recognition. This paper presents an end-to-end text recognition approach to extract characters from complex natural scenes. Maximally Stable Extremal Regions (MSER) are used to localize the various objects, Canny edge detection identifies the edges, a connected-component method performs the binary classification that separates text from non-text objects, and finally stroke analysis examines the style of each character, leading to character recognition. Experimental results were obtained by testing the approach on the ICDAR 2015 dataset, where text was recognized from most scene images with good precision.
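A minimal sketch of the first two stages, MSER localization and Canny edge detection, using OpenCV; 'scene.jpg' is a placeholder path, and the connected-component classification and stroke-analysis stages are omitted.

```python
# Minimal sketch of text-candidate localization: MSER finds stable extremal
# regions and Canny recovers edges. Requires OpenCV (pip install
# opencv-python); 'scene.jpg' is a placeholder input path.
import cv2

image = cv2.imread("scene.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)     # candidate character regions
edges = cv2.Canny(gray, 100, 200)              # edge map for later analysis

for x, y, w, h in bboxes:                      # visualize the candidates
    cv2.rectangle(image, (int(x), int(y)), (int(x + w), int(y + h)),
                  (0, 255, 0), 1)
cv2.imwrite("candidates.png", image)
print(len(bboxes), "candidate regions found")
```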