Biblio
We present a novel, and use case agnostic method of identifying and circumventing private data exposure across distributed and high-dimensional data repositories. Examples of distributed high-dimensional data repositories include medical research and treatment data, where oftentimes more than 300 describing attributes appear. As such, providing strong guarantees of data anonymity in these repositories is a hard constraint in adhering to privacy legislation. Yet, when applied to distributed high-dimensional data, existing anonymisation algorithms incur high levels of information loss and do not guarantee privacy defeating the purpose of anonymisation. In this paper, we address this issue by using Bayesian networks to handle data transformation for anonymisation. By evaluating every attribute combination to determine the privacy exposure risk, the conditional probability linking attribute pairs is computed. Pairs with a high conditional probability expose the risk of deanonymisation similar to quasi-identifiers and can be separated instead of deleted, as in previous algorithms. Attribute separation removes the risk of privacy exposure, and deletion avoidance results in a significant reduction in information loss. In other words, assimilating the conditional probability of outliers directly in the adjacency matrix in a greedy fashion is quick and thwarts de-anonymisation. Since identifying every privacy violating attribute combination is a W[2]-complete problem, we optimise the procedure with a multigrid solver method by evaluating the conditional probabilities between attribute pairs, and aggregating state space explosion of attribute pairs through manifold learning. Finally, incremental processing of new data is achieved through inexpensive, continuous (delta) learning.
Cyber defense can no longer be limited to intrusion detection methods. These systems require malicious activity to enter an internal network before an attack can be detected. Having advanced, predictive knowledge of future attacks allow a potential victim to heighten security and possibly prevent any malicious traffic from breaching the network. This paper investigates the use of Auto-Regressive Integrated Moving Average (ARIMA) models and Bayesian Networks (BN) to predict future cyber attack occurrences and intensities against two target entities. In addition to incident count forecasting, categorical and binary occurrence metrics are proposed to better represent volume forecasts to a victim. Different measurement periods are used in time series construction to better model the temporal patterns unique to each attack type and target configuration, seeing over 86% improvement over baseline forecasts. Using ground truth aggregated over different measurement periods as signals, a BN is trained and tested for each attack type and the obtained results provided further evidence to support the findings from ARIMA. This work highlights the complexity of cyber attack occurrences; each subset has unique characteristics and is influenced by a number of potential external factors.
Robotic vehicles and especially autonomous robotic vehicles can be attractive targets for attacks that cross the cyber-physical divide, that is cyber attacks or sensory channel attacks affecting the ability to navigate or complete a mission. Detection of such threats is typically limited to knowledge-based and vehicle-specific methods, which are applicable to only specific known attacks, or methods that require computation power that is prohibitive for resource-constrained vehicles. Here, we present a method based on Bayesian Networks that can not only tell whether an autonomous vehicle is under attack, but also whether the attack has originated from the cyber or the physical domain. We demonstrate the feasibility of the approach on an autonomous robotic vehicle built in accordance with the Generic Vehicle Architecture specification and equipped with a variety of popular communication and sensing technologies. The results of experiments involving command injection, rogue node and magnetic interference attacks show that the approach is promising.
Attack graphs provide compact representations of the attack paths an attacker can follow to compromise network resources from the analysis of network vulnerabilities and topology. These representations are a powerful tool for security risk assessment. Bayesian inference on attack graphs enables the estimation of the risk of compromise to the system's components given their vulnerabilities and interconnections and accounts for multi-step attacks spreading through the system. While static analysis considers the risk posture at rest, dynamic analysis also accounts for evidence of compromise, for example, from Security Information and Event Management software or forensic investigation. However, in this context, exact Bayesian inference techniques do not scale well. In this article, we show how Loopy Belief Propagation—an approximate inference technique—can be applied to attack graphs and that it scales linearly in the number of nodes for both static and dynamic analysis, making such analyses viable for larger networks. We experiment with different topologies and network clustering on synthetic Bayesian attack graphs with thousands of nodes to show that the algorithm's accuracy is acceptable and that it converges to a stable solution. We compare sequential and parallel versions of Loopy Belief Propagation with exact inference techniques for both static and dynamic analysis, showing the advantages and gains of approximate inference techniques when scaling to larger attack graphs.
In this paper, we introduce Entropy/IP: a system that discovers Internet address structure based on analyses of a subset of IPv6 addresses known to be active, i.e., training data, gleaned by readily available passive and active means. The system is completely automated and employs a combination of information-theoretic and machine learning techniques to probabilistically model IPv6 addresses. We present results showing that our system is effective in exposing structural characteristics of portions of the active IPv6 Internet address space, populated by clients, services, and routers. In addition to visualizing the address structure for exploration, the system uses its models to generate candidate addresses for scanning. For each of 15 evaluated datasets, we train on 1K addresses and generate 1M candidates for scanning. We achieve some success in 14 datasets, finding up to 40% of the generated addresses to be active. In 11 of these datasets, we find active network identifiers (e.g., /64 prefixes or "subnets") not seen in training. Thus, we provide the first evidence that it is practical to discover subnets and hosts by scanning probabilistically selected areas of the IPv6 address space not known to contain active hosts a priori.
Dynamic firewalls with stateful inspection have added a lot of security features over the stateless traditional static filters. Dynamic firewalls need to be adaptive. In this paper, we have designed a framework for dynamic firewalls based on probabilistic ontology using Multi Entity Bayesian Networks (MEBN) logic. MEBN extends ordinary Bayesian networks to allow representation of graphical models with repeated substructures and can express a probability distribution over models of any consistent first order theory. The motivation of our proposed work is about preventing novel attacks (i.e. those attacks for which no signatures have been generated yet). The proposed framework is in two important parts: first part is the data flow architecture which extracts important connection based features with the prime goal of an explicit rule inclusion into the rule base of the firewall; second part is the knowledge flow architecture which uses semantic threat graph as well as reasoning under uncertainty to fulfill the required objective of providing futuristic threat prevention technique in dynamic firewalls.
This paper presents the application of fusion meth- ods to a visual surveillance scenario. The range of relevant features for re-identifying vehicles is discussed, along with the methods for fusing probabilistic estimates derived from these estimates. In particular, two statistical parametric fusion methods are considered: Bayesian Networks and the Dempster Shafer approach. The main contribution of this paper is the development of a metric to allow direct comparison of the benefits of the two methods. This is achieved by generalising the Kelly betting strategy to accommodate a variable total stake for each sample, subject to a fixed expected (mean) stake. This metric provides a method to quantify the extra information provided by the Dempster-Shafer method, in comparison to a Bayesian Fusion approach.