Biblio
Safety and security of complex critical infrastructures is very important for economic, environmental and social reasons. The interdisciplinary and inter-system dependencies within these infrastructures introduce difficulties in the safety and security design. Late discovery of safety and security design weaknesses can lead to increased costs, additional system complexity, ineffective mitigation measures and delays to the deployment of the systems. Traditionally, safety and security assessments are handled using different methods and tools, although some concepts are very similar, by specialized experts in different disciplines and are performed at different system design life-cycle phases.The methodology proposed in this paper supports a concurrent safety and security Defense in Depth (DiD) assessment at an early design phase and it is designed to handle safety and security at a high level and not focus on specific practical technologies. It is assumed that regardless of the perceived level of security defenses in place, a determined (motivated, capable and/or well-funded) attacker can find a way to penetrate a layer of defense. While traditional security research focuses on removing vulnerabilities and increasing the difficulty to exploit weaknesses, our higher-level approach focuses on how the attacker's reach can be limited and to increase the system's capability for detection, identification, mitigation and tracking. The proposed method can assess basic safety and security DiD design principles like Redundancy, Physical separation, Functional isolation, Facility functions, Diversity, Defense lines/Facility and Computer Security zones, Safety classes/Security Levels, Safety divisions and physical gates/conduits (as defined by the International Atomic Energy Agency (IAEA) and international standards) concurrently and provide early feedback to the system engineer. A prototype tool is developed that can parse the exported project file of the interdisciplinary model. Based on a set of safety and security attributes, the tool is able to assess aspects of the safety and security DiD capabilities of the design. Its results can be used to identify errors, improve the design and cut costs before a formal human expert inspection. The tool is demonstrated on a case study of an early conceptual design of a complex system of a nuclear power plant.
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
In this paper, we propose a compositional scheme for the construction of abstractions for networks of control systems by using the interconnection matrix and joint dissipativity-type properties of subsystems and their abstractions. In the proposed framework, the abstraction, itself a control system (possibly with a lower dimension), can be used as a substitution of the original system in the controller design process. Moreover, we provide a procedure for constructing abstractions of a class of nonlinear control systems by using the bounds on the slope of system nonlinearities. We illustrate the proposed results on a network of linear control systems by constructing its abstraction in a compositional way without requiring any condition on the number or gains of the subsystems. We use the abstraction as a substitute to synthesize a controller enforcing a certain linear temporal logic specification. This example particularly elucidates the effectiveness of dissipativity-type compositional reasoning for large-scale systems.
Cyber-physical systems (CPS) research leverages the expertise of researchers from multiple domains to engineer complex systems of interacting physical and computational components. An approach called co-simulation is often used in CPS conceptual design to integrate the specialized tools and simulators from each of these domains into a joint simulation for the evaluation of design decisions. Many co-simulation platforms are being developed to expedite CPS conceptualization and realization, but most use intrusive modeling and communication libraries that require researchers to either abandon their existing models or spend considerable effort to integrate them into the platform. A significant number of these co-simulation platforms use the High Level Architecture (HLA) standard that provides a rich set of services to facilitate distributed simulation. This paper introduces a simple gateway that can be readily implemented without co-simulation expertise to adapt existing models and research infrastructure for use in HLA. An open-source implementation of the gateway has been developed for the National Institute of Standards and Technology (NIST) co-simulation platform called the Universal CPS Environment for Federation (UCEF).
To manage cybersecurity risks in practice, a simple yet effective method to assess suchs risks for individual systems is needed. With time-to-compromise (TTC), McQueen et al. (2005) introduced such a metric that measures the expected time that a system remains uncompromised given a specific threat landscape. Unlike other approaches that require complex system modeling to proceed, TTC combines simplicity with expressiveness and therefore has evolved into one of the most successful cybersecurity metrics in practice. We revisit TTC and identify several mathematical and methodological shortcomings which we address by embedding all aspects of the metric into the continuous domain and the possibility to incorporate information about vulnerability characteristics and other cyber threat intelligence into the model. We propose $\beta$-TTC, a formal extension of TTC which includes information from CVSS vectors as well as a continuous attacker skill based on a $\beta$-distribution. We show that our new metric (1) remains simple enough for practical use and (2) gives more realistic predictions than the original TTC by using data from a modern and productively used vulnerability database of a national CERT.
To manage cybersecurity risks in practice, a simple yet effective method to assess suchs risks for individual systems is needed. With time-to-compromise (TTC), McQueen et al. (2005) introduced such a metric that measures the expected time that a system remains uncompromised given a specific threat landscape. Unlike other approaches that require complex system modeling to proceed, TTC combines simplicity with expressiveness and therefore has evolved into one of the most successful cybersecurity metrics in practice. We revisit TTC and identify several mathematical and methodological shortcomings which we address by embedding all aspects of the metric into the continuous domain and the possibility to incorporate information about vulnerability characteristics and other cyber threat intelligence into the model. We propose β-TTC, a formal extension of TTC which includes information from CVSS vectors as well as a continuous attacker skill based on a β-distribution. We show that our new metric (1) remains simple enough for practical use and (2) gives more realistic predictions than the original TTC by using data from a modern and productively used vulnerability database of a national CERT.
Poison message failure is a mechanism that has been responsible for large scale failures in both telecommunications and IP networks. The poison message failure can propagate in the network and cause an unstable network. We apply a machine learning, data mining technique in the network fault management area. We use the k-nearest neighbor method to identity the poison message failure. We also propose a "probabilistic" k-nearest neighbor method which outputs a probability distribution about the poison message. Through extensive simulations, we show that the k-nearest neighbor method is very effective in identifying the responsible message type.
With the ubiquitous computing of providing services and applications at anywhere and anytime, cloud computing is the best option as it offers flexible and pay-per-use based services to its customers. Nevertheless, security and privacy are the main challenges to its success due to its dynamic and distributed architecture, resulting in generating big data that should be carefully analysed for detecting network's vulnerabilities. In this paper, we propose a Collaborative Anomaly Detection Framework (CADF) for detecting cyber attacks from cloud computing environments. We provide the technical functions and deployment of the framework to illustrate its methodology of implementation and installation. The framework is evaluated on the UNSW-NB15 dataset to check its credibility while deploying it in cloud computing environments. The experimental results showed that this framework can easily handle large-scale systems as its implementation requires only estimating statistical measures from network observations. Moreover, the evaluation performance of the framework outperforms three state-of-the-art techniques in terms of false positive rate and detection rate.
At the core of the "Big Data" revolution lie frameworks and systems that allow for the massively parallel processing of large amounts of data. Ironically, while they have been designed for processing large amounts of data, these systems are at the same time major producers of data: to support the administration and management of these huge-scale systems, they are configured to generate detailed log and monitoring data, periodically capturing the system state across all nodes, components and jobs in the system. While such logging information is used routinely by sysadmins for ad-hoc trouble-shooting and problem diagnosis, we point out that there is a tremendous value in analyzing such data from a research point of view. In this talk, we will go over several case studies that demonstrate how measuring and analyzing measurement data from production systems can provide new insights into how systems work and fail, and how these new insights can help in designing better systems.
Decreasing the potential for catastrophic consequences poses a significant challenge for high-risk industries. Organizations are under many different pressures, and they are continuously trying to adapt to changing conditions and recover from disturbances and stresses that can arise from both normal operations and unexpected events. Reducing risks in complex systems therefore requires that organizations develop and enhance traits that increase resilience. Resilience provides a holistic approach to safety, emphasizing the creation of organizations and systems that are proactive, interactive, reactive, and adaptive. This approach relies on disciplines such as system safety and emergency management, but also requires that organizations develop indicators and ways of knowing when an emergency is imminent. A resilient organization must be adaptive, using hands-on activities and lessons learned efforts to better prepare it to respond to future disruptions. It is evident from the discussions of each of the traits of resilience, including their limitations, that there are no easy answers to reducing safety risks in complex systems. However, efforts to strengthen resilience may help organizations better address the challenges associated with the ever-increasing complexities of their systems.
In large-scale systems, user authentication usually needs the assistance from a remote central authentication server via networks. The authentication service however could be slow or unavailable due to natural disasters or various cyber attacks on communication channels. This has raised serious concerns in systems which need robust authentication in emergency situations. The contribution of this paper is two-fold. In a slow connection situation, we present a secure generic multi-factor authentication protocol to speed up the whole authentication process. Compared with another generic protocol in the literature, the new proposal provides the same function with significant improvements in computation and communication. Another authentication mechanism, which we name stand-alone authentication, can authenticate users when the connection to the central server is down. We investigate several issues in stand-alone authentication and show how to add it on multi-factor authentication protocols in an efficient and generic way.
In this paper, parallelization and high performance computing are utilized to enable ultrafast transient stability analysis that can be used in a real-time environment to quickly perform “what-if” simulations involving system dynamics phenomena. EPRI's Extended Transient Midterm Simulation Program (ETMSP) is modified and enhanced for this work. The contingency analysis is scaled for large-scale contingency analysis using Message Passing Interface (MPI) based parallelization. Simulations of thousands of contingencies on a high performance computing machine are performed, and results show that parallelization over contingencies with MPI provides good scalability and computational gains. Different ways to reduce the Input/Output (I/O) bottleneck are explored, and findings indicate that architecting a machine with a larger local disk and maintaining a local file system significantly improve the scaling results. Thread-parallelization of the sparse linear solve is explored also through use of the SuperLU_MT library.
Captchas are designed to be easy for humans but hard for machines. However, most recent research has focused only on making them hard for machines. In this paper, we present what is to the best of our knowledge the first large scale evaluation of captchas from the human perspective, with the goal of assessing how much friction captchas present to the average user. For the purpose of this study we have asked workers from Amazon’s Mechanical Turk and an underground captchabreaking service to solve more than 318 000 captchas issued from the 21 most popular captcha schemes (13 images schemes and 8 audio scheme). Analysis of the resulting data reveals that captchas are often difficult for humans, with audio captchas being particularly problematic. We also find some demographic trends indicating, for example, that non-native speakers of English are slower in general and less accurate on English-centric captcha schemes. Evidence from a week’s worth of eBay captchas (14,000,000 samples) suggests that the solving accuracies found in our study are close to real-world values, and that improving audio captchas should become a priority, as nearly 1% of all captchas are delivered as audio rather than images. Finally our study also reveals that it is more effective for an attacker to use Mechanical Turk to solve captchas than an underground service.