Visible to the public Biblio

Filters: Keyword is checkpointing  [Clear All Filters]
2022-03-22
Akowuah, Francis, Prasad, Romesh, Espinoza, Carlos Omar, Kong, Fanxin.  2021.  Recovery-by-Learning: Restoring Autonomous Cyber-physical Systems from Sensor Attacks. 2021 IEEE 27th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA). :61—66.
Autonomous cyber-physical systems (CPS) are susceptible to non-invasive physical attacks such as sensor spoofing attacks that are beyond the classical cybersecurity domain. These attacks have motivated numerous research efforts on attack detection, but little attention on what to do after detecting an attack. The importance of attack recovery is emphasized by the need to mitigate the attack’s impact on a system and restore it to continue functioning. There are only a few works addressing attack recovery, but they all rely on prior knowledge of system dynamics. To overcome this limitation, we propose Recovery-by-Learning, a data-driven attack recovery framework that restores CPS from sensor attacks. The framework leverages natural redundancy among heterogeneous sensors and historical data for attack recovery. Specially, the framework consists of two major components: state predictor and data checkpointer. First, the predictor is triggered to estimate systems states after the detection of an attack. We propose a deep learning-based prediction model that exploits the temporal correlation among heterogeneous sensors. Second, the checkpointer executes when no attack is detected. We propose a double sliding window based checkpointing protocol to remove compromised data and keep trustful data as input to the state predictor. Third, we implement and evaluate the effectiveness of our framework using a realistic data set and a ground vehicle simulator. The results show that our method restores a system to continue functioning in presence of sensor attacks.
2022-03-14
Nath, Shubha Brata, Addya, Sourav Kanti, Chakraborty, Sandip, Ghosh, Soumya K.  2021.  Container-based Service State Management in Cloud Computing. 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM). :487—493.
In a cloud data center, the client requests are catered by placing the services in its servers. Such services are deployed through a sandboxing platform to ensure proper isolation among services from different users. Due to the lightweight nature, containers have become increasingly popular to support such sandboxing. However, for supporting effective and efficient data center resource usage with minimum resource footprints, improving the containers' consolidation ratio is significant for the cloud service providers. Towards this end, in this paper, we propose an exciting direction to significantly boost up the consolidation ratio of a data-center environment by effectively managing the containers' states. We observe that many cloud-based application services are event-triggered, so they remain inactive unless some external service request comes. We exploit the fact that the containers remain in an idle state when the underlying service is not active, and thus such idle containers can be checkpointed unless an external service request comes. However, the challenge here is to design an efficient mechanism such that an idle container can be resumed quickly to prevent the loss of the application's quality of service (QoS). We have implemented the system, and the evaluation is performed in Amazon Elastic Compute Cloud. The experimental results have shown that the proposed algorithm can manage the containers' states, ensuring the increase of consolidation ratio.
2020-02-26
Gountia, Debasis, Roy, Sudip.  2019.  Checkpoints Assignment on Cyber-Physical Digital Microfluidic Biochips for Early Detection of Hardware Trojans. 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI). :16–21.

Present security study involving analysis of manipulation of individual droplets of samples and reagents by digital microfluidic biochip has remarked that the biochip design flow is vulnerable to piracy attacks, hardware Trojans attacks, overproduction, Denial-of-Service attacks, and counterfeiting. Attackers can introduce bioprotocol manipulation attacks against biochips used for medical diagnosis, biochemical analysis, and frequent diseases detection in healthcare industry. Among these attacks, hardware Trojans have created a major threatening issue in its security concern with multiple ways to crack the sensitive data or alter original functionality by doing malicious operations in biochips. In this paper, we present a systematic algorithm for the assignment of checkpoints required for error-recovery of available bioprotocols in case of hardware Trojans attacks in performing operations by biochip. Moreover, it can guide the placement and timing of checkpoints so that the result of an attack is reduced, and hence enhance the security concerns of digital microfluidic biochips. Comparative study with traditional checkpoint schemes demonstrate the superiority of the proposed algorithm without overhead of the bioprotocol completion time with higher error detection accuracy.

2019-02-14
Kong, F., Xu, M., Weimer, J., Sokolsky, O., Lee, I..  2018.  Cyber-Physical System Checkpointing and Recovery. 2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS). :22-31.

Transitioning to more open architectures has been making Cyber-Physical Systems (CPS) vulnerable to malicious attacks that are beyond the conventional cyber attacks. This paper studies attack-resilience enhancement for a system under emerging attacks in the environment of the controller. An effective way to address this problem is to make system state estimation accurate enough for control regardless of the compromised components. This work follows this way and develops a procedure named CPS checkpointing and recovery, which leverages historical data to recover failed system states. Specially, we first propose a new concept of physical-state recovery. The essential operation is defined as rolling the system forward starting from a consistent historical system state. Second, we design a checkpointing protocol that defines how to record system states for the recovery. The protocol introduces a sliding window that accommodates attack-detection delay to improve the correctness of stored states. Third, we present a use case of CPS checkpointing and recovery that deals with compromised sensor measurements. At last, we evaluate our design through conducting simulator-based experiments and illustrating the use of our design with an unmanned vehicle case study.

2018-05-09
Shin, S., Tuck, J., Solihin, Y..  2017.  Hiding the Long Latency of Persist Barriers Using Speculative Execution. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). :175–186.

Byte-addressable non-volatile memory technology is emerging as an alternative for DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions, clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM, but few workloads have been written or characterized using these new instructions. In this work, we describe how these instructions work and how they can be used to implement write-ahead logging based transactions. We implement several common data structures and kernels and evaluate the performance overhead incurred over traditional non-persistent implementations. In particular, we find that persistence instructions occur in clusters along with expensive fence operations, they have long latency, and they add a significant execution time overhead, on average by 20.3% over code with logging but without fence instructions to order persists. To deal with this overhead and alleviate the performance bottleneck, we propose to speculate past long latency persistency operations using checkpoint-based processing. Our speculative persistence architecture reduces the execution time overheads to only 3.6%.

Azab, M., Fortes, J. A. B..  2017.  Towards Proactive SDN-Controller Attack and Failure Resilience. 2017 International Conference on Computing, Networking and Communications (ICNC). :442–448.

SDN networks rely mainly on a set of software defined modules, running on generic hardware platforms, and managed by a central SDN controller. The tight coupling and lack of isolation between the controller and the underlying host limit the controller resilience against host-based attacks and failures. That controller is a single point of failure and a target for attackers. ``Linux-containers'' is a successful thin virtualization technique that enables encapsulated, host-isolated execution-environments for running applications. In this paper we present PAFR, a controller sandboxing mechanism based on Linux-containers. PAFR enables controller/host isolation, plug-and-play operation, failure-and-attack-resilient execution, and fast recovery. PAFR employs and manages live remote checkpointing and migration between different hosts to evade failures and attacks. Experiments and simulations show that the frequent employment of PAFR's live-migration minimizes the chance of successful attack/failure with limited to no impact on network performance.

2017-06-05
Schordan, Markus, Oppelstrup, Tomas, Jefferson, David, Barnes, Jr., Peter D., Quinlan, Dan.  2016.  Automatic Generation of Reversible C++ Code and Its Performance in a Scalable Kinetic Monte-Carlo Application. Proceedings of the 2016 Annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation. :111–122.

The fully automatic generation of code that establishes the reversibility of arbitrary C/C++ code has been a target of research and engineering for more than a decade as reverse computation has become a central notion in large scale parallel discrete event simulation (PDES). The simulation models that are implemented for PDES are of increasing complexity and size and require various language features to support abstraction, encapsulation, and composition when building a simulation model. In this paper we focus on parallel simulation models that are written in C++ and present an approach and an evaluation for a fully automatically generated reversible code for a kinetic Monte-Carlo application implemented in C++. Although a significant runtime overhead is introduced with our technique, the assurance that the reverse code is generated automatically and correctly, is an enormous win that allows simulation model developers to write forward event code using the entire C++ language, and have that code automatically transformed into reversible code to enable parallel execution with the Rensselaer's Optimistic Simulation System (ROSS).

2017-05-16
Ren, Kun, Diamond, Thaddeus, Abadi, Daniel J., Thomson, Alexander.  2016.  Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems. Proceedings of the 2016 International Conference on Management of Data. :1539–1551.

As it becomes increasingly common for transaction processing systems to operate on datasets that fit within the main memory of a single machine or a cluster of commodity machines, traditional mechanisms for guaranteeing transaction durability–-which typically involve synchronous log flushes–-incur increasingly unappealing costs to otherwise lightweight transactions. Many applications have turned to periodically checkpointing full database state. However, existing checkpointing methods–-even those which avoid freezing the storage layer–-often come with significant costs to operation throughput, end-to-end latency, and total memory usage. This paper presents Checkpointing Asynchronously using Logical Consistency (CALC), a lightweight, asynchronous technique for capturing database snapshots that does not require a physical point of consistency to create a checkpoint, and avoids conspicuous latency spikes incurred by other database snapshotting schemes. Our experiments show that CALC can capture frequent checkpoints across a variety of transactional workloads with extremely small cost to transactional throughput and low additional memory usage compared to other state-of-the-art checkpointing systems.

Mendizabal, Odorico M., Dotti, Fernando Luís, Pedone, Fernando.  2016.  Analysis of Checkpointing Overhead in Parallel State Machine Replication. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :534–537.

State machine replication (SMR) is a well-established technique to fault-tolerant systems. In part, this is explained by the simplicity of the approach and its strong consistency guarantees. Recently, several proposals have suggested parallelizing the execution of state machine replicas to achieve high throughput. Concurrent execution of commands has many implications, including the recovery of replicas from failures. Conventional checkpointing techniques, for example, must be revisited in parallelized models. In this paper, we review parallel variations of state machine replication and discuss how checkpointing procedures apply to these models. Moreover, we evaluate the impact caused by checkpointing techniques on recovery through simulations.