Biblio
The modern world has witnessed a dramatic increase in our ability to collect, transmit and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts significant interest in many fields such as security, fault management, and industrial optimization. Recently, the invariant network has been shown to be a powerful way of characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: 1) fault propagation in the network is ignored; 2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network-diffusion-based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.
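As a rough illustration of the propagation idea (not the authors' exact model), the sketch below ranks components by running a random walk with restart over an invariant-network adjacency matrix, with the restart vector built from each node's observed fraction of vanishing correlations; all names and parameters are illustrative.

```python
import numpy as np

# Minimal sketch of propagation-based causal-anomaly ranking (illustrative only,
# not the paper's exact model). A is the invariant-network adjacency matrix and
# b[i] is the fraction of node i's invariant edges whose correlation has vanished.
def rank_causal_anomalies(A, b, alpha=0.85, iters=100, tol=1e-9):
    # Column-normalize A so propagation preserves total score mass.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = A / col_sums
    # Restart distribution built from the observed broken-invariant evidence.
    e = b / b.sum() if b.sum() > 0 else np.full(len(b), 1.0 / len(b))
    r = e.copy()
    for _ in range(iters):  # random walk with restart: r = alpha*P@r + (1-alpha)*e
        r_new = alpha * P @ r + (1 - alpha) * e
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return np.argsort(-r)   # node indices ranked by anomaly score

# Toy example: 4 components with observed vanishing-correlation ratios.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
b = np.array([0.5, 0.5, 0.67, 0.0])
print(rank_causal_anomalies(A, b))
```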
Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure's utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.
We propose an exact algorithm for model-free diagnosis with an application to fault localization in digital circuits. We assume that a faulty circuit and a correctness specification, e.g., in terms of an un-optimized reference circuit, are available. Our algorithm computes the exact set of all minimal diagnoses up to cardinality k considering all possible assignments to the primary inputs of the circuit. This exact diagnosis problem can be naturally formulated and solved using an oracle for Quantified Boolean Satisfiability (QSAT). Our algorithm uses Boolean Satisfiability (SAT) instead to compute the exact result more efficiently. We implemented the approach and present experimental results for determining fault candidates of digital circuits with seeded faults at the gate level. The experiments show that the presented SAT-based approach outperforms state-of-the-art techniques for solving instances of the QSAT problem by several orders of magnitude while having the same accuracy. Moreover, in contrast to QSAT, the SAT-based algorithm has any-time behavior, i.e., at any time during the computation, an approximation of the exact result is available that can be used as a starting point for debugging. The result improves as time progresses until eventually the exact result is obtained.
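As a toy illustration of consistency-based fault localization with a SAT oracle (far simpler than the paper's algorithm, which covers all input assignments and computes exactly the minimal diagnoses), the sketch below uses the third-party python-sat package, assumed available, to test which fault candidates explain one observed misbehavior of a single AND gate.

```python
from itertools import combinations
from pysat.solvers import Glucose3   # python-sat, assumed installed

# Toy consistency-based diagnosis with a SAT oracle (illustrative sketch only).
# Circuit: y = AND(a, b). Variables: a=1, b=2, y=3, h=4 (health of the AND gate).
HEALTH = {4: "AND gate"}
clauses = [
    [-4, -1, -2, 3],   # h -> (a & b -> y)
    [-4, 1, -3],       # h -> (y -> a)
    [-4, 2, -3],       # h -> (y -> b)
    [1], [2], [-3],    # observation: a=1, b=1, y=0 (faulty output)
]

def diagnoses_up_to(k):
    # Enumerate candidate abnormal sets up to cardinality k and keep the
    # consistent ones (no minimality filtering in this toy version).
    found, healths = [], sorted(HEALTH)
    for size in range(k + 1):
        for abnormal in combinations(healths, size):
            with Glucose3(bootstrap_with=clauses) as solver:
                assumptions = [h if h not in abnormal else -h for h in healths]
                if solver.solve(assumptions=assumptions):
                    found.append([HEALTH[h] for h in abnormal])
    return found

print(diagnoses_up_to(1))   # -> [['AND gate']]
```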
Energy use of buildings represents roughly 40% of the overall energy consumption. Most national agendas contain goals related to reducing energy consumption and carbon footprint. Timely and accurate fault detection and diagnosis (FDD) in building management systems (BMS) has the potential to reduce energy consumption costs by approximately 15-30%. Most FDD methods are data-based, meaning that their performance is tightly linked to the quality and availability of relevant data. Based on our experience, fault and relevant event data are very sparse and inadequate, mostly because of the lack of will and incentive for those who would need to keep track of faults. In this paper we introduce the idea of using crowdsourcing to support FDD data collection processes, and illustrate our idea through a mobile application that has been implemented for this purpose. Furthermore, we propose a strategy for how to successfully deploy this building occupants' crowdsourcing application.
In order to realize accurate fault positioning and recognition in analog circuits, the feature extraction of fault information is an extremely important part. This approach is based on an experimental circuit designed with failure modes, from which the fault sample set is collected. We have chosen two methods for feature extraction of the fault data: one is the combination of wavelet transform and principal component analysis, and the other is factor analysis. We also use the extreme learning machine to train on and diagnose the data, comparing the performance of the two methods through diagnosis accuracy. The experimental results show that the data obtained from the experimental circuit, after being processed by these two methods, can quickly yield the fault location.
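A hedged sketch of the first pipeline described above (wavelet-energy features reduced by PCA, then classified by a simple extreme learning machine) is given below; it relies on the PyWavelets and scikit-learn packages, and all signal shapes and parameters are illustrative stand-ins rather than the paper's experimental setup.

```python
import numpy as np
import pywt                                  # PyWavelets, assumed installed
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def wavelet_pca_features(signals, n_components=8):
    # Energy of each wavelet sub-band as the raw feature vector per signal.
    feats = []
    for x in signals:
        coeffs = pywt.wavedec(x, 'db4', level=3)
        feats.append([np.sum(c ** 2) for c in coeffs])
    return PCA(n_components=min(n_components, len(feats[0]))).fit_transform(feats)

class ELM:
    """Single-hidden-layer extreme learning machine: random input weights,
    output weights solved by least squares."""
    def __init__(self, n_hidden=30):
        self.n_hidden = n_hidden
    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)
    def fit(self, X, y):
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        Y = np.eye(y.max() + 1)[y]                        # one-hot targets
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), Y, rcond=None)
        return self
    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Toy data: two simulated fault classes of an analog circuit's output signal.
signals = np.vstack([rng.normal(0, 1, (20, 256)), rng.normal(0, 2, (20, 256))])
labels = np.array([0] * 20 + [1] * 20)
X = wavelet_pca_features(signals)
clf = ELM().fit(X, labels)
print("training accuracy:", np.mean(clf.predict(X) == labels))
```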
This paper considers the problem of system-level fault diagnosis in highly dynamic networks. The existing fault diagnostic models deal mainly with static faults and have limited capabilities to handle dynamic networks. These fault diagnostic models are based on timers that work on a simple timeout mechanism to identify the node status, and often make simplistic assumptions for system implementations. To overcome the above problems, we propose a time-free comparison-based diagnostic model. Unlike the traditional models, the proposed model does not rely on timers and is more suitable for use in dynamic network environments. We also develop a novel comparison-based fault diagnosis protocol for identifying and diagnosing dynamic faults. The performance of the protocol has been analyzed and its correctness has been proved.
Fault diagnosis in IT environments is complicated because (i) most monitors have shared specificity (high amount of memory utilization can result from a large number of causes), (ii) it is hard to deploy and maintain enough sensors to ensure adequate coverage, and (iii) some functionality may be provided as-a-service by external parties with limited visibility and simultaneous availability of alert data. To systematically incorporate uncertainty and to be able to fuse information from multiple sources, we propose the use of Dempster-Shafer Theory (DST) of evidential reasoning for fault diagnosis and show its efficacy in the context of a distributed application.
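The core fusion step, Dempster's rule of combination, can be sketched in a few lines; the frame of discernment and the basic probability assignments below are invented for illustration and are not taken from the paper.

```python
from itertools import product

# Dempster's rule of combination for two basic probability assignments (BPAs).
# Frame of discernment: possible root causes; frozensets denote focal elements.
def combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2
    # Normalize by the non-conflicting mass (assumes conflict < 1).
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

THETA = frozenset({"app_bug", "memory_leak", "network"})
# Monitor 1 (high memory utilization): ambiguous between app bug and leak.
m_mem = {frozenset({"app_bug", "memory_leak"}): 0.7, THETA: 0.3}
# Monitor 2 (alert from an external provider with limited visibility).
m_net = {frozenset({"network"}): 0.5, THETA: 0.5}

fused = combine(m_mem, m_net)
for hypothesis, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis), round(mass, 3))
```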
Self-describing key-value data formats such as JSON are becoming increasingly popular as application developers choose to avoid the rigidity imposed by the relational model. Database systems designed for these self-describing formats, such as MongoDB, encourage users to use denormalized, heavily nested data models so that relationships across records and other schema information need not be predefined or standardized. Such data models contribute to long-term development complexity, as their lack of explicit entity and relationship tracking burdens new developers unfamiliar with the dataset. Furthermore, the large amount of data repetition present in such data layouts can introduce update anomalies and poor scan performance, which reduce both the quality and performance of analytics over the data. In this paper we present an algorithm that automatically transforms the denormalized, nested data commonly found in NoSQL systems into traditional relational data that can be stored in a standard RDBMS. This process includes a schema generation algorithm that discovers relationships across the attributes of the denormalized datasets in order to organize those attributes into relational tables. It further includes a matching algorithm that discovers sets of attributes that represent overlapping entities and merges those sets together. These algorithms reduce data repetition, allow the use of data analysis tools targeted at relational data, accelerate scan-intensive algorithms over the data, and help users gain a semantic understanding of complex, nested datasets.
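One building block of such a transformation, flattening nested documents into parent/child tables linked by generated surrogate keys, can be sketched as follows; the paper's schema generation and entity matching go well beyond this, and the example document is invented.

```python
import json
from collections import defaultdict
from itertools import count

# Flatten nested JSON into relational-style tables with surrogate keys.
def flatten(doc, table, tables, ids, parent=None):
    row_id = next(ids[table])
    row = {"_id": row_id}
    if parent:
        row["_parent_" + parent[0]] = parent[1]           # foreign key to parent row
    for key, value in doc.items():
        if isinstance(value, dict):
            flatten(value, f"{table}_{key}", tables, ids, (table, row_id))
        elif isinstance(value, list):
            for item in value:
                item = item if isinstance(item, dict) else {"value": item}
                flatten(item, f"{table}_{key}", tables, ids, (table, row_id))
        else:
            row[key] = value
    tables[table].append(row)

doc = json.loads('{"name": "Ada", "address": {"city": "London"},'
                 ' "orders": [{"sku": "A1"}, {"sku": "B2"}]}')
tables, ids = defaultdict(list), defaultdict(lambda: count(1))
flatten(doc, "customer", tables, ids)
for name, rows in tables.items():
    print(name, rows)
```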
With data becoming available in larger quantities and at higher rates, new data processing paradigms have been proposed to handle high-volume, fast-moving data. Data Stream Processing is one such paradigm wherein transient data streams flow through sets of continuous queries, only returning results when data is of interest to the querier. To avoid the large costs associated with maintaining the infrastructure required for processing these data streams, many companies will outsource their computation to third-party cloud services. This outsourcing, however, can lead to private data being accessed by parties that a data provider may not trust. The literature offers solutions to this confidentiality and access control problem but they have fallen short of providing a complete solution to these problems, due to either immense overheads or trust requirements placed on these third-party services. To address these issues, we have developed PolyStream, an enhancement to existing data stream management systems that enables data providers to specify attribute-based access control policies that are cryptographically enforced while simultaneously allowing many types of in-network data processing. We detail the access control models and mechanisms used by PolyStream, and describe a novel use of security punctuations that enables flexible, online policy management and key distribution. We detail how queries are submitted and executed using an unmodified Data Stream Management System, and show through an extensive evaluation that PolyStream yields a 550x performance gain versus the state-of-the-art system StreamForce in CODASPY 2014, while providing greater functionality to the querier.
The World Wide Web has become the most common platform for building applications and delivering content. Yet despite years of research, the web continues to face severe security challenges related to data integrity and confidentiality. Rather than continuing the exploit-and-patch cycle, we propose addressing these challenges at an architectural level, by supplementing the web's existing connection-based and server-based security models with a new approach: content-based security. With this approach, content is directly signed and encrypted at rest, enabling it to be delivered via any path and then validated by the browser. We explore how this new architectural approach can be applied to the web and analyze its security benefits. We then discuss a broad research agenda to realize this vision and the challenges that must be overcome.
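A minimal sketch of the content-based idea, using Ed25519 from the Python cryptography package purely as an illustration: the publisher signs content at rest and the consumer validates it regardless of the delivery path. Key distribution and encryption, which the paper also discusses, are out of scope here.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The publisher signs content at rest; any path (CDN, cache, mirror) may deliver
# the bytes, and the consumer verifies the signature before rendering.
publisher_key = Ed25519PrivateKey.generate()
content = b"<html><body>Hello, content-based web</body></html>"
signature = publisher_key.sign(content)

# ... content and signature travel over any untrusted path ...

def browser_validate(public_key, content, signature):
    try:
        public_key.verify(signature, content)
        return True
    except InvalidSignature:
        return False

print(browser_validate(publisher_key.public_key(), content, signature))         # True
print(browser_validate(publisher_key.public_key(), content + b"x", signature))  # False
```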
This paper introduces a design and implementation of a security scheme for the Internet of Things (IoT) based on ECQV implicit certificates and the Datagram Transport Layer Security (DTLS) protocol. In the proposed security scheme, the elliptic-curve-cryptography-based ECQV implicit certificate plays a key role, allowing mutual authentication and key establishment between two resource-constrained IoT devices. We present how IoT devices obtain ECQV implicit certificates and use them for authenticated key exchange in DTLS. An evaluation of the execution time of the implementation is also conducted to assess the efficiency of the solution.
We study a sensor network setting in which samples are encrypted individually using different keys and maintained on a cloud storage. For large systems, e.g. those that generate several millions of samples per day, fine-grained sharing of encrypted samples is challenging. Existing solutions, such as Attribute-Based Encryption (ABE) and Key Aggregation Cryptosystem (KAC), can be utilized to address the challenge, but only to a certain extent. They are often computationally expensive and thus unlikely to operate at scale. We propose an algorithmic enhancement and two heuristics to improve KAC's key reconstruction cost, while preserving its provable security. The improvement is particularly significant for range and down-sampling queries, accelerating the reconstruction cost from quadratic to linear running time. Experimental study shows that for queries of size 32k samples, the proposed fast reconstruction techniques speed up the original KAC by at least 90 times on range and down-sampling queries, and by eight times on general (arbitrary) queries. It also shows that at the expense of splitting the query into 16 sub-queries and correspondingly issuing that number of different aggregated keys, reconstruction time can be reduced by 19 times. As such, the proposed techniques make KAC more applicable in practical scenarios such as sensor networks or the Internet of Things.
The semantics of online authentication in the web are rather straightforward: if Alice has a certificate binding Bob's name to a public key, and if a remote entity can prove knowledge of Bob's private key, then (barring key compromise) that remote entity must be Bob. However, in reality, many websites, including the majority of the most popular ones, are hosted at least in part by third parties such as Content Delivery Networks (CDNs) or web hosting providers. Put simply: administrators of websites who deal with (extremely) sensitive user data are giving their private keys to third parties. Importantly, this sharing of keys is undetectable by most users, and widely unknown even among researchers. In this paper, we perform a large-scale measurement study of key sharing in today's web. We analyze the prevalence with which websites trust third-party hosting providers with their secret keys, as well as the impact that this trust has on responsible key management practices, such as revocation. Our results reveal that key sharing is extremely common, with a small handful of hosting providers having keys from the majority of the most popular websites. We also find that hosting providers often manage their customers' keys, and that they tend to react more slowly yet more thoroughly to compromised or potentially compromised keys.
Security threats may hinder the large-scale adoption of the emerging Internet of Things (IoT) technologies. Although efforts have already been made in the direction of data integrity preservation, confidentiality and privacy, several issues are still open. The existing solutions are mainly based on encryption techniques, but no attention is actually paid to key management. A clever key distribution system, along with a key replacement mechanism, is essential for assuring a secure approach. In this paper, two popular key management systems, conceived for wireless sensor networks, are integrated into a real IoT middleware and compared in order to evaluate their performance in terms of overhead, delay and robustness towards malicious attacks.
The storage and management of secrets (encryption keys, passwords, etc) are significant open problems in the age of ephemeral, cloud-based computing infrastructure. How do we store and control access to the secrets necessary to configure and operate a range of modern technologies without sacrificing security and privacy requirements or significantly curtailing the desirable capabilities of our systems? To answer this question, we propose Tutamen: a next-generation secret-storage service. Tutamen offers a number of desirable properties not present in existing secret-storage solutions. These include the ability to operate across administrative domain boundaries and atop minimally trusted infrastructure. Tutamen also supports access control based on contextual, multi-factor, and alternate-band authentication parameters. These properties have allowed us to leverage Tutamen to support a variety of use cases not easily realizable using existing systems, including supporting full-disk encryption on headless servers and providing fully-featured client-side encryption for cloud-based file-storage services. In this paper, we present an overview of the secret-storage challenge, Tutamen's design and architecture, the implementation of our Tutamen prototype, and several of the applications we have built atop Tutamen. We conclude that Tutamen effectively eases the secret-storage burden and allows developers and systems administrators to achieve previously unattainable security-oriented goals while still supporting a wide range of feature-oriented requirements.
Explicit non-linear transformations of existing steganalysis features are shown to boost their ability to detect steganography in combination with existing simple classifiers, such as the FLD-ensemble. The non-linear transformations are learned from a small number of cover features using Nyström approximation on pilot vectors obtained with kernelized PCA. The best performance is achieved with the exponential form of the Hellinger kernel, which improves the detection accuracy by up to 2-3% for spatial-domain content-adaptive steganography. Since the non-linear map depends only on the cover source and its learning has a low computational complexity, the proposed approach is a practical and low-cost method for boosting the accuracy of existing detectors built as binary classifiers. The map can also be used to significantly reduce the feature dimensionality (by up to a factor of ten) without performance loss with respect to the non-transformed features.
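A compact sketch of the Nyström feature map is shown below, assuming one common exponential Hellinger-type kernel, exp(-gamma * sum_i (sqrt(x_i) - sqrt(z_i))^2); the paper's exact kernel parameterization and pilot selection may differ, and the features here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def hellinger_exp_kernel(X, Z, gamma=0.5):
    # k(x, z) = exp(-gamma * sum_i (sqrt(x_i) - sqrt(z_i))^2), non-negative features
    d2 = ((np.sqrt(X)[:, None, :] - np.sqrt(Z)[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_nystrom_map(pilots, gamma=0.5, eps=1e-12):
    # Learn an explicit non-linear map from a small set of pilot (cover) vectors.
    K = hellinger_exp_kernel(pilots, pilots, gamma)
    lam, U = np.linalg.eigh(K)
    keep = lam > eps                              # drop numerically null directions
    proj = U[:, keep] / np.sqrt(lam[keep])        # m x r projection matrix
    return lambda X: hellinger_exp_kernel(X, pilots, gamma) @ proj

# Toy non-negative "steganalysis features" (e.g., co-occurrence histograms).
covers = rng.random((200, 30))
pilots = covers[rng.choice(200, 40, replace=False)]   # small pilot set of covers
phi = fit_nystrom_map(pilots)
mapped = phi(covers)                                  # explicit non-linear features
print(mapped.shape)                                   # these would feed an FLD-ensemble
```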
The safety of information inside cloud networks is of interest to network administrators. In a new insider attack, inside attackers merge confidential information with videos using digital video steganography. The video can then be uploaded to video websites, where the information can be distributed online and can cost firms millions in damages. Standard behavior-based exfiltration detection does not always prevent these attacks, because this form of steganography is almost invisible, and existing compressed-video steganalysis only detects small-payload watermarks. We develop a detection strategy that uses distributed algorithms to classify videos, and compare existing algorithms to new ones. We find that our approach improves on behavior-based exfiltration detection and on existing online video steganalysis.
Chang-Chen-Wang's (3,n) method for sharing a secret grayscale image among n grayscale cover images with participant Authentication and damaged-pixel Repairing (SSAR) properties is analyzed; it restores the secret image from any three of the cover images used. We show that SSAR may fail, is unable to recognize a fake participant, and has a repairing ability limited to 62.5%. We propose a (4,n) enhancement, SSAR-E, that allows 100% exact restoration of a corrupted pixel using any four of the n covers, and recognizes a fake participant with the help of cryptographic hash functions with 5-bit values, which allow better error detection than 4-bit values. Using a special permutation with only one cycle covering all the secret image pixels, SSAR-E is able to restore all damaged pixels of the secret image from just one remaining correct pixel. In contrast to SSAR, SSAR-E allows the secret image to be restored by authorized parties only. The performance and the size of the cover images for SSAR-E are the same as for SSAR.
Steganography enables a user to hide confidential data in any digital medium such that its existence cannot be detected by a third party. A considerable amount of research is being conducted to improve the efficiency of steganography algorithms. Recent trends in computing technology use steganography as an important tool for hiding confidential data. This paper summarizes some of the research conducted in the field of image steganography in the spatial domain, along with its advantages and disadvantages. Future research directions and experimental results of some techniques are also discussed. The key goal is to show the powerful impact of steganography in the information hiding and image processing domains.
This article deals with color image steganalysis based on machine learning. The proposed approach enriches the features from the Color Rich Model by adding new features obtained by applying steerable Gaussian filters and then computing the co-occurrence of pixel pairs. Adding these new features to those obtained from Color Rich Models allows us to increase the detectability of hidden messages in color images. The Gaussian filters are angled in different directions to precisely compute the tangent of the gradient vector. Then, the gradient magnitude and the derivative of this tangent direction are estimated. This refined method of estimation enables us to unearth the minor changes that have occurred in the image when a message is embedded. The efficiency of the proposed framework is demonstrated on three steganographic algorithms designed to hide messages in images: S-UNIWARD, WOW, and Synch-HILL. Each algorithm is tested using different payload sizes. The proposed approach is compared to three color image steganalysis methods based on computation features and Ensemble Classifier classification: the Spatial Color Rich Model, the CFA-aware Rich Model and the RGB Geometric Color Rich Model.
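The following sketch conveys the flavor of such directional residual features: Gaussian derivative filters estimate the gradient, which is steered along several angles, and co-occurrences of the quantized directional residuals are histogrammed. The sigma, angles and quantization are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)

def steered_cooccurrence_features(channel, sigma=1.0, angles=(0, 45, 90, 135), T=2):
    gx = gaussian_filter(channel, sigma, order=(0, 1))   # derivative along x (columns)
    gy = gaussian_filter(channel, sigma, order=(1, 0))   # derivative along y (rows)
    feats = []
    for deg in angles:
        t = np.deg2rad(deg)
        d = np.cos(t) * gx + np.sin(t) * gy               # directional derivative
        q = np.clip(np.round(d), -T, T).astype(int) + T   # quantize/truncate to 2T+1 bins
        # Horizontal co-occurrence of neighboring quantized residuals.
        H = np.zeros((2 * T + 1, 2 * T + 1))
        np.add.at(H, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
        feats.append(H.ravel() / H.sum())
    return np.concatenate(feats)

image = rng.integers(0, 256, (64, 64)).astype(float)      # stand-in for one color channel
print(steered_cooccurrence_features(image).shape)         # (4 * 25,) feature vector
```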
The MP4 file has become the most widely used video media format available, and will most likely remain at the top for some time to come. This makes MP4 files an interesting candidate for steganography. With its size and structure, it offers a challenge to steganography developers. While some attempts have been made to create a truly covert file, few are as successful as Martin Fiedler's TCSteg. TCSteg allows users to hide a TrueCrypt hidden volume in an MP4 file. The structure of the file makes it difficult to identify that a volume exists. In our analysis of TCSteg, we will show how Fiedler's code works and how we may be able to detect the existence of steganography. We will then implement these methods in the hope that other steganography analysts can use them to determine whether an MP4 file is a carrier file. Finally, we will address the future of MP4 steganography.
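One detection heuristic in this spirit, offered as our own illustration rather than TCSteg's internals, is to walk the top-level MP4 box (atom) headers and flag bytes not accounted for by the declared box sizes; the input file name below is hypothetical.

```python
import os
import struct

# Walk top-level MP4 boxes and report any trailing bytes not covered by a box,
# a simple red flag for appended or hidden payloads.
def walk_top_level_boxes(path):
    boxes, offset, size = [], 0, os.path.getsize(path)
    with open(path, "rb") as f:
        while offset + 8 <= size:
            f.seek(offset)
            box_size, box_type = struct.unpack(">I4s", f.read(8))
            if box_size == 1 and offset + 16 <= size:    # 64-bit extended size
                box_size = struct.unpack(">Q", f.read(8))[0]
            elif box_size == 0:                          # box runs to end of file
                box_size = size - offset
            if box_size < 8 or offset + box_size > size:
                break                                    # malformed header: stop walking
            boxes.append((box_type.decode("latin-1"), offset, box_size))
            offset += box_size
    return boxes, size - offset                          # trailing unaccounted bytes

boxes, slack = walk_top_level_boxes("sample.mp4")        # hypothetical input file
for box_type, off, length in boxes:
    print(f"{box_type:>4} at {off:>10} len {length}")
if slack:
    print(f"WARNING: {slack} bytes after the last box -- possible hidden payload")
```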
Steganography is the science of hiding data within data, whether for the good purpose of secret communication or with the bad intention of leaking sensitive confidential data or embedding malicious code or URLs. Many different carrier file formats can be used to hide these data (network, audio, image, etc.), but the most common steganography carrier is embedding secret data within images, as this is considered the best and easiest way to hide all types of secret files within an image using different formats (another image, text, video, virus, URL, etc.). To the human eye, the changes in the image's appearance caused by the hidden data can be imperceptible; in fact, images can be more than what we see with our eyes. Many solutions have therefore been proposed to help detect these hidden data, but each solution has its own strengths and weaknesses, whether it is limited to one type of image and a specific hiding technique or, more often, unable to extract the hidden data. This paper proposes a novel detection approach that concentrates on detecting any kind of hidden URL in all types of images and extracting the hidden URL from a carrier image that uses the least-significant-bit (LSB) hiding technique.
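A sketch of the LSB side of such a detector is shown below: collect the least significant bit of every channel value in raster order, pack the bits into bytes, and scan for URL patterns. The bit ordering, start offset and input file name are assumptions; real embedders vary considerably.

```python
import re
import numpy as np
from PIL import Image   # Pillow, assumed installed

def extract_lsb_bytes(image_path, max_bytes=4096):
    # Least significant bit of every RGB channel value, raster order, MSB-first packing.
    pixels = np.array(Image.open(image_path).convert("RGB"))
    bits = (pixels & 1).flatten()[: max_bytes * 8]
    return np.packbits(bits).tobytes()

def find_hidden_urls(image_path):
    payload = extract_lsb_bytes(image_path)
    # Tolerant URL pattern scanned over the raw recovered byte stream.
    return [m.group().decode("ascii", "replace")
            for m in re.finditer(rb"https?://[\x21-\x7e]{4,}", payload)]

print(find_hidden_urls("suspect.png"))   # hypothetical carrier image
```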
Technological changes bring great efficiencies and opportunities; however, they also bring new threats and dangers that users are often ill prepared to handle. Some individuals have training at work or school while others have family or friends to help them. However, there are few widely known or ubiquitous educational programs to inform and motivate users to develop safe cybersecurity practices. Additionally, little is known about learning strategies in this domain. Understanding how active Internet users have learned their security practices can give insight into more effective learning methods. I surveyed 800 online labor workers to discover their learning processes. They shared how they had to construct their own schema and negotiate meaning in a complex domain. Findings suggest a need to help users build a dynamic mental model of security. Participants recommend encouraging participatory and constructive learning, multi-model dissemination, and ubiquitous opportunities for learning security behaviors.
Information Systems curricula require on-going and frequent review [2] [11]. Furthermore, such curricula must be flexible because of the fast-paced, dynamic nature of the workplace. Such flexibility can be maintained through modernizing course content or, inclusively, exchanging hardware or software for newer versions. Alternatively, flexibility can arise from incorporating new information into curricula from other disciplines. One field where the pace of change is extremely high is cybersecurity [3]. Students are left with outdated skills when curricula lag behind the pace of change in industry. For example, cryptography is a required learning objective in the DHS/NSA Center of Academic Excellence (CAE) knowledge criteria [1]. However, the overarching curriculum associated with basic ciphers has gone unchanged for decades. Indeed, a general problem in cybersecurity education is that students lack fundamental knowledge in areas such as ciphers [5]. In response, researchers have developed a variety of interactive classroom visualization tools [5] [8] [9]. Such tools visualize the standard approach to frequency analysis of simple substitution ciphers, which includes review of the most common single letters in ciphertext. While fundamental ciphers such as the monoalphabetic substitution cipher have not been updated (these are historical ciphers), collective understanding of how humans interact with language has changed. Updated understanding in both English language pedagogy [10] [12] and automated cryptanalysis of substitution ciphers [4] potentially renders the interactive classroom visualization tools incomplete or outdated. Classroom visualization tools are powerful teaching aids, particularly for abstract concepts. Existing research has established that such tools promote an active learning environment that translates to not only effective learning conditions but also higher student retention rates [7]. However, visualization tools require extensive planning and design when used to actively engage students with detailed, specific knowledge units such as ciphers [7] [8]. Accordingly, we propose a heatmap-based frequency analysis visualization solution that (a) incorporates digraph and trigraph language processing norms and (b) enhances the active learning pedagogy inherent in visualization tools. Preliminary results indicate that study participants take approximately 15% longer to learn the heatmap-based frequency analysis technique compared to traditional frequency analysis but demonstrate a 50% increase in efficacy when tasked with solving simple substitution ciphers. Further, a heatmap-based solution contributes positively to the field insofar as educators have an additional tool to use in the classroom. As well, the heatmap visualization tool may allow researchers to comparatively examine the efficacy of visualization tools in the cryptanalysis of monoalphabetic substitution ciphers.
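To make the underlying statistic concrete, the sketch below computes the 26x26 digraph count matrix that such a heatmap would color-code; the ciphertext is a stand-in Caesar-shifted example, and the code illustrates only the data behind the visualization, not the classroom tool itself.

```python
from collections import Counter
from string import ascii_uppercase as ALPHABET

# Build the 26x26 digraph count matrix over adjacent letters in a ciphertext,
# the statistic a digraph heatmap would color-code.
def digraph_matrix(text):
    letters = [c for c in text.upper() if c in ALPHABET]
    counts = Counter(zip(letters, letters[1:]))
    return [[counts.get((r, c), 0) for c in ALPHABET] for r in ALPHABET]

ciphertext = "WKLV LV D VLPSOH VXEVWLWXWLRQ FLSKHU HADPSOH"   # Caesar shift of 3
matrix = digraph_matrix(ciphertext)

# Print the most frequent digraphs, i.e., the cells a heatmap would highlight.
flat = [(count, r + c) for r, row in zip(ALPHABET, matrix)
        for c, count in zip(ALPHABET, row) if count]
for count, digraph in sorted(flat, reverse=True)[:5]:
    print(digraph, count)
```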
Recent years have witnessed a flourish of hands-on cybersecurity labs and competitions. The information technology (IT) education community has recognized their significant role in boosting students' interest in security and enhancing their security knowledge and skills. Compared to the focus on individual-based education materials, much less attention has been paid to the development of tools and materials suitable for team-based security practices, which, however, prevail in real-world environments. One major bottleneck is the lack of suitable platforms for this type of practice in the IT education community. In this paper, we propose a low-cost, team-oriented cybersecurity practice platform called Platoon. The Platoon platform allows for quickly and automatically creating one or more virtual networks that mimic real-world corporate networks using a regular computer. The virtual environment created by Platoon is suitable for cybersecurity labs, competitions, and projects. The performance data and user feedback collected from our cyber-defense exercises indicate that Platoon is practical and useful for enhancing students' security learning outcomes.