Biblio
Detecting that an otherwise normal computer is infected with malware is difficult. In recent years, several approaches have analyzed the behavior of malware and obtained good results. The malware traffic may be detected, but it is very common to misclassify normal traffic as malicious and generate false positives. This is especially the case when the methods are tested in real and large networks. These detection errors arise because malware changes and rapidly adapts its domains and patterns to mimic normal connections. To better detect malware infections and separate them from normal traffic, we propose to detect the behavior of the group of connections generated by the malware. It is known that malware usually generates various related connections simultaneously and therefore shows a group pattern. Based on previous experiments, this paper suggests that the behavior of a group of connections can be modelled as a directed cyclic graph with special properties, such as its internal patterns, relationships, frequencies and sequences of connections. By training the group models on known traffic it may be possible to better distinguish between a malware connection and a normal connection.
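As a rough illustration of the idea summarized above, the following sketch encodes a group of related connections as a directed graph whose nodes are coarse behavioral states and whose edge weights count transitions between consecutive connections. The state labels and feature thresholds are hypothetical, not taken from the paper.

    # Minimal sketch (not the authors' implementation): a group of connections as a
    # directed graph of behavioral states with transition counts on the edges.
    from collections import defaultdict

    def behavioral_state(conn):
        """Map a connection record to a coarse state label (hypothetical features)."""
        size = "small" if conn["bytes"] < 1000 else "large"
        duration = "short" if conn["duration"] < 1.0 else "long"
        return f"{conn['proto']}-{size}-{duration}"

    def build_group_graph(connections):
        """Count transitions between states of consecutive connections."""
        graph = defaultdict(lambda: defaultdict(int))
        states = [behavioral_state(c) for c in connections]
        for src, dst in zip(states, states[1:]):
            graph[src][dst] += 1          # repeated transitions show up as cycles
        return graph

    # Example: three periodic C&C-like connections followed by a large upload.
    conns = [
        {"proto": "tcp", "bytes": 300, "duration": 0.2},
        {"proto": "tcp", "bytes": 310, "duration": 0.3},
        {"proto": "tcp", "bytes": 305, "duration": 0.2},
        {"proto": "tcp", "bytes": 50000, "duration": 12.0},
    ]
    print(dict(build_group_graph(conns)))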
Cyber-physical systems (CPS) are often network integrated to enable remote management, monitoring, and reporting. Such integration has made them vulnerable to cyber attacks originating from an untrusted network (e.g., the internet). Once an attacker breaches the network security, he could corrupt operations of the system in question, which may in turn lead to catastrophes. Hence there is a critical need to detect intrusions into mission-critical CPS. Signature-based detection may not work well for CPS, whose complexity may preclude the succinct signatures that we would need. Specification-based detection requires accurate definitions of system behaviour that are similarly hard to obtain, due to the CPS's complexity and dynamics, as well as inaccuracies and incompleteness of design documents or operation manuals. Formal models, to be tractable, are often oversimplified, in which case they will not support effective detection. In this paper, we study a behaviour-based machine learning (ML) approach for intrusion detection. Whereas prior unsupervised ML methods have suffered from high missed-detection or false-positive rates, we use a high-fidelity CPS testbed, which replicates all main physical and control components of a modern water treatment facility, to generate systematic training data for a supervised method. The method not only detects the occurrence of a cyber attack at the physical process layer, but also identifies the specific type of the attack. Its detection is fast and robust to noise. Furthermore, its adaptive system model can learn quickly to match the dynamics of the CPS and its operating environment. It exhibits a low false positive (FP) rate, yet high precision and recall.
This paper describes an approach for detecting the presence of domain name system (DNS) tunnels in network traffic. DNS tunneling is a common technique hackers use to establish command and control nodes and to exfiltrate data from networks. To generate training data sufficient to build models to detect DNS tunneling activity, a penetration testing effort was employed. We extracted features from this data and trained random forest classifiers to distinguish normal DNS activity from tunneling activity. The classifiers successfully detected the presence of the tunnels we trained on, as well as four other types of tunnels that were not part of the training set.
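A minimal sketch of the training step described above, using scikit-learn. The per-query features (name length, character entropy, subdomain count) are hypothetical tunneling indicators, not the paper's actual feature set.

    # Hedged sketch: random forest over hypothetical DNS-query features.
    import math
    from sklearn.ensemble import RandomForestClassifier

    def entropy(s):
        """Shannon entropy of a DNS name; encoded payloads tend to raise it."""
        counts = {c: s.count(c) for c in set(s)}
        return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

    def features(query):
        return [len(query), entropy(query), query.count(".")]

    # Toy labeled data: 0 = normal lookup, 1 = tunnel-like lookup.
    queries = ["www.example.com", "mail.example.org",
               "a9f3k2q8z1x7.tunnel.example.net", "b7c1d9e2f4a6.tunnel.example.net"]
    labels = [0, 0, 1, 1]

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit([features(q) for q in queries], labels)
    print(clf.predict([features("x1y2z3w4v5u6.covert.example.com")]))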
Intrusion detection using multiple security devices has received much attention recently. The large volume of information generated by these tools, however, increases the burden on both computing resources and security administrators. Moreover, attack detection does not improve as expected if these tools work without any coordination. In this work, we propose a simple method to join information generated by security monitors with diverse data formats. We present a novel intrusion detection technique that uses unsupervised clustering algorithms to identify malicious behavior within large volumes of diverse security monitor data. First, we extract a set of features from network-level and host-level security logs that aid in detecting malicious host behavior and flooding-based network attacks in an enterprise network system. We then apply clustering algorithms to the separate and joined logs and use statistical tools to identify anomalous usage behaviors captured by the logs. We evaluate our approach on an enterprise network data set, which contains network and host activity logs. Our approach correctly identifies and prioritizes anomalous behaviors in the logs by their likelihood of maliciousness. By combining network and host logs, we are able to detect malicious behavior that cannot be detected by either log alone.
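The clustering step described above might look roughly like the following sketch, which clusters per-host feature vectors joined from network and host logs and flags the small, far-away cluster as anomalous. The feature columns and thresholds are illustrative assumptions, not the paper's.

    # Minimal sketch, assuming per-host feature vectors joined from both log sources.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Rows: hosts; columns: e.g. flows/min, distinct peers, failed logins, new processes.
    joined = np.array([
        [10,  3,  0, 2],
        [12,  4,  1, 3],
        [11,  3,  0, 2],
        [400, 250, 30, 1],   # flooding-like outlier
    ])

    X = StandardScaler().fit_transform(joined)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # Rank clusters by size: a small cluster far from the bulk is a candidate anomaly.
    sizes = np.bincount(km.labels_)
    suspect = np.argmin(sizes)
    print("hosts flagged as anomalous:", np.where(km.labels_ == suspect)[0])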
Obfuscation is a mechanism used to hinder reverse engineering of programs. To cope with the large number of obfuscated programs, especially malware, reverse engineers automate the process of deobfuscation, i.e., extracting information from obfuscated programs. Deobfuscation techniques target specific obfuscation transformations, which requires reverse engineers to manually identify the transformations used by a program, in what is known as a metadata recovery attack. In this paper, we present Oedipus, a Python framework that uses machine learning classifiers, namely decision trees and naive Bayes, to automate metadata recovery attacks against obfuscated programs. We evaluated Oedipus' performance using two datasets totaling 1960 unobfuscated C programs, which were used to generate 11,075 programs obfuscated using 30 configurations of 6 different obfuscation transformations. Our results empirically show the feasibility of using machine learning to implement metadata recovery attacks, with classification accuracies of 100% in some cases.
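In the same spirit, a toy version of the classification setup could predict which obfuscation transformation produced a binary from simple code-level counts. The feature columns and transformation labels below are illustrative only and are not drawn from the Oedipus paper.

    # Sketch: compare the two classifier families named above on hypothetical features.
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical features: [basic blocks, indirect jumps, opaque-looking predicates]
    X = [[120, 2, 0], [130, 3, 1], [125, 40, 1], [140, 45, 2], [500, 5, 60], [520, 6, 55]]
    y = ["none", "none", "control-flow-flattening", "control-flow-flattening",
         "opaque-predicates", "opaque-predicates"]

    for clf in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
        clf.fit(X, y)
        print(type(clf).__name__, clf.predict([[128, 42, 1]]))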
Large-scale datacenters are becoming the compute and data platform of large enterprises, but their scale makes it difficult to secure the applications running within them. We motivate this setting using a real-world, complex scenario, and propose a data-driven approach to taming this complexity. We discuss several machine learning problems that arise, focusing in particular on inducing so-called whitelist communication policies from observing masses of communications among networked computing nodes. Briefly, a whitelist policy specifies which machine, or groups of machines, can talk to which. We present some of the challenges and opportunities, such as noisy and incomplete data, non-stationarity, lack of supervision, and difficulties of evaluation, and describe some of the approaches we have found promising.
As the frequency, severity, and sophistication of cyber attacks increase, along with our dependence on reliable computing infrastructure, the role of Intrusion Detection Systems (IDS) is gaining importance. One of the challenges in deploying an IDS stems from selecting a combination of detectors that are relevant and accurate for the environment where security is being considered. In this work, we propose a new measurement approach to address two key obstacles: the base-rate fallacy and the unit-of-analysis problem. Our key contribution is to utilize the notion of a `signal', an indicator of an event that is observable to an IDS, as the measurement target, and to apply the multiple-instance paradigm (from machine learning) to enable cross-comparable measures regardless of the unit of analysis. To support our approach, we present a detailed case study and provide empirical examples of the effectiveness of both the model and the measure by demonstrating the automated construction, optimization, and correlation of signals from different domains of observation (e.g., network-based, host-based, application-based) and using different IDS techniques (signature-based, anomaly-based).
The security and typical attack behaviors of the Modbus/TCP industrial network communication protocol are analyzed. Traffic-flow features are extracted through in-depth analysis of the protocol's operation modes and abnormal behaviors, and an intrusion detection method based on the support vector machine (SVM) is designed. The method analyzes the data characteristics of abnormal communication behavior and constructs the feature input structure and detection system of the SVM algorithm by combining direct behavior feature selection with abnormal behavior pattern feature construction. The experimental results show that the method can effectively improve the detection rate of abnormal behavior and enhance the safety protection of industrial networks.
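For illustration only, an SVM-based detector of the kind described above could be set up as follows; the Modbus/TCP behavior features used here (function-code rate, register address span, inter-packet gap) are assumptions, not the paper's feature construction.

    # Hedged sketch: RBF-kernel SVM over hypothetical Modbus/TCP behavior features.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X = [
        [5, 10, 1.0], [6, 12, 0.9], [5, 11, 1.1],   # normal polling traffic
        [300, 5000, 0.01], [280, 4800, 0.02],        # scanning/flooding-like behavior
    ]
    y = [0, 0, 0, 1, 1]

    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    model.fit(X, y)
    print(model.predict([[290, 4900, 0.015]]))   # should be flagged as abnormal (1)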
A honeypot is a deception tool for enticing attackers to make efforts to compromise the electronic information systems of an organization. A honeypot can serve as an advanced security surveillance tool for use in minimizing the risks of attacks on information technology systems and networks. Honeypots are useful for providing valuable insights into potential system security loopholes. The current research investigated the effectiveness of using centralized system management technologies, namely Puppet and virtual machines, in the implementation of automated honeypots for intrusion detection, correction and prevention. A centralized logging system was used to collect the source address, country and timestamp of intrusions by attackers. The unique contributions of this research include: a demonstration of how open-source technologies can be used to dynamically add or modify hacking incidences in a high-interaction honeynet system; a presentation of strategies for making honeypots more attractive, so that hackers spend more time and provide more hacking evidence; and an exhibition of algorithms for system and network intrusion prevention.
Consensus algorithms provide strategies to solve problems in a distributed system under the added constraint that data can only be shared between adjacent computing nodes. We find these algorithms in applications for wireless and sensor networks, spectrum sensing for cognitive radio, and even some IoT services. However, consensus-based applications are not resilient to compromised nodes sending falsified data to their neighbors, i.e., they can be the target of Byzantine attacks. Several solutions have been proposed in the literature, inspired by reputation-based systems, outlier detection, or model-based fault detection techniques from process control. We review some of these solutions and propose two mitigation techniques to protect the consensus-based Network Intrusion Detection System in [1]. We analyze several implementation issues such as computational overhead, fine-tuning of the solution parameters, impact on the convergence of the consensus phase, and accuracy of the intrusion detection system.
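The outlier-rejection idea mentioned above can be sketched on plain average consensus as follows; the topology, step size and rejection threshold are illustrative assumptions, not the mitigation techniques of the cited paper.

    # Sketch: one synchronous consensus update with crude Byzantine filtering.
    def consensus_step(values, neighbors, eps=0.2, reject=5.0):
        """Each node ignores neighbor values deviating from its own by more than `reject`."""
        new = list(values)
        for i, xi in enumerate(values):
            trusted = [values[j] - xi for j in neighbors[i] if abs(values[j] - xi) <= reject]
            new[i] = xi + eps * sum(trusted)
        return new

    # Ring of 5 nodes; node 4 is compromised and keeps injecting a falsified value.
    neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
    x = [1.0, 2.0, 3.0, 4.0, 100.0]
    for _ in range(50):
        x = consensus_step(x, neighbors)
        x[4] = 100.0   # attacker overwrites its own state every round
    print([round(v, 2) for v in x[:4]])   # honest nodes stay near the honest average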
This work presents a systematic analysis of symmetric encryption modes for SSH that are in use on the Internet, providing deployment statistics, new attacks, and security proofs for widely used modes. We report deployment statistics based on two Internet-wide scans of SSH servers conducted in late 2015 and early 2016. Dropbear and OpenSSH implementations dominate in our scans. From our first scan, we found 130,980 OpenSSH servers that are still vulnerable to the CBC-mode-specific attack of Albrecht et al. (IEEE S&P 2009), while we found a further 20,000 OpenSSH servers that are vulnerable to a new attack on CBC-mode that bypasses the counter-measures introduced in OpenSSH 5.2 to defeat the attack of Albrecht et al. At the same time, 886,449 Dropbear servers in our first scan are vulnerable to a variant of the original CBC-mode attack. On the positive side, we provide formal security analyses for other popular SSH encryption modes, namely ChaCha20-Poly1305, generic Encrypt-then-MAC, and AES-GCM. Our proofs hold for detailed pseudo-code descriptions of these algorithms as implemented in OpenSSH. Our proofs use a corrected and extended version of the "fragmented decryption" security model that was specifically developed for the SSH setting by Boldyreva et al. (Eurocrypt 2012). These proofs provide strong confidentiality and integrity guarantees for these alternatives to CBC-mode encryption in SSH. However, we also show that these alternatives do not meet additional, desirable notions of security (boundary-hiding under passive and active attacks, and denial-of-service resistance) that were formalised by Boldyreva et al.
Cloud storage has been gaining in popularity as an on-line service for archiving, backup, and even primary storage of files. However, due to the data outsourcing, cloud storage also introduces new security challenges, which require a data audit and data repair service to ensure data availability and data integrity in the cloud. In this paper, we present the design and implementation of a network-coding-based Proof of Retrievability scheme called ELAR, which achieves lightweight data auditing and data repairing. In particular, we support a direct repair mechanism in which the client can be free from the data repair process. Simultaneously, we also allow a third-party auditor (TPA), on behalf of the client, to verify the availability and integrity of the data stored in the cloud servers without the need for an asymmetric-key setting. The client is thus also free from the data audit process. The TPA uses spot-checking, a very efficient probabilistic method for checking a large amount of data. Extensive security and performance analysis shows that the proposed scheme is highly efficient and provably secure.
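A back-of-the-envelope illustration of why spot-checking is efficient: the probability of catching at least one corrupted block when sampling c of n blocks uniformly at random, computed below for assumed toy numbers (not figures from the paper).

    # Probability that a random sample of c blocks hits at least one corrupted block.
    def detection_probability(n, corrupted, c):
        """Sampling without replacement: 1 - P(all c sampled blocks are intact)."""
        p_miss = 1.0
        for i in range(c):
            p_miss *= (n - corrupted - i) / (n - i)
        return 1.0 - p_miss

    # E.g. 1% corruption of a 100,000-block file: a few hundred random checks suffice.
    print(round(detection_probability(100_000, 1_000, 300), 4))   # ~0.95
    print(round(detection_probability(100_000, 1_000, 460), 4))   # ~0.99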
Firewall policies are notorious for having misconfiguration errors which can defeat their intended purpose of protecting hosts in the network from malicious users. We believe this is because today's firewall policies are mostly monolithic. Inspired by ideas from modular programming and code refactoring, in this work we introduce three kinds of modules: primary, auxiliary, and template, which facilitate the refactoring of a firewall policy into smaller, reusable, comprehensible, and more manageable components. We present algorithms for generating each of the three modules for a given legacy firewall policy. We also develop ModFP, an automated tool for converting legacy firewall policies represented as access control lists into their modularized format. With the help of ModFP, when examining several real-world policies with sizes ranging from dozens to hundreds of rules, we were able to identify subtle errors.
Cloud service providers offer storage outsourcing facilities to their clients. In a secure cloud storage (SCS) protocol, the integrity of the client's data is maintained. In this work, we construct a publicly verifiable secure cloud storage protocol based on a secure network coding (SNC) protocol, where the client can update the outsourced data as needed. To the best of our knowledge, our scheme is the first SNC-based SCS protocol for dynamic data that is secure in the standard model and provides privacy-preserving audits in a publicly verifiable setting. Furthermore, we discuss in detail the (im)possibility of providing a general construction of an efficient SCS protocol for dynamic data (DSCS protocol) from an arbitrary SNC protocol. In addition, we modify an existing DSCS scheme (DPDP I) in order to support privacy-preserving audits. We also compare our DSCS protocol with other SCS schemes (including the modified DPDP I scheme). Finally, we point out some limitations of an SCS scheme constructed using an SNC protocol.
In this paper we propose a protocol that allows end-users in a decentralized setup (without requiring any trusted third party) to protect data shipped to remote servers using two factors: knowledge (passwords) and possession (a portable, time-based one-time password generator for authentication). The protocol also supports revocation and re-creation of a new possession factor if the older possession factor is compromised, provided the legitimate owner still has a copy of it. Furthermore, akin to some other recent works, our approach naturally protects the outsourced data from the storage servers themselves, by applying encryption and dispersing the information across multiple servers. We also extend the basic protocol to demonstrate how collaboration can be supported even while the stored content is encrypted, and where each collaborator's access to the data is still gated by a multi-factor access mechanism. Such techniques for achieving layered security are crucial to (opportunistically) harnessing storage resources from untrusted entities.
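The two factors and the dispersal idea can be sketched as follows; this is a simplified illustration under assumed primitives (PBKDF2, RFC 6238-style TOTP, XOR splitting), not the paper's protocol.

    import hashlib, hmac, os, struct, time

    def totp(possession_secret, period=30, digits=6):
        """RFC 6238-style time-based one-time password from the possession factor."""
        counter = struct.pack(">Q", int(time.time()) // period)
        digest = hmac.new(possession_secret, counter, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % (10 ** digits)).zfill(digits)

    def derive_key(password, possession_secret, salt):
        """Knowledge (password) and possession (device secret) combined into one key."""
        return hashlib.pbkdf2_hmac("sha256", password + possession_secret, salt, 200_000)

    def disperse(blob, n_servers):
        """XOR-split an (already encrypted) blob into n shares; all shares are needed
        to rebuild it, so no single server learns anything on its own."""
        shares = [os.urandom(len(blob)) for _ in range(n_servers - 1)]
        last = bytearray(blob)
        for share in shares:
            for i, byte in enumerate(share):
                last[i] ^= byte
        return shares + [bytes(last)]

    secret, salt = os.urandom(20), os.urandom(16)
    key = derive_key(b"correct horse battery", secret, salt)      # symmetric key (unused here)
    shares = disperse(b"ciphertext bytes would go here", 3)
    recovered = bytes(a ^ b ^ c for a, b, c in zip(*shares))
    print(totp(secret), recovered == b"ciphertext bytes would go here")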
To construct a strongly secure network coding scheme without universality, this paper focuses on properties of MDS (maximum distance separable) codes, especially Reed-Solomon codes. Our scheme applies Reed-Solomon codes in a coset coding scheme to achieve security on top of the classical underlying network coding. Compared with the existing scheme, an MRD (maximum rank distance) code and a necessary condition based on MRD are not required. Furthermore, considering the conditions relating the code used for security to the underlying network code, the scheme can be applied in more situations and over more fields.
Two recent proposals by Bernstein and Pornin emphasize the use of deterministic signatures in DSA and its elliptic curve-based variants. Deterministic signatures derive the required ephemeral key value in a deterministic manner from the message to be signed and the secret key, instead of using random number generators. The goal is to prevent severe security issues, such as straightforward secret key recovery from low-quality random numbers. Recent developments have raised skepticism as to whether, e.g., embedded or pervasive devices are able to generate randomness of sufficient quality. The main concerns stem from individual implementations lacking a sufficient entropy source and from standardized methods for random number generation with suspected back doors. While we support the goal of deterministic signatures, we are concerned by the fact that they significantly influence the side-channel security of implementations. Specifically, attackers will be able to mount differential side-channel attacks on the additional use of the secret key in a cryptographic hash function to derive the deterministic ephemeral key. Previously, only a simple integer arithmetic function used to generate the second signature parameter had to be protected, which is rather straightforward. Hash functions are significantly more difficult to protect. In this contribution, we systematically explain how deterministic signatures introduce this new side-channel vulnerability.
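To make the point concrete, the sketch below derives the ephemeral key k from (secret key, message) with a keyed hash instead of an RNG, so the secret key enters a hash computation that a side-channel attacker can target. This is a pedagogical simplification, not the exact RFC 6979 or EdDSA derivation.

    import hashlib, hmac

    CURVE_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 n

    def deterministic_nonce(secret_key: bytes, message: bytes) -> int:
        """Derive k deterministically from the secret key and message digest."""
        h = hashlib.sha256(message).digest()
        k = hmac.new(secret_key, h, hashlib.sha256).digest()   # secret key used as MAC key
        return int.from_bytes(k, "big") % CURVE_ORDER or 1

    # Same (key, message) pair always yields the same nonce; different messages differ.
    sk = bytes.fromhex("11" * 32)
    print(deterministic_nonce(sk, b"msg-1") == deterministic_nonce(sk, b"msg-1"))  # True
    print(deterministic_nonce(sk, b"msg-1") == deterministic_nonce(sk, b"msg-2"))  # False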
The pervasive presence of interconnected objects enables new communication paradigms where devices can easily reach each other while interacting within their environment. The so-called Internet of Things (IoT) represents the integration of several computing and communications systems aimed at facilitating the interaction between these devices. Arduino is one of the most popular platforms used to prototype new IoT devices due to its open, flexible and easy-to-use architecture. Arduino Yun is a dual-board microcontroller that supports a Linux distribution and is currently one of the most versatile and powerful Arduino systems. This feature positions Arduino Yun as a popular platform for developers, but it also introduces unique infection vectors from the security viewpoint. In this work, we present a security analysis of Arduino Yun. We show that Arduino Yun is vulnerable to a number of attacks and we implement a proof of concept capable of exploiting some of them.
Smart environments and security systems require automatic detection of human behaviors, including approaching or departing from an object. Existing human motion detection systems usually require human beings to carry special devices, which limits their applications. In this paper, we present a system called APID that detects arm reaching by analyzing backscatter communication signals from a passive RFID tag on the object. APID does not require human beings to carry any device. The idea is based on the influence of human movements on the vibration of the backscattered tag signals. APID is compatible with commodity off-the-shelf devices and the EPCglobal Class-1 Generation-2 protocol. In APID, a commercial RFID reader continuously queries tags by emitting RF signals and the tags simply respond with their IDs. A USRP monitor passively analyzes the communication signals and reports the approach and departure behaviors. We have implemented the APID system for both single-object and multi-object scenarios in both horizontal and vertical deployment modes. The experimental results show that APID can achieve high detection accuracy.
Sego is a hypervisor-based system that gives strong privacy and integrity guarantees to trusted applications, even when the guest operating system is compromised or hostile. Sego verifies operating system services, like the file system, instead of replacing them. By associating trusted metadata with user data across all system devices, Sego verifies system services more efficiently than previous systems, especially services that depend on data contents. We extensively evaluate Sego's performance on real workloads and implement a kernel fault injector to validate Sego's file system-agnostic crash consistency and recovery protocol.
Mobile devices offer access to our digital lives and thus need to be protected against the risk of unauthorized physical access by applying strong authentication, which in turn adversely affects usability. The actual risk, however, depends on dynamic factors like day and time. In this paper we discuss the idea of using location-based risk assessment in combination with multi-modal biometrics to adjust the level of authentication necessary to the situational risk of unauthorized access.
We develop and evaluate a data hiding method that enables smartphones to encrypt and embed sensitive information into carrier streams of sensor data. Our evaluation considers multiple handsets and a variety of data types, and we demonstrate that our method has a computational cost that allows real-time data hiding on smartphones with negligible distortion of the carrier stream. These characteristics make it suitable for smartphone applications involving privacy-sensitive data such as medical monitoring systems and digital forensics tools.
The ubiquity of portable mobile devices equipped with built-in cameras has led to a transformation in how and when digital images are captured, shared, and archived. Photographs and videos from social gatherings, public events, and even crime scenes are commonplace online. While the spontaneity afforded by these devices has led to new personal and creative outlets, the privacy concerns of bystanders (and indeed, in some cases, unwilling subjects) have remained largely unaddressed. We present I-Pic, a trusted software platform that integrates digital capture with user-defined privacy. In I-Pic, users choose a level of privacy (e.g., image capture allowed or not) based upon social context (e.g., out in public vs. with friends vs. at the workplace). Privacy choices of nearby users are advertised via short-range radio, and I-Pic-compliant capture platforms generate edited media to conform to the privacy choices of image subjects. I-Pic uses secure multiparty computation to ensure that users' visual features and privacy choices are not revealed publicly, regardless of whether they are the subjects of an image capture. Just as importantly, I-Pic preserves the ease-of-use and spontaneous nature of capture and sharing between trusted users. Our evaluation of I-Pic shows that a practical, energy-efficient system that conforms to the privacy choices of many users within a scene can be built and deployed using current hardware.
This last decade has witnessed a wide adoption of connected mobile devices able to capture the context of their owners from embedded sensors (GPS, Wi-Fi, Bluetooth, accelerometers). The advent of mobile and pervasive computing has enabled rich social and contextual applications, but the use of such technologies raises severe privacy issues and challenges. The privacy threats come from diverse adversaries, ranging from curious service providers and other users of the same service to eavesdroppers and curious applications running on the device. The information that can be collected from mobile device owners includes their locations, their social relationships, and their current activity. All of this, once analyzed and combined through inference, can be very telling about the users' private lives. In this talk, we will describe privacy threats in mobile and pervasive networks. We will also show how to quantify the privacy of the users of such networks and explain how information on co-location can be taken into account. We will describe the role that privacy-enhancing technologies (PETs) can play and describe some of them. We will also explain how to prevent apps from sifting through too much personal data under Android. We will conclude by mentioning the privacy and security challenges raised by the quantified self and digital medicine.
In this paper, we propose a new risk analysis framework that enables the supervision of risks in complex and distributed systems. Our contribution is twofold. First, we provide Risk Assessment Graphs (RAGs) as a model of risk analysis. This graph-based model is adaptable to system changes over time. We also introduce the potentiality and accessibility functions which, during each time slot, evaluate respectively the chance of exploiting the RAG's nodes and the connection time between these nodes. In addition, we provide a worst-case risk evaluation approach, based on the assumption that intruders usually aim at maximising their benefit by inflicting the maximum damage on the target system (i.e., choosing the most likely paths in the RAG). We then introduce three security metrics: the propagated risk, the node risk and the global risk. We illustrate the use of our framework through the simple example of an enterprise email service. Our framework meets both flexibility and generality requirements: it can be used to assess external threats as well as insider ones, and it applies to a wide set of applications.
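Under assumed (not the paper's) definitions, a worst-case evaluation on such a graph might take the propagated risk of a target node as the maximum, over attack paths from an entry point, of the product of per-edge exploitation chances times the node's impact. The node names, potentialities and impacts below are illustrative.

    # Sketch: worst-case propagated risk via DFS over simple paths in a risk graph.
    def propagated_risk(graph, impact, entry, target, seen=None):
        """graph[u] = {v: potentiality of moving from u to v}."""
        if seen is None:
            seen = {entry}
        if entry == target:
            return impact[target]
        best = 0.0
        for nxt, pot in graph.get(entry, {}).items():
            if nxt not in seen:
                best = max(best, pot * propagated_risk(graph, impact, nxt, target, seen | {nxt}))
        return best

    # Toy enterprise email example: internet -> mail gateway -> mail server.
    graph = {"internet": {"gateway": 0.6}, "gateway": {"mailserver": 0.5, "internet": 0.1},
             "mailserver": {}}
    impact = {"internet": 0.0, "gateway": 4.0, "mailserver": 9.0}
    print(propagated_risk(graph, impact, "internet", "mailserver"))   # 0.6 * 0.5 * 9.0 = 2.7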