Biblio
When running large human computation tasks in the real world, honeypots play an important role in assessing the overall quality of the work produced. Generating such honeypots can be a significant burden on the task owner, as they require careful design and implementation and continuous maintenance when operating data pipelines that include a human computation component. In this extended abstract we outline a novel approach for creating honeypots using automatically generated questions from a reference knowledge base, with the ability to control parameters such as topic and difficulty.
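A minimal sketch of the idea, assuming the reference knowledge base is a set of subject-relation-object triples tagged with topic and difficulty; the KB contents, tags, templates, and function names below are hypothetical:

```python
import random

# Hypothetical reference knowledge base: (subject, relation, object) triples,
# each tagged with a topic and a difficulty score in [0, 1].
KB = [
    ("Paris", "capital_of", "France", {"topic": "geography", "difficulty": 0.1}),
    ("Niamey", "capital_of", "Niger", {"topic": "geography", "difficulty": 0.7}),
]
TEMPLATES = {"capital_of": "Of which country is {} the capital?"}

def generate_honeypot(topic, max_difficulty, n_distractors=3):
    """Turn a KB triple into a multiple-choice honeypot question."""
    candidates = [t for t in KB
                  if t[3]["topic"] == topic and t[3]["difficulty"] <= max_difficulty]
    subj, rel, obj, _ = random.choice(candidates)
    # Distractors are objects of other triples with the same relation.
    distractors = [t[2] for t in KB if t[1] == rel and t[2] != obj]
    options = random.sample(distractors, min(n_distractors, len(distractors))) + [obj]
    random.shuffle(options)
    return {"question": TEMPLATES[rel].format(subj), "options": options, "answer": obj}

print(generate_honeypot("geography", max_difficulty=0.5))
```

Because the correct answer comes straight from the knowledge base, the honeypot can be graded automatically, and the difficulty tag provides the control parameter mentioned above.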
Web application security has become vitally important. The earlier "default allow" model was unable to secure web applications against the plethora of attacks they face [1]. In contrast, the "default deny" model provides more restrictive security: it first builds a model of the particular application and then permits only those requests that conform to that model, rejecting everything else. Building on this, we follow a novel and effective methodology that analyzes the validity of application requests and generates semi-structured XML cases for the web application. Mature and resilient XML cases are produced by employing learning techniques. The system is further evaluated by checking whether the XML files containing the cases conform to the XML format, and malicious traffic is carefully distinguished from non-malicious traffic. Results demonstrate the efficacy of rule generation using access traffic logs of cross-site scripting (XSS), SQL injection, HTTP request splitting, HTTP response splitting, and buffer overflow attacks.
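As a hedged illustration of the default-deny model described above, the sketch below checks each request against a learned per-application rule set and rejects anything that does not conform; the rule format is illustrative (the system described here stores such cases as XML):

```python
import re

# Illustrative learned model: one case per endpoint, with a regular expression
# constraining each expected parameter. Anything outside the model is denied.
RULES = [
    {"path": "/login", "params": {"user": r"^\w{1,32}$", "pwd": r"^.{1,64}$"}},
]

def allow(path, params):
    for rule in RULES:
        # Require the exact learned parameter set, then validate each value.
        if rule["path"] == path and set(params) == set(rule["params"]):
            if all(re.match(rule["params"][k], v) for k, v in params.items()):
                return True
    return False                                  # default deny

print(allow("/login", {"user": "alice", "pwd": "s3cret"}))          # True
print(allow("/login", {"user": "alice' OR '1'='1", "pwd": "x"}))    # False
```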
Hardware-based enclave protection mechanisms, such as Intel's SGX, ARM's TrustZone, and Apple's Secure Enclave, can protect code and data from powerful low-level attackers. In this work, we use enclaves to enforce strong application-specific information security policies. We present IMPE, a novel calculus that captures the essence of SGX-like enclave mechanisms, and show that a security-type system for IMPE can enforce expressive confidentiality policies (including erasure policies and delimited release policies) against powerful low-level attackers, including attackers that can arbitrarily corrupt non-enclave code, and, under some circumstances, corrupt enclave code. We present a translation from an expressive security-typed calculus (that is not aware of enclaves) to IMPE. The translation automatically places code and data into enclaves to enforce the security policies of the source program.
The fully automatic generation of code that establishes the reversibility of arbitrary C/C++ code has been a target of research and engineering for more than a decade, as reverse computation has become a central notion in large-scale parallel discrete event simulation (PDES). The simulation models implemented for PDES are of increasing complexity and size and require various language features to support abstraction, encapsulation, and composition when building a simulation model. In this paper we focus on parallel simulation models written in C++ and present an approach to, and an evaluation of, fully automatically generated reversible code for a kinetic Monte Carlo application implemented in C++. Although our technique introduces a significant runtime overhead, the assurance that the reverse code is generated automatically and correctly is an enormous win that allows simulation model developers to write forward event code using the entire C++ language and have that code automatically transformed into reversible code, enabling parallel execution with Rensselaer's Optimistic Simulation System (ROSS).
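As a hedged illustration of what such a transformation emits, sketched in Python rather than C++ for brevity: the generated forward code saves any state it overwrites, and the paired reverse code restores it so an optimistic simulator can roll the event back.

```python
# Sketch of a forward/reverse event pair of the kind such a tool generates:
# the forward handler is instrumented to save overwritten state, and the
# reverse handler restores it on rollback.
class Event:
    def forward(self, state):
        self._saved = state["count"]        # instrumentation inserted by the tool
        state["count"] += 1                 # original forward event code

    def reverse(self, state):
        state["count"] = self._saved        # automatically generated inverse

state = {"count": 41}
ev = Event()
ev.forward(state); print(state)             # {'count': 42}
ev.reverse(state); print(state)             # {'count': 41} -- rolled back
```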
Modern software systems are becoming increasingly complex and rely heavily on third-party libraries. Library behaviors are hence an integral part of software behaviors, and analyzing them is as important as analyzing the software itself. However, analyzing libraries is highly challenging due to the lack of source code, implementations in different languages, and complex optimizations. We observe that many Java library functions provide excellent documentation that concisely describes their functionality. We develop a novel technique that constructs models for Java API functions by analyzing this documentation. The models are simpler Java implementations than the originals and hence easier to analyze; more importantly, they provide the same functionality as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis on Android apps and dynamic slicing for Java programs, demonstrating the effectiveness and efficiency of our models.
A self-adaptive system (SAS) can reconfigure to adapt to potentially adverse conditions that can manifest in the environment at run time. However, the SAS may not have been explicitly developed with such conditions in mind, thereby requiring additional configuration states or updates to the requirements specification for the SAS to provide assurance that it continually satisfies its requirements and delivers acceptable behavior. By discovering both adverse environmental conditions and the SAS configuration states that can mitigate those conditions at design time, an SAS can be hardened against uncertainty prior to deployment, effectively extending its lifetime. This paper introduces two search-based techniques, Ragnarok and Valkyrie, for hardening an SAS against uncertainty. Ragnarok automatically discovers adverse conditions that negatively impact an SAS by searching for environmental conditions that explicitly cause requirements violations. Valkyrie then searches for SAS configurations that improve requirements satisficement throughout execution in response to discovered adverse environmental conditions. Together, these techniques can be used to improve the design and implementation of an SAS. We apply each technique to an industry-provided remote data mirroring application that can self-reconfigure in response to unknown or adverse conditions, such as network message delays, network link failures, and sensor noise.
Today, cloud vendors host third-party black-box services whose developers usually provide only textual descriptions or purely syntactical interface specifications. Cloud vendors that give substantial support to other third-party developers to integrate hosted services into new software solutions would have a unique selling feature over their competitors. However, to reliably determine whether a service is reusable, comprehensive service specifications are needed. Characteristic for comprehensive specifications, in contrast to syntactical ones, are the formalization of ontological and behavioral semantics, homogeneity according to a global ontology, and a service grounding that links the abstract service description to its technical realization. Homogeneous semantic specifications make it possible to reliably identify reusable services, whereas the service grounding is needed for the technical service integration. In general, comprehensive specifications are not available and have to be derived. Existing automated approaches each cover only some characteristics of comprehensiveness. In my PhD, I consider an automated approach to derive fully-fledged comprehensive specifications for black-box services. Ontological semantics are derived from syntactical interface specifications. Behavioral semantics are mined from the call logs that cloud vendors create to monitor the hosted services. The specifications are harmonized over a global ontology, and the service grounding is established using traceability information. The approach enables third-party developers to compose services into complex systems and creates new sales channels for cloud and service providers.
Intel SGX is the latest processor architecture promising secure code execution despite large, complex, and hence potentially vulnerable legacy operating systems (OSs). However, two recent works identified vulnerabilities that allow an untrusted management OS to extract secret information from Intel SGX's enclaves and to violate their integrity by exploiting concurrency bugs. In this work, we re-investigate delayed preemption (DP) in the context of Intel SGX. DP is a mechanism originally proposed for L4-family microkernels as a replacement for disabling interrupts. Recapitulating earlier results on language-based information-flow security, we illustrate the construction of leakage-free code for enclaves. However, as long as adversaries have fine-grained control over preemption timing, these solutions are impractical from a performance/complexity perspective. To overcome this, we resort to delayed preemption and sketch both a software implementation for hypervisors providing enclaves and a hardware extension for systems like SGX. Finally, we illustrate how static analyses for SGX may be extended to check the confidentiality of preemption-delaying programs.
Based on constructive type theory, we study two idealized imperative languages GC and IC and verify the correctness of a compiler from GC to IC. GC is a guarded command language with underspecified execution order defined with an axiomatic semantics. IC is a deterministic low-level language with linear sequential composition and lexically scoped gotos defined with a small-step semantics. We characterize IC with an axiomatic semantics and prove that the compiler from GC to IC preserves specifications. The axiomatic semantics we consider model total correctness and map programs to continuous predicate transformers. We define the axiomatic semantics of GC and IC with elementary inductive predicates and show that the predicate transformer described by a program can be obtained compositionally by recursion on the syntax of the program using a fixed point operator for loops and continuations. We also show that two IC programs are contextually equivalent if and only if their predicate transformers are equivalent.
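For concreteness, the classical Dijkstra-style weakest-precondition rules for guarded commands give the flavor of the predicate transformers involved; the paper's exact inductive definitions for GC may differ:

```latex
\[
\begin{aligned}
wp(\mathtt{skip},\, Q) &= Q\\
wp(x := e,\, Q) &= Q[e/x]\\
wp(S_1 ; S_2,\, Q) &= wp(S_1,\, wp(S_2,\, Q))\\
wp(\mathtt{if}\ g_1 \to S_1 \ [\!]\ g_2 \to S_2\ \mathtt{fi},\, Q)
  &= (g_1 \lor g_2) \land (g_1 \Rightarrow wp(S_1,\, Q)) \land (g_2 \Rightarrow wp(S_2,\, Q))
\end{aligned}
\]
```

Under total correctness, loops are handled by the least fixed point of the corresponding transformer, matching the fixed point operator mentioned above.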
HDFS has been widely used for storing massive-scale data, which is vulnerable to site disasters. File system backup is an important strategy for data retention. In this paper, we present an efficient, easy-to-use backup and disaster recovery system for HDFS. The system includes a client, based on HDFS, with the additional feature of remote backup, and a remote server with an HDFS cluster to keep the backup data. It supports full backups and regular incremental backups to the server with very low cost and high throughput. In our experiments, the average speed of backup and recovery is up to 95 MB/s, approaching the theoretical maximum speed of gigabit Ethernet.
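A minimal sketch of the incremental-backup selection step, under the simplifying assumption of a local file tree with modification times; the real system operates on HDFS metadata and streams data to the remote cluster:

```python
import os, time

def files_changed_since(root, last_backup_ts):
    """Select files modified after the last backup: the core of an
    incremental backup pass (a sketch; paths here are illustrative)."""
    changed = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_ts:
                changed.append(path)
    return changed

# Full backup: last_backup_ts = 0 selects everything.
# Incremental backup: pass the timestamp of the previous run.
print(files_changed_since("/data", last_backup_ts=time.time() - 24 * 3600))
```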
General sparse matrix-matrix multiplication (SpGEMM) is a core component of many algorithms. A number of recent works have used high-throughput graphics processing units (GPUs) to accelerate SpGEMM. However, exploiting the power of GPUs for SpGEMM requires addressing a number of challenges, including highly imbalanced workloads and large numbers of inefficient random global memory accesses. This paper presents a SpGEMM algorithm which uses several novel techniques to overcome these problems. We first propose two low-cost methods to achieve perfect load balancing during the most expensive step in SpGEMM. Next, we show how to eliminate nearly all random global memory accesses using shared-memory-based hash tables. To optimize the performance of the hash tables, we propose a lightweight method to estimate the number of nonzeros in the output matrix. We compare our algorithm to the CUSP, CUSPARSE, and state-of-the-art BHSPARSE GPU SpGEMM algorithms, and show that it performs 5.6x, 2.4x, and 1.5x better on average, and up to 11.8x, 9.5x, and 2.5x better in the best case, respectively. Furthermore, we show that our algorithm performs especially well on highly imbalanced and unstructured matrices.
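The following CPU-side sketch shows the hash-table accumulation at the heart of row-wise SpGEMM; a Python dict stands in for the shared-memory hash tables used on the GPU:

```python
# Row-wise (Gustavson-style) SpGEMM: for each row i of A, the partial products
# A[i,k] * B[k,:] are merged in a hash table keyed by output column index.
def spgemm_row(A_row, B_rows):
    """A_row: dict {k: a_ik}; B_rows: dict {k: {j: b_kj}}. Returns row i of C."""
    acc = {}                                # hash-table accumulator
    for k, a_ik in A_row.items():
        for j, b_kj in B_rows.get(k, {}).items():
            acc[j] = acc.get(j, 0.0) + a_ik * b_kj
    return acc

A = {0: {0: 1.0, 2: 2.0}}                   # one-row sparse A
B = {0: {1: 3.0}, 2: {1: 4.0, 3: 5.0}}      # sparse B, stored by row
print(spgemm_row(A[0], B))                  # {1: 11.0, 3: 10.0}
```

Estimating the number of nonzeros per output row, as the paper proposes, is what lets the table be sized to fit in fast memory.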
The Internet Engineering Task Force (IETF) is working on the 6LoWPAN standard, which allows smart devices to be connected to the Internet using the large address space of IPv6. 6LoWPAN acts as a bridge between resource-constrained devices and the Internet. The entire IoT space is vulnerable to local threats as well as threats from the Internet. Due to the random deployment of network nodes and the absence of tamper-resistant shields, the resource-constrained IoT elements face potential insider attacks even in the presence of front-line defense mechanisms based on cryptographic protocols. To detect such insidious nodes, an Intrusion Detection System (IDS) is required as a second line of defense. In this paper, we analyze such potential insider attacks while reviewing IDS-based countermeasures, and we propose a baseline for designing an IDS for 6LoWPAN-based IoT systems.
Cyber-physical systems (CPS) are often network-integrated to enable remote management, monitoring, and reporting. Such integration has made them vulnerable to cyber attacks originating from an untrusted network (e.g., the Internet). Once an attacker breaches the network security, he could corrupt operations of the system in question, which may in turn lead to catastrophes. Hence there is a critical need to detect intrusions into mission-critical CPS. Signature-based detection may not work well for CPS, whose complexity may preclude the succinct signatures that we would need. Specification-based detection requires accurate definitions of system behaviour that can similarly be hard to obtain, due to the CPS's complexity and dynamics, as well as inaccuracies and incompleteness in design documents or operation manuals. Formal models, to be tractable, are often oversimplified, in which case they will not support effective detection. In this paper, we study a behaviour-based machine learning (ML) approach to intrusion detection. Whereas prior unsupervised ML methods have suffered from high missed-detection or false-positive rates, we use a high-fidelity CPS testbed, which replicates all main physical and control components of a modern water treatment facility, to generate systematic training data for a supervised method. The method not only detects the occurrence of a cyber attack at the physical process layer but also identifies the specific type of the attack. Its detection is fast and robust to noise. Furthermore, its adaptive system model can learn quickly to match the dynamics of the CPS and its operating environment. It exhibits a low false-positive (FP) rate, yet high precision and recall.
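A minimal sketch of such a supervised setup using scikit-learn; the feature matrix and attack labels below are synthetic placeholders for the testbed's labeled traces, and the classifier choice is an assumption, not necessarily the paper's model:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Stand-in for labeled windows of sensor/actuator readings: rows are time
# windows, columns are features; labels name "normal" or a specific attack
# type (synthetic placeholders, not the testbed's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = rng.choice(["normal", "spoof_flow_sensor", "open_valve"], size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)
clf = RandomForestClassifier(n_estimators=200).fit(X_tr, y_tr)
# Per-class precision/recall shows whether specific attack types are
# identified, not just the binary presence of an attack.
print(classification_report(y_te, clf.predict(X_te)))
```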
A wireless sensor network (WSN) is a collection of tiny nodes that sense natural phenomena in their operational environment and send readings to a control station, where useful information is extracted. Most existing systems assume that the operational environment of the deployed sensor nodes is trustworthy and secured by means of cryptographic operations and an existing trust model. In reality this is not the case: most existing systems fail to provide reliable security for the sensor nodes. To overcome this problem, this paper proposes the Beta Reputation and Direct Trust model (BRDT), which combines direct trust with beta reputation for secure communication in wireless sensor networks. The model is used to perform secure routing in WSNs. Overall, the method provides more efficient trust in WSNs than existing methods.
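For concreteness, the beta reputation component can be sketched as follows: with r positive and s negative direct observations of a neighbour, the expected trust value is the mean of a Beta(r+1, s+1) distribution (the standard beta reputation formulation; the paper's exact combination with direct trust may differ):

```python
# Beta reputation: the expected value of a Beta(r + 1, s + 1) distribution,
# where r counts positive and s negative observations of a neighbour.
def beta_reputation(r, s):
    return (r + 1) / (r + s + 2)

# A node with 8 successful and 2 failed forwarding interactions:
print(beta_reputation(8, 2))   # 0.75 -- route via neighbours above a threshold
```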
In this study, we report on techniques and analyses that enable us to capture Internet-wide activity at individual IP address-level granularity by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the highest ever measured, in agreement with recent estimates. Monthly client IPv4 address counts showed constant growth for years prior, but since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it seems we have entered an era marked by increased complexity, one in which the sole enumeration of active IPv4 addresses is of little use to characterize recent growth of the Internet as a whole. With this observation in mind, we consider new points of view in the study of global IPv4 address activity. First, our analysis shows significant churn in active IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over the course of a year. Second, by looking across the active addresses in a prefix, we are able to identify and attribute activity patterns to network restructurings, user behaviors, and, in particular, various address assignment practices. Third, by combining spatio-temporal measures of address utilization with measures of traffic volume, and sampling-based estimates of relative host counts, we present novel perspectives on worldwide IPv4 address activity, including empirical observation of under-utilization in some areas, and complete utilization, or exhaustion, in others.
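A sketch of one plausible churn measure behind the 25% figure, comparing the sets of active addresses seen in two time windows; the exact definition used in the study may differ:

```python
# Churn between two observation windows: the fraction of the combined address
# population that was not active in both windows.
def address_churn(active_t1, active_t2):
    """active_t1, active_t2: sets of addresses seen in two time windows."""
    union = active_t1 | active_t2
    stable = active_t1 & active_t2
    return 1.0 - len(stable) / len(union)

jan = {"198.51.100.7", "203.0.113.9", "192.0.2.4"}
dec = {"198.51.100.7", "203.0.113.9", "192.0.2.99"}
print(address_churn(jan, dec))   # 0.5 -- half the union changed across the year
```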
A major challenge for existing attack detection approaches is identifying information relevant to a particular situation and using that information to perform multi-evidence intrusion detection. Addressing this limitation requires integrating several aspects of context to better predict, avoid, and respond to impending attacks. The quality and adequacy of contextual information is important to decrease uncertainty and correctly identify potential cyber-attacks. In this paper, a systematic methodology is used to identify contextual dimensions that improve the effectiveness of detecting cyber-attacks. This methodology combines graph, probability, and information theories to create several context-based attack prediction models that analyze data at both a high and a low level. An extensive validation of our approach has been performed using a prototype system and several benchmark intrusion detection datasets, yielding very promising results.
When reasoning about software security, researchers and practitioners use the phrase "attack surface" as a metaphor for risk. Enumerate and minimize the ways attackers can break in, the metaphor says, and risk is reduced and the system is better protected. But software systems are much more complicated than their surfaces. We propose function- and file-level attack surface metrics, proximity and risky walk, that enable fine-grained risk assessment. Our risky walk metric is highly configurable: we use PageRank on a probability-weighted call graph to simulate attacker behavior of finding or exploiting a vulnerability. We provide evidence-based guidance for deploying these metrics, including an extensive parameter tuning study. We conducted an empirical study on two large open source projects, FFmpeg and Wireshark, to investigate the potential correlation between our metrics and historical post-release vulnerabilities. We found our metrics to be statistically significantly associated with vulnerable functions/files with a small-to-large Cohen's d effect size. Our prediction model achieved an increase of 36% (in FFmpeg) and 27% (in Wireshark) in the average value of F-measure over a base model built with SLOC and coupling metrics. Our prediction model outperformed comparable models from prior literature with notable improvements: 58% reduction in false negative rate, 81% reduction in false positive rate, and 548% increase in F-measure. These metrics advance vulnerability prevention by being flexible in terms of granularity, performing better than prior vulnerability prediction work, and being tunable so that practitioners can tailor them to their products and better assess security risk.
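A hedged sketch of the risky walk idea: personalized PageRank over a call graph whose edge weights approximate the probability that an attacker traverses each call. The graph, weights, and entry points below are illustrative, not the paper's data:

```python
import networkx as nx

# Probability-weighted call graph: edges are calls, weights approximate how
# likely an attacker-controlled input flows along each call.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("parse_input", "decode_frame", 0.9),   # attacker-facing entry point
    ("decode_frame", "alloc_buffer", 0.6),
    ("parse_input", "log_stats", 0.1),
])
# Personalization restarts the random walk at attacker-reachable entry points.
entry = {"parse_input": 1.0}
scores = nx.pagerank(G, alpha=0.85, personalization=entry, weight="weight")
for fn, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{fn}: {score:.3f}")              # higher score = riskier function
```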
A biased random-key genetic algorithm (BRKGA) is a general search procedure for finding optimal or near-optimal solutions to hard combinatorial optimization problems. It is derived from the random-key genetic algorithm of Bean (1994), differing in the way solutions are combined to produce offspring. BRKGAs have three key features that specialize genetic algorithms: a fixed chromosome encoding using a vector of N random keys, or alleles, over the real interval [0, 1), where the value of N depends on the instance of the optimization problem; a well-defined evolutionary process adopting parameterized uniform crossover to generate offspring and thus evolve the population; and the introduction of new chromosomes called mutants in place of the mutation operator usually found in evolutionary algorithms. Such features simplify and standardize the procedure with a set of self-contained tasks from which only one is problem-dependent: that of decoding a chromosome, i.e., using the keys to construct a solution to the underlying optimization problem, from which the objective function value or fitness can be computed. BRKGAs have the additional characteristic that, under a weak assumption, crossover always produces feasible offspring, and therefore a repair or healing procedure to recover feasibility is not required. In this tutorial we review the basic components of a BRKGA and introduce an Application Programming Interface (API) for quick implementations of BRKGA heuristics. We then apply the framework to a number of hard combinatorial optimization problems, including 2-D and 3-D packing problems, network design problems, routing problems, scheduling problems, and data mining. We conclude with a brief review of other domains where BRKGA heuristics have been applied.
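A minimal BRKGA skeleton illustrating these components; only decode is problem-dependent, and here a random-key vector is decoded into a permutation by sorting indices by key, with a toy objective standing in for a real problem (population sizes and the crossover bias RHO_E are illustrative):

```python
import random

N, POP, ELITE, MUTANTS, RHO_E = 10, 30, 6, 4, 0.7

def decode(keys):
    # Problem-dependent step: keys -> solution -> fitness (to be minimized).
    # The classic random-key decoder sorts indices by key to get a permutation.
    perm = sorted(range(N), key=lambda i: keys[i])
    return sum(i * p for i, p in enumerate(perm))   # toy objective

def crossover(elite, other):
    # Parameterized uniform crossover: bias each allele toward the elite parent.
    return [e if random.random() < RHO_E else o for e, o in zip(elite, other)]

pop = [[random.random() for _ in range(N)] for _ in range(POP)]
for _gen in range(100):
    pop.sort(key=decode)                       # best (lowest fitness) first
    elite = pop[:ELITE]                        # elites survive unchanged
    mutants = [[random.random() for _ in range(N)] for _ in range(MUTANTS)]
    offspring = [crossover(random.choice(elite), random.choice(pop[ELITE:]))
                 for _ in range(POP - ELITE - MUTANTS)]
    pop = elite + offspring + mutants          # mutants replace mutation
print(decode(min(pop, key=decode)))
```

Note how feasibility is never repaired: since offspring are again vectors of keys in [0, 1), any chromosome decodes to a valid permutation.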
We present a novel cyber security analytics framework. We demonstrate a comprehensive cyber security monitoring system that constructs correlated cyber security events and uses feature selection to anticipate behaviour based on various sensors.
Conventional security methods such as passwords and ID cards are now rapidly being replaced by biometrics for the identification of a person. Biometrics uses the physiological or behavioral characteristics of a person. The use of biometrics raises critical privacy and security concerns that, due to the noisy nature of biometrics, cannot be addressed using standard cryptographic methods. The loss of an enrollment biometric to an attacker is a security hazard because it may allow the attacker to gain unauthorized access to the system. A biometric template can be stolen, and an intruder can gain access to the biometric system using a fake input. Hence, it is essential either to design the biometric system with a secure template or to ensure that, if the biometric template in an application is compromised, the biometric signal itself is not lost forever and a new biometric template can be issued. One way is to combine biometrics and cryptography, or to use transformed data instead of the original biometric template. However, traditional cryptographic methods are not directly applicable to biometrics because of intra-class variation. Biometric cryptosystems can apply fuzzy vault, fuzzy commitment, helper data, and secure sketch techniques, whereas cancelable biometrics uses distorting transforms, Bio-Hashing, and Bio-Encoding. In this paper, a biometric cryptosystem is presented with fuzzy vault and fuzzy commitment techniques for a fingerprint recognition system.
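A toy sketch of the fuzzy commitment scheme (Juels and Wattenberg): a random key is expanded to a codeword c and bound to a biometric bit string x by storing (hash(key), x XOR c), and a noisy reading recovers the key as long as the error-correcting code absorbs the bit errors. A 3x repetition code stands in for a real ECC here:

```python
import hashlib, secrets

def encode(bits):   return [b for b in bits for _ in range(3)]      # 3x repetition
def decode(bits):   return [int(sum(bits[i:i+3]) >= 2)              # majority vote
                            for i in range(0, len(bits), 3)]
def xor(a, b):      return [x ^ y for x, y in zip(a, b)]

def commit(biometric):
    # Bind a fresh random key to the template; store only hash + helper data.
    key = [secrets.randbelow(2) for _ in range(len(biometric) // 3)]
    c = encode(key)
    return hashlib.sha256(bytes(key)).hexdigest(), xor(biometric, c)

def open_commitment(commitment, helper, noisy_biometric):
    # noisy XOR helper = c with a few flipped bits; the ECC corrects them.
    key = decode(xor(noisy_biometric, helper))
    return hashlib.sha256(bytes(key)).hexdigest() == commitment

x = [1, 0, 1, 1, 0, 1, 0, 0, 1]            # enrollment template (9 bits)
com, helper = commit(x)
x_noisy = x[:]; x_noisy[4] ^= 1            # one bit of intra-class variation
print(open_commitment(com, helper, x_noisy))   # True: noise absorbed by the code
```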
Synchronous replication is critical for today's enterprise IT organization. It is mandated by regulation in several countries for some types of organizations, including banks and insurance companies. The technology has been available for a long time, but due to the speed of light and maximum latency limitations, it is usually limited to a distance of 50-100 miles. Flight data recorders, also known as black boxes, have long been used to record the last actions that occurred in airplanes at times of disaster. We present an integration between an Enterprise Data Recorder and an asynchronous replication mechanism, which breaks the functional limits that the speed of light imposes on synchronous replication.
After a software system is compromised, it can be difficult to understand what vulnerabilities attackers exploited. Any information residing on that machine cannot be trusted, as attackers may have tampered with it to cover their tracks. Moreover, even after an exploit is known, it can be difficult to determine whether it has been used to compromise a given machine. Aviation has long used black boxes to better understand the causes of accidents, enabling improvements that reduce the likelihood of future accidents. Many attacks introduce abnormal control flows to compromise systems. In this paper, we present BlackBox, a monitoring system for COTS software. Our techniques enable BlackBox to efficiently monitor unexpected and potentially harmful control flow in COTS binaries. BlackBox constructs dynamic profiles of an application's typical control flows to filter the vast majority of expected control flow behavior, leaving a manageable amount of data that can be logged across the network to remote devices. Modern applications make extensive use of dynamically generated code, some of which varies greatly between executions. We introduce support for code generators that can detect security-sensitive behaviors while allowing BlackBox to avoid logging the majority of ordinary behaviors. We have implemented BlackBox in DynamoRIO. We evaluate the runtime overhead of BlackBox, and show that it can effectively monitor recent versions of Microsoft Office and Google Chrome. We show that in ROP, COOP, and state-of-the-art JIT injection attacks, BlackBox logs the pivotal actions by which the attacker takes control, and can also blacklist those actions to prevent repeated exploits.
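A hedged sketch of the profile-based filtering idea: a set of control-flow edges learned from prior runs suppresses expected behavior so that only anomalous edges are logged; the edges below are illustrative:

```python
# Profile-based control-flow filtering: expected (source, target) branch edges
# learned from prior runs are suppressed; anything else is a logging candidate.
class ControlFlowMonitor:
    def __init__(self, profile):
        self.profile = set(profile)          # expected (src, dst) edges
        self.log = []

    def observe(self, src, dst):
        if (src, dst) not in self.profile:
            self.log.append((src, dst))      # candidate for remote logging

profile = {("main", "parse"), ("parse", "render")}
mon = ControlFlowMonitor(profile)
mon.observe("main", "parse")                 # expected: filtered out
mon.observe("parse", "system")               # unexpected: logged (ROP-like pivot)
print(mon.log)                               # [('parse', 'system')]
```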
Explicit non-linear transformations of existing steganalysis features are shown to boost their ability to detect steganography in combination with existing simple classifiers, such as the FLD-ensemble. The non-linear transformations are learned from a small number of cover features using the Nyström approximation on pilot vectors obtained with kernelized PCA. The best performance is achieved with the exponential form of the Hellinger kernel, which improves the detection accuracy by up to 2-3% for spatial-domain content-adaptive steganography. Since the non-linear map depends only on the cover source and its learning has a low computational complexity, the proposed approach is a practical and low-cost method for boosting the accuracy of existing detectors built as binary classifiers. The map can also be used to significantly reduce the feature dimensionality (by up to a factor of ten) without performance loss with respect to the non-transformed features.
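A sketch of such a non-linear map: a Nyström approximation of kernelized PCA built from m pilot cover-feature vectors, using an exponential Hellinger-type kernel exp(-gamma * sum_i (sqrt(x_i) - sqrt(y_i))^2); the paper's exact kernel parameterization may differ, and features are assumed non-negative (histogram-like):

```python
import numpy as np

def exp_hellinger_kernel(X, Y, gamma=1.0):
    # Squared Hellinger distance computed in sqrt-space, then exponentiated.
    sx, sy = np.sqrt(X), np.sqrt(Y)
    d2 = (sx**2).sum(1)[:, None] + (sy**2).sum(1)[None, :] - 2 * sx @ sy.T
    return np.exp(-gamma * d2)

def nystrom_map(Z, gamma=1.0, dim=None):
    # Kernelized PCA on the m pilot vectors Z; keep the leading eigenpairs.
    K = exp_hellinger_kernel(Z, Z, gamma)
    lam, U = np.linalg.eigh(K)
    idx = np.argsort(lam)[::-1][:dim]
    lam, U = lam[idx], U[:, idx]
    W = U / np.sqrt(lam)                      # m x dim Nystrom projection
    return lambda X: exp_hellinger_kernel(X, Z, gamma) @ W

Z = np.random.rand(64, 1000)                  # pilot cover features (illustrative)
phi = nystrom_map(Z, gamma=0.5, dim=32)       # 1000-D features -> 32-D
features = np.random.rand(10, 1000)
print(phi(features).shape)                    # (10, 32): reduced, non-linear features
```

Since the map is fixed once learned from the cover source, it can be applied to both cover and stego features before training any existing binary classifier.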
Kernel hardening has been an important topic, since many applications and security mechanisms often consider the kernel as part of their Trusted Computing Base (TCB). Among various hardening techniques, Kernel Address Space Layout Randomization (KASLR) is the most effective and widely adopted defense mechanism that can practically mitigate various memory corruption vulnerabilities, such as buffer overflow and use-after-free. In principle, KASLR is secure as long as no memory leak vulnerability exists and high entropy is ensured. In this paper, we introduce a highly stable timing attack against KASLR, called DrK, that can precisely de-randomize the memory layout of the kernel without violating any such assumptions. DrK exploits a hardware feature called Intel Transactional Synchronization Extension (TSX) that is readily available in most modern commodity CPUs. One surprising behavior of TSX, which is essentially the root cause of this security loophole, is that it aborts a transaction without notifying the underlying kernel even when the transaction fails due to a critical error, such as a page fault or an access violation, which traditionally requires kernel intervention. DrK turns this property into a precise timing channel that can determine the mapping status (i.e., mapped versus unmapped) and execution status (i.e., executable versus non-executable) of the privileged kernel address space. In addition to its surprising accuracy and precision, DrK is universally applicable to all OSes, even in virtualized environments, and generates no visible footprint, making it difficult to detect in practice. We demonstrated that DrK can break the KASLR of all major OSes (i.e., Windows, Linux, and OS X) with near-perfect accuracy in under a second. Finally, we propose potential countermeasures that can effectively prevent or mitigate the DrK attack. We urge our community to be aware of the potential threat of having Intel TSX, which is present in most recent Intel CPUs (100% of workstation and 60% of high-end Intel CPUs since Skylake) and is even available on Amazon EC2 (X1).
We develop a systematic approach for analyzing client-server applications that aim to hide sensitive user data from untrusted servers. We then apply it to Mylar, a framework that uses multi-key searchable encryption (MKSE) to build Web applications on top of encrypted data. We demonstrate that (1) the Popa-Zeldovich model for MKSE does not imply security against either passive or active attacks; (2) Mylar-based Web applications reveal users' data and queries to passive and active adversarial servers; and (3) Mylar is generically insecure against active attacks due to system design flaws. Our results show that the problem of securing client-server applications against actively malicious servers is challenging and still unsolved. We conclude with general lessons for the designers of systems that rely on property-preserving or searchable encryption to protect data from untrusted servers.