Biblio
The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequent states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.
The Semantic Web can be used to enable the interoperability of IoT devices and to annotate their functional and nonfunctional properties, including security and privacy. In this paper, we will show how to use the ontology and JSON-LD to annotate connectivity, security and privacy properties of IoT devices. Out of that, we will present our prototype for a lightweight, secure application level protocol wrapper that ensures communication consistency, secrecy and integrity for low cost IoT devices like the ESP8266 and Photon particle.
The article considers the approach to static analysis of program code and the general principles of static analyzer operation. The authors identify the most important syntactic and semantic information in the programs, which can be used to find errors in the source code. The general methodology for development of diagnostic rules is proposed, which will improve the efficiency of static code analyzers.
Recently perceived vulnerabilities in public key infrastructures (PKI) demand that a semantic or cognitive definition of trust is essential for augmenting the security through trust formulations. In this paper, we examine the meaning of trust in PKIs. Properly categorized trust can help in developing intelligent algorithms that can adapt to the security and privacy requirements of the clients. We delineate the different types of trust in a generic PKI model.
Policy design is an important part of software development. As security breaches increase in variety, designing a security policy that addresses all potential breaches becomes a nontrivial task. A complete security policy would specify rules to prevent breaches. Systematically determining which, if any, policy clause has been violated by a reported breach is a means for identifying gaps in a policy. Our research goal is to help analysts measure the gaps between security policies and reported breaches by developing a systematic process based on semantic reasoning. We propose SEMAVER, a framework for determining coverage of breaches by policies via comparison of individual policy clauses and breach descriptions. We represent a security policy as a set of norms. Norms (commitments, authorizations, and prohibitions) describe expected behaviors of users, and formalize who is accountable to whom and for what. A breach corresponds to a norm violation. We develop a semantic similarity metric for pairwise comparison between the norm that represents a policy clause and the norm that has been violated by a reported breach. We use the US Health Insurance Portability and Accountability Act (HIPAA) as a case study. Our investigation of a subset of the breaches reported by the US Department of Health and Human Services (HHS) reveals the gaps between HIPAA and reported breaches, leading to a coverage of 65%. Additionally, our classification of the 1,577 HHS breaches shows that 44% of the breaches are accidental misuses and 56% are malicious misuses. We find that HIPAA's gaps regarding accidental misuses are significantly larger than its gaps regarding malicious misuses.
Intrusion Detection Systems (IDS) have been in existence for many years now, but they fall short in efficiently detecting zero-day attacks. This paper presents an organic combination of Semantic Link Networks (SLN) and dynamic semantic graph generation for the on the fly discovery of zero-day attacks using the Spark Streaming platform for parallel detection. In addition, a minimum redundancy maximum relevance (MRMR) feature selection algorithm is deployed to determine the most discriminating features of the dataset. Compared to previous studies on zero-day attack identification, the described method yields better results due to the semantic learning and reasoning on top of the training data and due to the use of collaborative classification methods. We also verified the scalability of our method in a distributed environment.
Patches and related information about software vulnerabilities are often made available to the public, aiming to facilitate timely fixes. Unfortunately, the slow paces of system updates (30 days on average) often present to the attackers enough time to recover hidden bugs for attacking the unpatched systems. Making things worse is the potential to automatically generate exploits on input-validation flaws through reverse-engineering patches, even though such vulnerabilities are relatively rare (e.g., 5% among all Linux kernel vulnerabilities in last few years). Less understood, however, are the implications of other bug-related information (e.g., bug descriptions in CVE), particularly whether utilization of such information can facilitate exploit generation, even on other vulnerability types that have never been automatically attacked. In this paper, we seek to use such information to generate proof-of-concept (PoC) exploits for the vulnerability types never automatically attacked. Unlike an input validation flaw that is often patched by adding missing sanitization checks, fixing other vulnerability types is more complicated, usually involving replacement of the whole chunk of code. Without understanding of the code changed, automatic exploit becomes less likely. To address this challenge, we present SemFuzz, a novel technique leveraging vulnerability-related text (e.g., CVE reports and Linux git logs) to guide automatic generation of PoC exploits. Such an end-to-end approach is made possible by natural-language processing (NLP) based information extraction and a semantics-based fuzzing process guided by such information. Running over 112 Linux kernel flaws reported in the past five years, SemFuzz successfully triggered 18 of them, and further discovered one zero-day and one undisclosed vulnerabilities. These flaws include use-after-free, memory corruption, information leak, etc., indicating that more complicated flaws can also be automatically attacked. This finding calls into question the way vulnerability-related information is shared today.
The Semantic Web today is a web that allows for intelligent knowledge retrieval by means of semantically annotated tags. This web also known as Intelligent web aims to provide meaningful information to man and machines equally. However, the information thus provided lacks the component of trust. Therefore we propose a method to embed trust in semantic web documents by the concept of provenance which provides answers to who, when, where and by whom the documents were created or modified. This paper demonstrates the same using the Manchester approach of provenance implemented in a University Ontology.
We present a framework for learning to describe finegrained visual differences between instances using attribute phrases. Attribute phrases capture distinguishing aspects of an object (e.g., “propeller on the nose” or “door near the wing” for airplanes) in a compositional manner. Instances within a category can be described by a set of these phrases and collectively they span the space of semantic attributes for a category. We collect a large dataset of such phrases by asking annotators to describe several visual differences between a pair of instances within a category. We then learn to describe and ground these phrases to images in the context of a reference game between a speaker and a listener. The goal of a speaker is to describe attributes of an image that allows the listener to correctly identify it within a pair. Data collected in a pairwise manner improves the ability of the speaker to generate, and the ability of the listener to interpret visual descriptions. Moreover, due to the compositionality of attribute phrases, the trained listeners can interpret descriptions not seen during training for image retrieval, and the speakers can generate attribute-based explanations for differences between previously unseen categories. We also show that embedding an image into the semantic space of attribute phrases derived from listeners offers 20% improvement in accuracy over existing attributebased representations on the FGVC-aircraft dataset.
Side channel attacks have been used to extract critical data such as encryption keys and confidential user data in a variety of adversarial settings. In practice, this threat is addressed by adhering to a constant-time programming discipline, which imposes strict constraints on the way in which programs are written. This introduces an additional hurdle for programmers faced with the already difficult task of writing secure code, highlighting the need for solutions that give the same source-level guarantees while supporting more natural programming models. We propose a novel type system for verifying that programs correctly implement constant-resource behavior. Our type system extends recent work on automatic amortized resource analysis (AARA), a set of techniques that automatically derive provable upper bounds on the resource consumption of programs. We devise new techniques that build on the potential method to achieve compositionality, precision, and automation. A strict global requirement that a program always maintains constant resource usage is too restrictive for most practical applications. It is sufficient to require that the program's resource behavior remain constant with respect to an attacker who is only allowed to observe part of the program's state and behavior. To account for this, our type system incorporates information flow tracking into its resource analysis. This allows our system to certify programs that need to violate the constant-time requirement in certain cases, as long as doing so does not leak confidential information to attackers. We formalize this guarantee by defining a new notion of resource-aware noninterference, and prove that our system enforces it. Finally, we show how our type inference algorithm can be used to synthesize a constant-time implementation from one that cannot be verified as secure, effectively repairing insecure programs automatically. We also show how a second novel AARA system that computes lower bounds on reso- rce usage can be used to derive quantitative bounds on the amount of information that a program leaks through its resource use. We implemented each of these systems in Resource Aware ML, and show that it can be applied to verify constant-time behavior in a number of applications including encryption and decryption routines, database queries, and other resource-aware functionality.
While research on Information-Centric Networking (ICN) flourishes, its adoption seems to be an elusive goal. In this paper we propose Edge-ICN: a novel approach for deploying ICN in a single large network, such as the network of an Internet Service Provider. Although Edge-ICN requires nothing beyond an SDN-based network supporting the OpenFlow protocol, with ICN-aware nodes only at the edges of the network, it still offers the same benefits as a clean-slate ICN architecture but without the deployment hassles. Moreover, by proxying legacy traffic and transparently forwarding it through the Edge-ICN nodes, all existing applications can operate smoothly, while offering significant advantages to applications such as native support for scalable anycast, multicast, and multi-source forwarding. In this context, we show how the proposed functionality at the edge of the network can specifically benefit CoAP-based IoT applications. Our measurements show that Edge-ICN induces on average the same control plane overhead for name resolution as a centralized approach, while also enabling IoT applications to build on anycast, multicast, and multi-source forwarding primitives.
Social media has become an important platform for people to express opinions, share information and communicate with others. Detecting and tracking topics from social media can help people grasp essential information and facilitate many security-related applications. As social media texts are usually short, traditional topic evolution models built based on LDA or HDP often suffer from the data sparsity problem. Recently proposed topic evolution models are more suitable for short texts, but they need to manually specify topic number which is fixed during different time period. To address these issues, in this paper, we propose a nonparametric topic evolution model for social media short texts. We first propose the recurrent semantic dependent Chinese restaurant process (rsdCRP), which is a nonparametric process incorporating word embeddings to capture semantic similarity information. Then we combine rsdCRP with word co-occurrence modeling and build our short-text oriented topic evolution model sdTEM. We carry out experimental studies on Twitter dataset. The results demonstrate the effectiveness of our method to monitor social media topic evolution compared to the baseline methods.
The area of secure compilation aims to design compilers which produce hardened code that can withstand attacks from low-level co-linked components. So far, there is no formal correctness criterion for secure compilers that comes with a clear understanding of what security properties the criterion actually provides. Ideally, we would like a criterion that, if fulfilled by a compiler, guarantees that large classes of security properties of source language programs continue to hold in the compiled program, even as the compiled program is run against adversaries with low-level attack capabilities. This paper provides such a novel correctness criterion for secure compilers, called trace-preserving compilation (TPC). We show that TPC preserves a large class of security properties, namely all safety hyperproperties. Further, we show that TPC preserves more properties than full abstraction, the de-facto criterion used for secure compilation. Then, we show that several fully abstract compilers described in literature satisfy an additional, common property, which implies that they also satisfy TPC. As an illustration, we prove that a fully abstract compiler from a typed source language to an untyped target language satisfies TPC.
In this paper we present results of a research on automatic extremist text detection. For this purpose an experimental dataset in the Russian language was created. According to the Russian legislation we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic and psycholinguistic) to classification quality. The results of experiments show that psycholinguistic and semantic features are promising for extremist text detection.
Integer errors in C/C++ are caused by arithmetic operations yielding results which are unrepresentable in certain type. They can lead to serious safety and security issues. Due to the complicated semantics of C/C++ integers, integer errors are widely harbored in real-world programs and it is error-prone to repair them even for experts. An automatic tool is desired to 1) automatically generate fixes which assist developers to correct the buggy code, and 2) provide sufficient hints to help developers review the generated fixes and better understand integer types in C/C++. In this paper, we present a tool IntPTI that implements the desired functionalities for C programs. IntPTI infers appropriate types for variables and expressions to eliminate representation issues, and then utilizes the derived types with fix patterns codified from the successful human-written patches. IntPTI provides a user-friendly web interface which allows users to review and manage the fixes. We evaluate IntPTI on 7 real-world projects and the results show its competitive repair accuracy and its scalability on large code bases. The demo video for IntPTI is available at: https://youtu.be/9Tgd4A\_FgZM.
As the Internet becomes an important part of the infrastructure our society depends on, it is crucial to construct networks that are able to work even when part of the network is compromised. This paper presents the first practical intrusion-tolerant network service, targeting high-value applications such as monitoring and control of global clouds and management of critical infrastructure for the power grid. We use an overlay approach to leverage the existing IP infrastructure while providing the required resiliency and timeliness. Our solution overcomes malicious attacks and compromises in both the underlying network infrastructure and in the overlay itself. We deploy and evaluate the intrusion-tolerant overlay implementation on a global cloud spanning East Asia, North America, and Europe, and make it publicly available.
In a number of information security scenarios, human beings can be better than technical security measures at detecting threats. This is particularly the case when a threat is based on deception of the user rather than exploitation of a specific technical flaw, as is the case of spear-phishing, application spoofing, multimedia masquerading and other semantic social engineering attacks. Here, we put the concept of the human-as-a-security-sensor to the test with a first case study on a small number of participants subjected to different attacks in a controlled laboratory environment and provided with a mechanism to report these attacks if they spot them. A key challenge is to estimate the reliability of each report, which we address with a machine learning approach. For comparison, we evaluate the ability of known technical security countermeasures in detecting the same threats. This initial proof of concept study shows that the concept is viable.
Over the last decade, a globalization of the software industry took place, which facilitated the sharing and reuse of code across existing project boundaries. At the same time, such global reuse also introduces new challenges to the software engineering community, with not only components but also their problems and vulnerabilities being now shared. For example, vulnerabilities found in APIs no longer affect only individual projects but instead might spread across projects and even global software ecosystem borders. Tracing these vulnerabilities at a global scale becomes an inherently difficult task since many of the existing resources required for such analysis still rely on proprietary knowledge representation. In this research, we introduce an ontology-based knowledge modeling approach that can eliminate such information silos. More specifically, we focus on linking security knowledge with other software knowledge to improve traceability and trust in software products (APIs). Our approach takes advantage of the Semantic Web and its reasoning services, to trace and assess the impact of security vulnerabilities across project boundaries. We present a case study, to illustrate the applicability and flexibility of our ontological modeling approach by tracing vulnerabilities across project and resource boundaries.