Biblio
The combination of (1) hard to eradicate low-level vulnerabilities, (2) a large trusted computing base written in a memory-unsafe language and (3) a desperate need to provide strong software security guarantees, led to the development of protected-module architectures. Such architectures provide strong isolation of protected modules: Security of code and data depends only on a module's own implementation. In this paper we discuss how such protected modules should be written. From an academic perspective it is clear that the future lies with memory-safe languages. Unfortunately, from a business and management perspective, that is a risky path and will remain so in the near future. The use of well-known but memory-unsafe languages such as C and C++ seem inevitable. We argue that the academic world should take another look at the automatic hardening of software written in such languages to mitigate low-level security vulnerabilities. This is a well-studied topic for full applications, but protected-module architectures introduce a new, and much more challenging environment. Porting existing security measures to a protected-module setting without a thorough security analysis may even harm security of the protected modules they try to protect.
Modelling network traffic is gaining importance to counter modern security threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on a computer's all traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with special structure reflecting two stacked multi instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.
Modelling network traffic is gaining importance to counter modern security threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on a computer's all traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with special structure reflecting two stacked multi instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.
In this paper, the design of an event-driven middleware for general purpose services in smart grid (SG) is presented. The main purpose is to provide a peer-to-peer distributed software infrastructure to allow the access of new multiple and authorized actors to SGs information in order to provide new services. To achieve this, the proposed middleware has been designed to be: 1) event-based; 2) reliable; 3) secure from malicious information and communication technology attacks; and 4) to enable hardware independent interoperability between heterogeneous technologies. To demonstrate practical deployment, a numerical case study applied to the whole U.K. distribution network is presented, and the capabilities of the proposed infrastructure are discussed.
Preprocessors support the diversification of software products with \#ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that \#ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code. We extracted a variational call graph across all configurations of the Linux kernel, and used configuration complexity metrics to compare vulnerable and non-vulnerable functions considering their vulnerability history. Our goal was to learn about whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities. Our results suggest, among others, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers are inclined to notice functions appear in frequently-compiled product variants. We aim to raise developers' awareness to address variability more systematically, since configuration complexity is an important, but often ignored aspect of software product lines.
Preprocessors support the diversification of software products with \#ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that \#ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code. We extracted a variational call graph across all configurations of the Linux kernel, and used configuration complexity metrics to compare vulnerable and non-vulnerable functions considering their vulnerability history. Our goal was to learn about whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities. Our results suggest, among others, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers are inclined to notice functions appear in frequently-compiled product variants. We aim to raise developers' awareness to address variability more systematically, since configuration complexity is an important, but often ignored aspect of software product lines.
Proxy Mobile IPv6 (PMIPv6) is an IP mobility protocol. In a PMIPv6 domain, local mobility anchor is involved in control as well as data communication. To ease the load on a mobility anchor and avoid single point of failure, the PMIPv6 standard provides the opportunity of having multiple mobility anchors. In this paper, we propose a Software Defined Networking (SDN) based solution to provide load balancing among mobility anchors, in a SDN based PMIPv6 domain. In the proposed solution, a mobility controller performs acts as a central control entity, and performs load monitoring on the mobility anchors. On detecting the load crossing over a threshold for a certain mobility anchor, the controller moves some traffic from highly loaded mobility anchor to relatively less loaded mobility anchor. Analytical model and primitive performance evaluation of the proposed solution is presented in this paper, which demonstrates 5% and 40% improvement in uplink and downlink traffic disruption periods, respectively
Software components, which are vulnerable to being exploited, need to be identified and patched. Employing any prevention techniques designed for the purpose of detecting vulnerable software components in early stages can reduce the expenses associated with the software testing process significantly and thus help building a more reliable and robust software system. Although previous studies have demonstrated the effectiveness of adapting prediction techniques in vulnerability detection, the feasibility of those techniques is limited mainly because of insufficient training data sets. This paper proposes a prediction technique targeting at early identification of potentially vulnerable software components. In the proposed scheme, the potentially vulnerable components are viewed as mislabeled data that may contain true but not yet observed vulnerabilities. The proposed hybrid technique combines the supports vector machine algorithm and ensemble learning strategy to better identify potential vulnerable components. The proposed vulnerability detection scheme is evaluated using some Java Android applications. The results demonstrated that the proposed hybrid technique could identify potentially vulnerable classes with high precision and relatively acceptable accuracy and recall.
We show that elliptic-curve cryptography implementations on mobile devices are vulnerable to electromagnetic and power side-channel attacks. We demonstrate full extraction of ECDSA secret signing keys from OpenSSL and CoreBitcoin running on iOS devices, and partial key leakage from OpenSSL running on Android and from iOS's CommonCrypto. These non-intrusive attacks use a simple magnetic probe placed in proximity to the device, or a power probe on the phone's USB cable. They use a bandwidth of merely a few hundred kHz, and can be performed cheaply using an audio card and an improvised magnetic probe.
Non-malleability is an important and intensively studied security notion for many cryptographic primitives. In the context of public key encryption, this notion means it is infeasible for an adversary to transform an encryption of some message m into one of a related message m' under the given public key. Although it has provided a strong security property for many applications, it still does not suffice for some scenarios like the system where the users could issue keys on-the-fly. In such settings, the adversary may have the power to transform the given public key and the ciphertext. To withstand such attacks, Fischlin introduced a stronger notion, known as complete non-malleability, which requires that the non-malleability property be preserved even for the adversaries attempting to produce a ciphertext of some related message under the transformed public key. To date, many schemes satisfying this stronger security have been proposed, but they are either inefficient or proved secure in the random oracle model. In this work, we put forward a new encryption scheme in the common reference string model. Based on the standard DBDH assumption, the proposed scheme is proved completely non-malleable secure against adaptive chosen ciphertext attacks in the standard model. In our scheme, the well-formed public keys and ciphertexts could be publicly recognized without drawing support from unwieldy techniques like non-interactive zero knowledge proofs or one-time signatures, thus achieving a better performance.
In data outsourcing, a client stores a large amount of data on an untrusted server; subsequently, the client can request the server to compute a function on any subset of the data. This setting naturally leads to two security requirements: confidentiality of input data, and authenticity of computations. Existing approaches that satisfy both requirements simultaneously are built on fully homomorphic encryption, which involves expensive computation on the server and client and hence is impractical. In this paper, we propose two verifiable homomorphic encryption schemes that do not rely on fully homomorphic encryption. The first is a simple and efficient scheme for linear functions. The second scheme supports the class of multivariate quadratic functions, by combining the Paillier cryptosystem with a new homomorphic message authentication code (MAC) scheme. Through formal security analysis, we show that the schemes are semantically secure and unforgeable.
The microelectronics industry seeks screening tools that can be used to verify the origin of and track integrated circuits (ICs) throughout their lifecycle. Embedded circuits that measure process variation of an IC are well known. This paper adds to previous work using these circuits for studying manufacturer characteristics on final product ICs, particularly for the purpose of developing and verifying a signature for a microelectronics manufacturing facility (fab). We present the design, measurements and analysis of 159 silicon ICs which were built as a proof of concept for this purpose. 80 copies of our proof of concept IC were built at one fab, and 80 more copies were built across two lots at a second fab. Using these ICs, our prototype circuits allowed us to distinguish these two fabs with up to 98.7% accuracy and also distinguish the two lots from the second fab with up to 98.8% accuracy.
The microelectronics industry seeks screening tools that can be used to verify the origin of and track integrated circuits (ICs) throughout their lifecycle. Embedded circuits that measure process variation of an IC are well known. This paper adds to previous work using these circuits for studying manufacturer characteristics on final product ICs, particularly for the purpose of developing and verifying a signature for a microelectronics manufacturing facility (fab). We present the design, measurements and analysis of 159 silicon ICs which were built as a proof of concept for this purpose. 80 copies of our proof of concept IC were built at one fab, and 80 more copies were built across two lots at a second fab. Using these ICs, our prototype circuits allowed us to distinguish these two fabs with up to 98.7% accuracy and also distinguish the two lots from the second fab with up to 98.8% accuracy.
The output of 3D volume segmentation is crucial to a wide range of endeavors. Producing accurate segmentations often proves to be both inefficient and challenging, in part due to lack of imaging data quality (contrast and resolution), and because of ambiguity in the data that can only be resolved with higher-level knowledge of the structure and the context wherein it resides. Automatic and semi-automatic approaches are improving, but in many cases still fail or require substantial manual clean-up or intervention. Expert manual segmentation and review is therefore still the gold standard for many applications. Unfortunately, existing tools (both custom-made and commercial) are often designed based on the underlying algorithm, not the best method for expressing higher-level intention. Our goal is to analyze manual (or semi-automatic) segmentation to gain a better understanding of both low-level (perceptual tasks and actions) and high-level decision making. This can be used to produce segmentation tools that are more accurate, efficient, and easier to use. Questioning or observation alone is insufficient to capture this information, so we utilize a hybrid capture protocol that blends observation, surveys, and eye tracking. We then developed, and validated, data coding schemes capable of discerning low-level actions and overall task structures.
The Wikidata platform is a crowdsourced, structured knowledgebase aiming to provide integrated, free and language-agnostic facts which are–-amongst others–-used by Wikipedias. Users who actively enter, review and revise data on Wikidata are assisted by a property suggesting system which provides users with properties that might also be applicable to a given item. We argue that evaluating and subsequently improving this recommendation mechanism and hence, assisting users, can directly contribute to an even more integrated, consistent and extensive knowledge base serving a huge variety of applications. However, the quality and usefulness of such recommendations has not been evaluated yet. In this work, we provide the first evaluation of different approaches aiming to provide users with property recommendations in the process of curating information on Wikidata. We compare the approach currently facilitated on Wikidata with two state-of-the-art recommendation approaches stemming from the field of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well in regards to recall and precision, reaching a recall@7 of 79.71% and a precision@7 of 27.97%. We also find that generally, incorporating contextual as well as classifying information into the computation of property recommendations can further improve its performance significantly.
Recent findings have shown that network and system attacks in Software-Defined Networks (SDNs) have been caused by malicious network applications that misuse APIs in an SDN controller. Such attacks can both crash the controller and change the internal data structure in the controller, causing serious damage to the infrastructure of SDN-based networks. To address this critical security issue, we introduce a security framework called AEGIS to prevent controller APIs from being misused by malicious network applications. Through the run-time verification of API calls, AEGIS performs a fine-grained access control for important controller APIs that can be misused by malicious applications. The usage of API calls is verified in real time by sophisticated security access rules that are defined based on the relationships between applications and data in the SDN controller. We also present a prototypical implementation of AEGIS and demonstrate its effectiveness and efficiency by performing six different controller attacks including new attacks we have recently discovered.
Providing efficient protection against energy consumption based side channel attacks (SCAs) for block ciphers is a relevant topic for the research community, as current overheads are in the 100x range. Unprofiled SCAs exploit information leakage from the outmost rounds of a cipher; we propose a solution encasing it between keyed transformations amenable to an efficient SCA protection. Our solution can be employed as a drop in replacement for an unprotected implementation, or be retrofit to an existing one, while retaining communication capabilities with legacy insecure endpoints. Experiments on a Cortex-M4 μC, show performance improvements in the range of 60x, compared with available solutions.
Working in ISM band becomes overcrowded, shared unlicensed spectrum band, leads to a reduction in the quality of communication. This makes increase in packet loss caused by collisions and results in the necessity of packets retransmissions. In wireless sensor networks a large amount of energy of sensor nodes will be wasted during retransmissions. Cognitive radio is the technology makes it possible for sensor nodes to make use of licensed bands. In this paper a routing technique for cognitive radio wireless sensor networks is presented, that is based on a cross-layer design that jointly considers route and spectrum selection. This method has two main phases: next hop selection and channel selection. The routing is done hop-by-hop with local information and decisions, which are more compatible with sensor networks. Primary user action and prevention from interfering with them is considered in all spectrum decisions. It uniformly distributes frequency channels between neighboring nodes, which lead to a local reduction in collision probability. This clearly affects energy consumption in all sensor nodes. The route selection is energy-aware and a learning based technique is used to reduce the packet delay with respect to hop-count. The imitation reveals that by applying cognitive radio technology to WSNs and selecting a proper channel, we can consciously decrease collision probability. This saves energy of sensor nodes and improves the network lifetime.
Failing to properly isolate components in the same address space has resulted in a substantial amount of vulnerabilities. Enforcing the least privilege principle for memory accesses can selectively isolate software components to restrict attack surface and prevent unintended cross-component memory corruption. However, the boundaries and interactions between software components are hard to reason about and existing approaches have failed to stop attackers from exploiting vulnerabilities caused by poor isolation. We present the secure memory views (SMV) model: a practical and efficient model for secure and selective memory isolation in monolithic multithreaded applications. SMV is a third generation privilege separation technique that offers explicit access control of memory and allows concurrent threads within the same process to partially share or fully isolate their memory space in a controlled and parallel manner following application requirements. An evaluation of our prototype in the Linux kernel (TCB textless 1,800 LOC) shows negligible runtime performance overhead in real-world applications including Cherokee web server (textless 0.69%), Apache httpd web server (textless 0.93%), and Mozilla Firefox web browser (textless 1.89%) with at most 12 LOC changes.
Given a history of detected malware attacks, can we predict the number of malware infections in a country? Can we do this for different malware and countries? This is an important question which has numerous implications for cyber security, right from designing better anti-virus software, to designing and implementing targeted patches to more accurately measuring the economic impact of breaches. This problem is compounded by the fact that, as externals, we can only detect a fraction of actual malware infections. In this paper we address this problem using data from Symantec covering more than 1.4 million hosts and 50 malware spread across 2 years and multiple countries. We first carefully design domain-based features from both malware and machine-hosts perspectives. Secondly, inspired by epidemiological and information diffusion models, we design a novel temporal non-linear model for malware spread and detection. Finally we present ESM, an ensemble-based approach which combines both these methods to construct a more accurate algorithm. Using extensive experiments spanning multiple malware and countries, we show that ESM can effectively predict malware infection ratios over time (both the actual number and trend) upto 4 times better compared to several baselines on various metrics. Furthermore, ESM's performance is stable and robust even when the number of detected infections is low.
WebRTC is one of the latest additions to the ever growing repository of Web browser technologies, which push the envelope of native Web application capabilities. WebRTC allows real-time peer-to-peer audio and video chat, that runs purely in the browser. Unlike existing video chat solutions, such as Skype, that operate in a closed identity ecosystem, WebRTC was designed to be highly flexible, especially in the domains of signaling and identity federation. This flexibility, however, opens avenues for identity fraud. In this paper, we explore the technical underpinnings of WebRTC's identity management architecture. Based on this analysis, we identify three novel attacks against endpoint authenticity. To answer the identified threats, we propose and discuss defensive strategies, including security improvements for the WebRTC specifications and mitigation techniques for the identity and service providers.
In this paper, we introduce Entropy/IP: a system that discovers Internet address structure based on analyses of a subset of IPv6 addresses known to be active, i.e., training data, gleaned by readily available passive and active means. The system is completely automated and employs a combination of information-theoretic and machine learning techniques to probabilistically model IPv6 addresses. We present results showing that our system is effective in exposing structural characteristics of portions of the active IPv6 Internet address space, populated by clients, services, and routers. In addition to visualizing the address structure for exploration, the system uses its models to generate candidate addresses for scanning. For each of 15 evaluated datasets, we train on 1K addresses and generate 1M candidates for scanning. We achieve some success in 14 datasets, finding up to 40% of the generated addresses to be active. In 11 of these datasets, we find active network identifiers (e.g., /64 prefixes or "subnets") not seen in training. Thus, we provide the first evidence that it is practical to discover subnets and hosts by scanning probabilistically selected areas of the IPv6 address space not known to contain active hosts a priori.