Bibliography
Malicious JavaScript is a common springboard for attackers to launch several types of network attacks, such as drive-by-download and malicious PDF delivery attacks. To elude signature-based detection, malicious JavaScript is often packed ("obfuscated") with diverse algorithms; the occurrence of obfuscation is therefore a good indicator of potential maliciousness. In this investigation, we propose a lightweight approach for quickly filtering obfuscated JavaScript using a novel method that tokenizes JavaScript text at the letter level and applies information-theoretic measures, building on previous work on detecting obfuscated malicious code as well as on the pattern analysis of natural languages. The new approach is clearly more time-efficient than existing systems since it processes far fewer objects, while still being shown to reach acceptable detection accuracy.
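As an illustration of the kind of information-theoretic measure this abstract refers to, the following minimal sketch (not the authors' actual detector) computes the Shannon entropy of a script's character-level distribution; packed or obfuscated JavaScript often exhibits a different letter-level profile than hand-written code. The example snippets and the comparison are purely illustrative assumptions.

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a text's letter-level distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative comparison only; real detectors would combine several measures.
plain_js = "function add(a, b) { return a + b; }"
packed_js = "eval(String.fromCharCode(102,117,110,99,116,105,111,110))"
for name, src in [("plain", plain_js), ("packed", packed_js)]:
    print(name, round(char_entropy(src), 3))
```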
Outsourcing huge amounts of local data to remote cloud servers has become a significant trend for industry. Leveraging the considerable cloud storage space, industries can also feed the outsourced data into cloud computing. How to use the data for computation without loss of privacy and confidentiality is one of the crucial security problems. Searchable encryption has been proposed to protect the confidentiality of the outsourced data and the privacy of the corresponding data queries. This technique, however, supports only search functionality and may not be fully applicable to real-world cloud computing scenarios in which secure data search, sharing, and computation are all needed. This work presents a novel encrypted cloud-based data sharing and search system that preserves user privacy and data confidentiality. The new system not only enables users to issue conjunctive keyword queries over encrypted data, but also allows encrypted data to be efficiently shared among different users multiple times without the "download-decrypt-then-encrypt" mode. Of independent interest, our system provides secure keyword update, so that users can freely and securely update a data item's keyword field. It is worth mentioning that none of the above functionalities incurs any expansion of the ciphertext size; the ciphertext remains constant in size while being searched, shared, and keyword-updated. The system is proven secure, and the efficiency analysis shows its great potential for use in large-scale databases.
Resource discovery in unstructured peer-to-peer networks causes a search query to be flooded throughout the network via random nodes, leading to security and privacy issues. The owner of the search query has no control over the transmission of its query through the network. Although algorithms have been proposed for policy-compliant query or data routing in a network, these algorithms mainly deal with authentic route computation and do not provide mechanisms to actually verify the network paths taken by the query. In this work, we propose an approach for verifying the network paths taken by a search query during resource discovery and for detecting malicious forwarding of search queries. Our approach aims to be secure yet highly scalable, even in the presence of a huge number of nodes in the network.
Recommendation systems have become popular in our daily lives. It is well known that the more personal data users release, the better the quality of the recommendations. However, such services raise serious privacy concerns for users. In this paper, focusing on matrix factorization-based recommendation systems, we propose the first privacy-preserving matrix factorization using fully homomorphic encryption. On inputs of encrypted user ratings, our protocol performs matrix factorization over the encrypted data and returns encrypted outputs, so that the recommendation system learns nothing about the rating values or the resulting user/item profiles. It provides a way to obfuscate the number and list of items a user rated without harming the accuracy of recommendation; additionally, it protects the recommender's tuning parameters for business benefit while still allowing the recommender to optimize the parameters for quality of service. To overcome the performance degradation caused by the use of fully homomorphic encryption, we introduce a novel data structure to perform computations over encrypted vectors, which are essential operations for matrix factorization, in part through secure two-party computation. With this data structure, the proposed protocol requires dozens of times less computation than previous works. Our experiments on a personal computer with a 3.4 GHz 6-core CPU and 64 GB of RAM show that the proposed protocol runs in 1.5 minutes per iteration. It is more efficient than the work of Nikolaenko et al. presented at CCS 2013, which took about 170 minutes on two servers with 1.9 GHz 16-core CPUs and 128 GB of RAM.
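For context, a plaintext matrix factorization baseline looks roughly like the following sketch (gradient descent on user and item profiles); the paper's contribution is performing such updates over encrypted ratings, which this sketch does not attempt. The dimensions, learning rate, regularization, and example ratings are illustrative assumptions.

```python
import numpy as np

def matrix_factorization(R, mask, k=8, lr=0.01, reg=0.1, iters=200, seed=0):
    """Plaintext MF baseline: factor observed ratings R (mask == 1) into U @ V.T."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(iters):
        E = mask * (R - U @ V.T)          # error only on observed entries
        U += lr * (E @ V - reg * U)       # gradient step on user profiles
        V += lr * (E.T @ U - reg * V)     # gradient step on item profiles
    return U, V

# Tiny illustrative example with a 3x4 rating matrix (0 = unobserved).
R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5]], dtype=float)
mask = (R > 0).astype(float)
U, V = matrix_factorization(R, mask)
print(np.round(U @ V.T, 2))
```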
The massive amount of data that is being collected by today's society has the potential to advance scientific knowledge and boost innovation. However, people often lack sufficient computing resources to analyze their large-scale data in a cost-effective and timely way. Cloud computing offers access to vast computing resources on an on-demand and pay-per-use basis, which is a practical way for people to analyze their huge data sets. However, since their data contain sensitive information that needs to be kept secret for ethical, security, or legal reasons, many people are reluctant to adopt cloud computing. For the first time in the literature, we propose a secure outsourcing algorithm for large-scale quadratic programs (QPs), one of the most fundamental problem classes in data analysis. Specifically, based on simple linear algebra operations, we design a low-complexity QP transformation that protects the private data in a QP. We show that the transformed QP is computationally indistinguishable under a chosen-plaintext attack (CPA), i.e., CPA-secure. We then develop a parallel algorithm to solve the transformed QP at the cloud, and efficiently find the solution to the original QP at the user. We implement the proposed algorithm on the Amazon Elastic Compute Cloud (EC2) and a laptop. We find that our proposed algorithm offers significant time savings for the user and is scalable to the size of the QP.
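The general idea of masking a QP with simple linear algebra before outsourcing can be sketched as follows. This uses an unconstrained QP and a random invertible change of variables x = M y, which is an illustrative stand-in and not necessarily the paper's CPA-secure transformation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Original (unconstrained) QP: minimize 0.5 * x^T Q x + c^T x, with Q positive definite.
A = rng.normal(size=(n, n))
Q = A @ A.T + n * np.eye(n)
c = rng.normal(size=n)

# User-side masking: substitute x = M y with a secret random invertible M.
M = rng.normal(size=(n, n))
Q_masked = M.T @ Q @ M
c_masked = M.T @ c

# Cloud-side: solve the masked QP (stationarity condition Q' y = -c').
y = np.linalg.solve(Q_masked, -c_masked)

# User-side recovery: map the masked solution back to the original variables.
x = M @ y
x_direct = np.linalg.solve(Q, -c)
print(np.allclose(x, x_direct))  # True: same minimizer as solving the original QP
```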
The output of 3D volume segmentation is crucial to a wide range of endeavors. Producing accurate segmentations often proves to be both inefficient and challenging, in part due to limited imaging data quality (contrast and resolution), and because of ambiguity in the data that can only be resolved with higher-level knowledge of the structure and the context in which it resides. Automatic and semi-automatic approaches are improving, but in many cases they still fail or require substantial manual clean-up or intervention. Expert manual segmentation and review is therefore still the gold standard for many applications. Unfortunately, existing tools (both custom-made and commercial) are often designed around the underlying algorithm, not the best method for expressing higher-level intention. Our goal is to analyze manual (or semi-automatic) segmentation to gain a better understanding of both low-level (perceptual tasks and actions) and high-level decision making. This can be used to produce segmentation tools that are more accurate, efficient, and easier to use. Questioning or observation alone is insufficient to capture this information, so we use a hybrid capture protocol that blends observation, surveys, and eye tracking. We then developed, and validated, data coding schemes capable of discerning low-level actions and overall task structures.
The coming decade will witness a surge in remote health-monitoring systems that are based on body-worn monitoring devices. These Medical Cyber-Physical Systems (MCPS) will be capable of transmitting the acquired data to a private or public cloud for storage and processing. Machine learning algorithms running in the cloud and processing this data can provide decision support to healthcare professionals. There is no doubt that the security and privacy of the medical data are among the most important concerns in designing an MCPS. In this paper, we depict the general architecture of an MCPS consisting of four layers: data acquisition, data aggregation, cloud processing, and action. Due to the differences in hardware and communication capabilities of each layer, different encryption schemes must be used to guarantee data privacy within that layer. We survey conventional and emerging encryption schemes based on their ability to provide secure storage, data sharing, and secure computation. Our detailed experimental evaluation of each scheme shows that while the emerging encryption schemes enable exciting new features such as secure sharing and secure computation, they introduce several orders of magnitude of computational and storage overhead. We conclude the paper by outlining future research directions to improve the usability of the emerging encryption schemes in an MCPS.
The Wikidata platform is a crowdsourced, structured knowledge base that aims to provide integrated, free, and language-agnostic facts which are, amongst others, used by the Wikipedias. Users who actively enter, review, and revise data on Wikidata are assisted by a property-suggesting system which presents properties that might also be applicable to a given item. We argue that evaluating and subsequently improving this recommendation mechanism, and hence assisting users, can directly contribute to an even more integrated, consistent, and extensive knowledge base serving a huge variety of applications. However, the quality and usefulness of such recommendations have not been evaluated yet. In this work, we provide the first evaluation of different approaches aiming to provide users with property recommendations in the process of curating information on Wikidata. We compare the approach currently deployed on Wikidata with two state-of-the-art recommendation approaches stemming from the fields of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well with regard to recall and precision, reaching a recall@7 of 79.71% and a precision@7 of 27.97%. We also find that, in general, incorporating contextual as well as classifying information into the computation of property recommendations can further improve performance significantly.
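For reference, recall@k and precision@k figures such as those cited above can be computed per item as in this small sketch; the property IDs and relevance set below are made-up examples, not Wikidata evaluation data.

```python
def precision_recall_at_k(recommended, relevant, k=7):
    """Precision@k and recall@k for one ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for p in top_k if p in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked property suggestions for one item vs. the properties it actually uses.
recommended = ["P31", "P21", "P106", "P569", "P19", "P27", "P735", "P570"]
relevant = {"P31", "P21", "P106", "P569", "P570"}
print(precision_recall_at_k(recommended, relevant, k=7))  # (4/7 ≈ 0.571, 4/5 = 0.8)
```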
The state of network security today is quite abysmal. Security breaches and downtime of critical infrastructures continue to be the norm rather than the exception, despite the dramatic rise in spending on network security. Attackers today can easily leverage a distributed and programmable infrastructure of compromised machines (or botnets) to launch large-scale and sophisticated attack campaigns. In contrast, the defenders of our critical infrastructures are fundamentally crippled as they rely on fixed-capacity, inflexible, and expensive hardware appliances deployed at designated "chokepoints". These primitive defense capabilities force defenders into adopting weak and static security postures configured for simple and known attacks, or otherwise risk user revolt, as they face unpleasant tradeoffs between false positives and false negatives. Unfortunately, attacks can easily evade these defenses, e.g., by piggybacking on popular services (such as drive-by downloads) or by overloading the appliances. Continuing along this trajectory means that attackers will always hold the upper hand as defenders are stifled by the inflexible and impotent tools in their arsenal. An overarching goal of my work is to change the dynamics of this attack-defense equation. Instead of taking the conventional approach of developing attack-specific defenses, I argue that we can leverage recent trends in software-defined networking and network functions virtualization to better empower defenders with the right tools and abstractions to tackle the constantly evolving attack landscape. To this end, I envision a new software-defined approach to network security, where we can rapidly develop and deploy novel in-depth defenses and dynamically customize the network's security posture to the current operating context. In this talk, I will give an overview of our recent work on the basic building blocks to enable this vision as well as some early security capabilities we have developed. Using anecdotes from this specific exercise, I will also try to highlight lessons and experiences in the overall research process (e.g., how to pick and formulate problems, the role of serendipity, and the benefits of finding ``bridges'' to other subdomains).
Providing efficient protection against energy-consumption-based side-channel attacks (SCAs) for block ciphers is a relevant topic for the research community, as current overheads are in the 100x range. Unprofiled SCAs exploit information leakage from the outermost rounds of a cipher; we propose a solution that encases the cipher between keyed transformations amenable to efficient SCA protection. Our solution can be employed as a drop-in replacement for an unprotected implementation, or retrofitted to an existing one, while retaining communication capabilities with legacy insecure endpoints. Experiments on a Cortex-M4 microcontroller show performance improvements in the range of 60x compared with available solutions.
The web has been flooded with multimedia sources such as images, videos, animations, and audio, which in turn has led computer vision researchers to focus on extracting content from these sources. Scene text recognition basically involves two major steps, namely text localization and text recognition. This paper presents an end-to-end text recognition approach for extracting the characters alone from complex natural scenes. The various objects are localized using Maximally Stable Extremal Regions (MSER), edges are identified using the Canny edge detection method, binary classification is then performed using a connected-component method that segregates text and non-text objects, and finally a stroke analysis method is applied to analyse the style of each character, leading to character recognition. Experimental results were obtained by testing the approach on the ICDAR 2015 dataset, wherein text could be recognized in most of the scene images with good precision.
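A minimal sketch of the first two stages of such a pipeline (MSER localization and Canny edge detection) using OpenCV is shown below; the input filename and thresholds are placeholders, and the connected-component classification and stroke analysis stages are not reproduced here.

```python
import cv2

# Load a natural-scene image in grayscale (filename is a placeholder).
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Stage 1: localize candidate regions with MSER.
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)

# Stage 2: detect edges with the Canny detector (thresholds are illustrative).
edges = cv2.Canny(gray, 100, 200)

# Draw the MSER bounding boxes for inspection; later stages would classify
# each candidate as text/non-text and run stroke analysis.
vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
for (x, y, w, h) in bboxes:
    cv2.rectangle(vis, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 1)
cv2.imwrite("mser_candidates.png", vis)
```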
Protecting the confidentiality of information manipulated by a computing system is one of the most important challenges facing today's cybersecurity community. A promising step toward conquering this challenge is to formally verify that the end-to-end behavior of the computing system really satisfies various information-flow policies. Unfortunately, because today's system software still consists of both C and assembly programs, the end-to-end verification necessarily requires that we not only prove the security properties of individual components, but also carefully preserve these properties through compilation and cross-language linking. In this paper, we present a novel methodology for formally verifying end-to-end security of a software system that consists of both C and assembly programs. We introduce a general definition of observation function that unifies the concepts of policy specification, state indistinguishability, and whole-execution behaviors. We show how to use different observation functions for different levels of abstraction, and how to link different security proofs across abstraction levels using a special kind of simulation that is guaranteed to preserve state indistinguishability. To demonstrate the effectiveness of our new methodology, we have successfully constructed an end-to-end security proof, fully formalized in the Coq proof assistant, of a nontrivial operating system kernel (running on an extended CompCert x86 assembly machine model). Some parts of the kernel are written in C and some are written in assembly; we verify all of the code, regardless of language.
The ISM band, a shared unlicensed spectrum band, has become overcrowded, which reduces the quality of communication. This increases packet loss caused by collisions and makes packet retransmissions necessary. In wireless sensor networks, a large amount of sensor-node energy is wasted during retransmissions. Cognitive radio is a technology that makes it possible for sensor nodes to make use of licensed bands. In this paper a routing technique for cognitive radio wireless sensor networks is presented that is based on a cross-layer design jointly considering route and spectrum selection. This method has two main phases: next-hop selection and channel selection. The routing is done hop by hop with local information and decisions, which is more compatible with sensor networks. Primary user activity, and the need to avoid interfering with it, is considered in all spectrum decisions. The method uniformly distributes frequency channels between neighboring nodes, which leads to a local reduction in collision probability. This clearly affects energy consumption in all sensor nodes. The route selection is energy-aware, and a learning-based technique is used to reduce the packet delay with respect to hop count. The simulation reveals that by applying cognitive radio technology to WSNs and selecting a proper channel, we can considerably decrease the collision probability. This saves the energy of sensor nodes and improves the network lifetime.
All modern web browsers (Internet Explorer, Firefox, Chrome, Opera, and Safari) have a core rendering engine written in C++. This language choice was made because it affords the systems programmer complete control of the underlying hardware features and memory in use, and it provides a transparent compilation model. Unfortunately, this language is complex (especially to new contributors!), challenging to write correct parallel code in, and highly susceptible to memory safety issues that potentially lead to security holes. Servo is a project started at Mozilla Research to build a new web browser engine that preserves the capabilities of these other browser engines while also taking advantage of recent trends in parallel hardware and being more memory-safe. We use a new language, Rust, that provides us with a level of control over the underlying system similar to C++ but which statically prevents many memory safety issues and provides direct support for parallelism and concurrency. In this paper, we show how a language with an advanced type system can address many of the most common security issues and software engineering challenges in other browser engines, while still producing code that has the same performance and memory profile. This language is also quite accessible to new open source contributors and employees, even those without a background in C++ or systems programming. We also outline several pitfalls encountered along the way and describe some potential areas for future improvement.
Given a history of detected malware attacks, can we predict the number of malware infections in a country? Can we do this for different malware and countries? This is an important question with numerous implications for cyber security, from designing better anti-virus software, to designing and implementing targeted patches, to more accurately measuring the economic impact of breaches. The problem is compounded by the fact that, as externals, we can only detect a fraction of actual malware infections. In this paper we address this problem using data from Symantec covering more than 1.4 million hosts and 50 malware, spread across two years and multiple countries. We first carefully design domain-based features from both the malware and machine-host perspectives. Second, inspired by epidemiological and information diffusion models, we design a novel temporal non-linear model for malware spread and detection. Finally we present ESM, an ensemble-based approach that combines both of these methods to construct a more accurate algorithm. Using extensive experiments spanning multiple malware and countries, we show that ESM can effectively predict malware infection ratios over time (both the actual number and the trend) up to 4 times better than several baselines on various metrics. Furthermore, ESM's performance is stable and robust even when the number of detected infections is low.
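As a toy illustration of the epidemiology-inspired, non-linear temporal modeling mentioned above (not the ESM model itself), a logistic growth curve can be fit to cumulative detected-infection counts; the data below is synthetic and all parameters are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic growth: cumulative infections approach a carrying capacity K."""
    return K / (1.0 + np.exp(-r * (t - t0)))

# Synthetic weekly cumulative detection counts (illustrative data only).
weeks = np.arange(20)
observed = logistic(weeks, K=10000, r=0.6, t0=9) \
    + np.random.default_rng(0).normal(0, 150, 20)

params, _ = curve_fit(logistic, weeks, observed, p0=[observed.max(), 0.5, 10])
print("estimated K, r, t0:", np.round(params, 2))
```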
There has been growing interest in using convolutional neural networks (CNNs) in the fields of image forensics and steganalysis, and some promising results have been reported recently. These works mainly focus on the architectural design of CNNs; usually, a single CNN model is trained and then tested in experiments. It is known that neural networks, including CNNs, are well suited to forming ensembles. From this perspective, in this paper we employ CNNs as base learners and test several different ensemble strategies. In our study, a recently proposed CNN architecture is first adopted to build a group of CNNs, each of which is trained on a random subsample of the training dataset. The output probabilities, or some intermediate feature representations, of each CNN are then extracted from the original data and pooled together to form new features ready for the second level of classification. To make the best use of the trained CNN models, we partially recover the information lost to spatial subsampling in the pooling layers when forming feature vectors. The performance of the ensemble methods is evaluated on BOSSbase by detecting S-UNIWARD at a 0.4 bpp embedding rate. The results indicate that both the recovery of the lost information and learning from intermediate representations in CNNs instead of output probabilities lead to performance improvements.
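The simplest of the ensemble strategies described above, averaging the output probabilities of several independently trained CNNs, can be sketched as follows; the probability arrays and class labels here are placeholders, and the second-level classifier trained on pooled intermediate features is not shown.

```python
import numpy as np

def ensemble_average(prob_list):
    """Average the per-class output probabilities of several base CNNs."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

# Placeholder probabilities from 3 base CNNs on 4 test images (cover vs. stego).
p1 = np.array([[0.90, 0.10], [0.40, 0.60], [0.20, 0.80], [0.70, 0.30]])
p2 = np.array([[0.80, 0.20], [0.30, 0.70], [0.30, 0.70], [0.60, 0.40]])
p3 = np.array([[0.85, 0.15], [0.45, 0.55], [0.25, 0.75], [0.65, 0.35]])

avg = ensemble_average([p1, p2, p3])
predictions = avg.argmax(axis=1)   # 0 = cover, 1 = stego (illustrative labels)
print(avg.round(3), predictions)
```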
Heterogeneous face recognition aims to identify or verify a person's identity by matching facial images of different modalities. In practice, it is known that its performance is highly influenced by modality inconsistency, appearance occlusions, illumination variations, and expressions. In this paper, a new method, named ensemble of sparse cross-modal metrics, is proposed for tackling these challenging issues. In particular, a weak sparse cross-modal metric learning method is first developed to measure distances between samples of the two modalities. It learns to adjust rank-one cross-modal metrics to satisfy two sets of triplet-based cross-modal distance constraints in a compact form. Meanwhile, a group-based feature selection is performed to enforce that features in the same position of the two modalities are selected simultaneously. By neglecting features that contribute to "noise" in the face regions (eyeglasses, expressions, and so on), the performance of the learned weak metrics can be markedly improved. Finally, an ensemble framework is incorporated to combine the results of the differently learned sparse metrics into a strong one. Extensive experiments on various face datasets demonstrate the benefit of such feature selection, especially when heavy occlusions exist. The proposed ensemble metric learning is shown to be superior to several state-of-the-art methods in heterogeneous face recognition.
WebRTC is one of the latest additions to the ever-growing repository of Web browser technologies that push the envelope of native Web application capabilities. WebRTC allows real-time peer-to-peer audio and video chat that runs purely in the browser. Unlike existing video chat solutions, such as Skype, that operate in a closed identity ecosystem, WebRTC was designed to be highly flexible, especially in the domains of signaling and identity federation. This flexibility, however, opens avenues for identity fraud. In this paper, we explore the technical underpinnings of WebRTC's identity management architecture. Based on this analysis, we identify three novel attacks against endpoint authenticity. To counter the identified threats, we propose and discuss defensive strategies, including security improvements for the WebRTC specifications and mitigation techniques for the identity and service providers.
Contemporary vehicles are being equipped with an increasing number of Electronic Control Units (ECUs) and wireless connectivity. Although these have enhanced vehicle safety and efficiency, they come with new vulnerabilities. In this paper, we unveil an important new vulnerability applicable to several in-vehicle networks, including the Controller Area Network (CAN), the de facto standard in-vehicle network protocol. Specifically, we propose a new type of Denial-of-Service (DoS) attack, called the bus-off attack, which exploits the error-handling scheme of in-vehicle networks to disconnect or shut down good/uncompromised ECUs. This is an important attack that must be thwarted, since, once an ECU is compromised, the attack is easy to mount against safety-critical ECUs while its prevention is very difficult. In addition to the discovery of this new vulnerability, we analyze its feasibility using actual in-vehicle network traffic, and demonstrate the attack on a CAN bus prototype as well as on two real vehicles. Based on our analysis and experimental results, we also propose and evaluate a mechanism to detect and prevent the bus-off attack.
Quality assurance for highly-configurable systems is challenging due to the exponentially growing configuration space. Interactions among multiple options can lead to surprising behaviors, bugs, and security vulnerabilities. Analyzing all configurations systematically might be possible though if most options do not interact or interactions follow specific patterns that can be exploited by analysis tools. To better understand interactions in practice, we analyze program traces to characterize and identify where interactions occur on control flow and data. To this end, we developed a dynamic analysis for Java based on variability-aware execution and monitor executions of multiple small to medium-sized programs. We find that the essential configuration complexity of these programs is indeed much lower than the combinatorial explosion of the configuration space indicates. However, we also discover that the interaction characteristics that allow scalable and complete analyses are more nuanced than what is exploited by existing state-of-the-art quality assurance strategies.
To help establish a more scientific basis for security science, which will enable the development of fundamental theories and move the field from being primarily reactive to primarily proactive, it is important for research results to be reported in a scientifically rigorous manner. Such reporting will allow for the standard pillars of science, namely replication, meta-analysis, and theory building. In this paper we aim to establish a baseline of the state of scientific work in security through the analysis of indicators of scientific research as reported in the papers from the 2015 IEEE Symposium on Security and Privacy. To conduct this analysis, we developed a series of rubrics to determine the completeness of the papers relative to the type of evaluation used (e.g. case study, experiment, proof). Our findings showed that while papers are generally easy to read, they often do not explicitly document some key information like the research objectives, the process for choosing the cases to include in the studies, and the threats to validity. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.