Biblio
The capability to reliably and accurately identify the attacker has long been believed to be one of the most effective deterrents to an attack. Ideally, the attribution of a cyber attack should be automated from the attack target all the way back to the attack source on the Internet in real time. Real-time, network-wide attack attribution, however, is very challenging, and many have doubted whether practical attack attribution on the Internet is feasible. In this paper, we examine the problem and challenges of real-time attack attribution on the Internet and analyze what it takes to achieve it. We show that it is indeed feasible and practical to attribute certain cyber attacks on the Internet in real time. We build such a real-time attack attribution system upon the malware immunization and packet-flow watermarking techniques we have developed, and we demonstrate this unprecedented real-time attack attribution capability via live experiments on the Internet and on Tor nodes around the world.
Authorship attribution is a classical classification problem. We use it here to illustrate the performance of a compression-based measure that relies on the notion of relative compression. Besides comparing with recent approaches that use multiple discriminant analysis and support vector machines, we compare it with the Normalized Conditional Compression Distance (a direct approximation of the Normalized Information Distance) and the popular Normalized Compression Distance. The Normalized Relative Compression (NRC) attained 100% correct classification in the data set used, showing consistency between the compression ratio and the classification performance, a characteristic not always present in other compression-based measures.
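As a point of reference for the compression-based measures compared above, the sketch below computes the standard Normalized Compression Distance with a general-purpose compressor (zlib). It illustrates the family of measures, not the relative-compression measure (NRC) evaluated in the paper, which requires a compressor conditioned on a reference string; the toy strings are invented for illustration.

```python
import zlib

def c(data: bytes) -> int:
    """Compressed size of `data` using zlib at maximum compression."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: values near 0 mean 'similar'."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy authorship example: attribute a snippet to the closer reference corpus.
author_a = b"call me ishmael. some years ago, never mind how long precisely..."
author_b = b"it is a truth universally acknowledged, that a single man..."
unknown  = b"some years ago, never mind how long, i went to sea..."
print(sorted([("A", ncd(author_a, unknown)), ("B", ncd(author_b, unknown))],
             key=lambda t: t[1]))
```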
Authorship attribution is a stylometric technique that associates texts with authors based on their writing styles. Researchers have looked for ways to analyze the context of these texts, in some cases with limited results. Most of the approaches view information at the syntactic and physical levels and tend to ignore information from the semantic levels. In this paper, we present a technique that incorporates the use of semantic frames as a method for authorship attribution. We hypothesize that it provides a deeper view into the semantic level of texts, which is an influencing factor in a writer's style. We use a variety of online resources in a pipeline fashion to extract information about frames within the text. The results show that our "bag of frames" approach can be used successfully for stylometry.
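A rough illustration of the "bag of frames" idea: per-sentence frame labels become count vectors that feed a text classifier. The toy lexicon below stands in for the paper's pipeline of online frame-semantic resources, and the documents and frame names are invented for the example.

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy frame lexicon standing in for a real frame-semantic parser; in the paper
# a pipeline of online resources produces the frame labels.
TOY_LEXICON = {"bought": "Commerce_buy", "sold": "Commerce_sell",
               "said": "Statement", "walked": "Self_motion"}

def extract_frames(text: str) -> Counter:
    """Count the frames evoked by the words of `text` (toy version)."""
    return Counter(TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON)

docs   = ["He bought a house and said nothing", "She walked home and said hello"]
labels = ["author_A", "author_B"]

vec = DictVectorizer()
X = vec.fit_transform(extract_frames(d) for d in docs)   # "bag of frames" features
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform([extract_frames("He said he bought bread")])))
```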
The Google Identity Platform is a system that allows a user to sign in to applications and other services by using a Google account. Google Sign-In is one such method for providing one's identity to the Google Identity Platform. Google Sign-In is available for Android applications and iOS applications, as well as for websites and other devices. Users of Google Sign-In find that it integrates well with the Android platform, but iOS users (iPhone, iPad, etc.) do not have the same experience. The user experience when logging in to a Google account on an iOS application is not only more tedious than the Android experience, but it also conditions users to engage in behaviors that put the information in their Google accounts at risk.
Searchable symmetric encryption (SSE) enables a client to store a database on an untrusted server while supporting keyword search in a secure manner. Despite the rapidly increasing interest in SSE technology, experiments indicate that the performance of the known schemes scales badly to large databases. Somewhat surprisingly, this is not due to their usage of cryptographic tools, but rather due to their poor locality (where locality is defined as the number of non-contiguous memory locations the server accesses with each query). The only known schemes that do not suffer from poor locality suffer either from an impractical space overhead or from an impractical read efficiency (where read efficiency is defined as the ratio between the number of bits the server reads with each query and the actual size of the answer). We construct the first SSE schemes that simultaneously enjoy optimal locality, optimal space overhead, and nearly-optimal read efficiency. Specifically, for a database of size N, under the modest assumption that no keyword appears in more than N^(1 − 1/log log N) documents, we construct a scheme with read efficiency Õ(log log N). This essentially matches the lower bound of Cash and Tessaro (EUROCRYPT ’14) showing that any SSE scheme must be sub-optimal in either its locality, its space overhead, or its read efficiency. In addition, even without making any assumptions on the structure of the database, we construct a scheme with read efficiency Õ(log N). Our schemes are obtained via a two-dimensional generalization of the classic balanced allocations (“balls and bins”) problem that we put forward. We construct nearly-optimal two-dimensional balanced allocation schemes, and then combine their algorithmic structure with subtle cryptographic techniques.
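For quick reference, the parameters above can be written in symbols, reading "optimal locality" as constant locality and "optimal space overhead" as linear space (the standard interpretation); this is only a restatement of the bounds quoted in the abstract.

```latex
% Locality: number of non-contiguous memory locations read per query.
% Read efficiency: bits read per query divided by the size of the answer.
\[
  \text{locality} = O(1), \qquad \text{space} = O(N), \qquad
  \text{read efficiency} =
  \begin{cases}
    \tilde{O}(\log\log N) & \text{if no keyword appears in more than } N^{1-1/\log\log N} \text{ documents},\\
    \tilde{O}(\log N)     & \text{for arbitrary databases.}
  \end{cases}
\]
```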
The safety-critical aspects of cyber-physical systems motivate the need for rigorous analysis of these systems. In the literature this work is often done using idealized models of systems where the analysis can be carried out using high-level reasoning techniques such as Lyapunov functions and model checking. In this paper we present VERIDRONE, a foundational framework for reasoning about cyber-physical systems at all levels from high-level models to C code that implements the system. VERIDRONE is a library within the Coq proof assistant enabling us to build on its foundational implementation, its interactive development environments, and its wealth of libraries capturing interesting theories ranging from real numbers and differential equations to verified compilers and floating point numbers. These features make proof assistants in general, and Coq in particular, a powerful platform for unifying foundational results about safety-critical systems and ensuring interesting properties at all levels of the stack.
Physical Unclonable Functions (PUFs) measure manufacturing variations inside integrated circuits to derive internal secrets at run-time and avoid permanently storing secrets in non-volatile memory. PUF responses are noisy, so they require error correction to generate reliable cryptographic keys. To date, a single key is reproduced in the field whenever it is needed and is always used, regardless of its reliability. In this work, we compute online reliability information for a reproduced key and perform multiple PUF readout and error correction steps in case of an unreliable result. This permits choosing the most reliable key among multiple derived key candidates with different corrected error patterns. With this approach, we achieve the same average key error probability from fewer PUF response bits. Our proof-of-concept design for a popular reference scenario uses Differential Sequence Coding (DSC) and a Viterbi decoder with reliability output information. It requires 39% fewer PUF response bits and 16% fewer helper data bits than the regular approach without the option for multiple readouts.
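The multiple-readout idea can be sketched as a simple select-the-best loop. The `read_puf` and `viterbi_decode` callables and the reliability threshold below are hypothetical placeholders for the DSC/Viterbi pipeline described in the paper, not its actual interfaces.

```python
from typing import Callable, Tuple

def derive_key(read_puf: Callable[[], bytes],
               viterbi_decode: Callable[[bytes], Tuple[bytes, float]],
               threshold: float, max_readouts: int = 4) -> bytes:
    """Re-read the PUF until a decoded key candidate is reliable enough;
    otherwise return the most reliable candidate seen (hypothetical sketch)."""
    best_key, best_score = None, float("-inf")
    for _ in range(max_readouts):
        response = read_puf()                   # noisy PUF response bits
        key, score = viterbi_decode(response)   # corrected key + soft reliability
        if score > best_score:
            best_key, best_score = key, score
        if best_score >= threshold:             # reliable enough: stop early
            break
    return best_key
```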
It is a fundamental problem to decide how many copies of an unknown mixed quantum state are necessary and sufficient to determine the state. This is the quantum analogue of the problem of estimating a probability distribution given some number of samples. Previously, it was known only that estimating states to error ε in trace distance required O(dr²/ε²) copies for a d-dimensional density matrix of rank r. Here, we give a measurement scheme (POVM) that uses O((dr/δ) ln(d/δ)) copies to estimate ρ to error δ in infidelity. This implies that O((dr/ε²) · ln(d/ε)) copies suffice to achieve error ε in trace distance. For fixed d, our measurement can be implemented on a quantum computer in time polynomial in the number of copies n. We also use the Holevo bound from quantum information theory to prove a lower bound of Ω(dr/ε²)/log(d/rε) copies needed to achieve error ε in trace distance. This implies a lower bound of Ω(dr/δ)/log(d/rδ) copies for estimation error δ in infidelity. These match our upper bounds up to log factors. Our techniques also yield an Ω(r²d/δ) lower bound for measurement strategies in which each copy is measured individually and the outcomes are then classically post-processed to produce an estimate. This matches the known achievability results and proves for the first time that such “product” measurements have asymptotically suboptimal scaling with d and r.
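The conversion between the infidelity and trace-distance statements above is an instance of the Fuchs–van de Graaf inequalities. Writing F for the (square-root) fidelity and δ = 1 − F for the infidelity (some works use 1 − F²; the conversion is the same up to constants):

```latex
\[
  1 - F(\rho,\sigma) \;\le\; D_{\mathrm{tr}}(\rho,\sigma) \;\le\; \sqrt{1 - F(\rho,\sigma)^2} \;\le\; \sqrt{2\delta},
\]
```

so achieving infidelity δ = ε²/2 guarantees trace distance at most ε, which turns the O((dr/δ) ln(d/δ)) copy bound into O((dr/ε²) ln(d/ε)) up to constant factors.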
Security in Mobile Ad Hoc Networks is still an ongoing research topic in the scientific community, and it is difficult to provide an overall security solution. In this paper we assess the feasibility of distributed firewall solutions in mobile ad hoc networks. We also review different security solutions for ad hoc networks. We propose a security architecture that secures the network on several layers and is the most secure of the analyzed solutions. For this purpose we use a distributed public key infrastructure, a distributed firewall, and an intrusion detection system. Our architecture uses both symmetric and asymmetric cryptography, and we present performance measurements and a security analysis of our solution.
The success or failure of a mobile application (`app') is largely determined by user ratings. Users frequently make their app choices based on the ratings of apps in comparison with similar, often competing apps. Users also expect apps to continually provide new features while maintaining quality, or the ratings drop. At the same time apps must also be secure, but is there a historical trade-off between security and ratings? Or are app store ratings a more all-encompassing measure of product maturity? We used static analysis tools to collect security-related metrics in 38,466 Android apps from the Google Play store. We compared each app's rate of permission misuse, number of requested permissions, and Androrisk score against its user rating. We found that high-rated apps have statistically significantly higher security risk metrics than low-rated apps. However, the correlations are weak. This result supports the conventional wisdom that users are not factoring security risks into their ratings in a meaningful way. This could be due to several reasons, including users not placing much emphasis on security, or the typical user being unable to gauge the security risk level of the apps they use every day.
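A comparison of this kind can be reproduced in outline with a rank-based test and a rank correlation. The CSV layout, column names, and rating cutoff below are hypothetical, not the paper's actual dataset or methodology.

```python
import pandas as pd
from scipy.stats import mannwhitneyu, spearmanr

# Hypothetical table: one row per app, with a user rating and security metrics.
apps = pd.read_csv("apps_metrics.csv")          # e.g. rating, androrisk, n_permissions

high = apps[apps.rating >= 4.0]
low  = apps[apps.rating <  4.0]

# Do high-rated apps have higher risk scores than low-rated apps?
stat, p = mannwhitneyu(high.androrisk, low.androrisk, alternative="greater")
print(f"Mann-Whitney U p-value: {p:.3g}")

# How strong is the (monotone) association between rating and risk?
rho, p_rho = spearmanr(apps.rating, apps.androrisk)
print(f"Spearman rho: {rho:.2f} (p={p_rho:.3g})")   # small rho = weak correlation
```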
Smart devices, from smartphones to wearable computers, are today used for many purposes. These devices run various mobile operating systems such as Android, iOS, Symbian, and Windows Mobile. Since mobile devices are widely used and contain personal information, they are subject to security attacks by mobile malware applications. In this work we propose a new approach based on control flow graphs and machine learning algorithms for static Android malware analysis. Experimental results show that, by significantly reducing the data dimension for real-time analysis, the proposed approach achieves a high overall classification accuracy of 96.26% and a high detection rate of 99.15% for the DroidKungFu malware families, which are very harmful and difficult to detect because they encrypt their root exploits.
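A minimal sketch of the general idea (not the paper's exact feature set or classifier): build a control flow graph per app, summarize it with a few graph-level features, and train a classifier. The edge lists below are invented stand-ins for CFGs produced by a static analyzer.

```python
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

def cfg_features(edges: list[tuple[str, str]]) -> list[float]:
    """Summarize a control flow graph (given as basic-block edges) with simple
    graph-level statistics; the paper's feature set is richer than this."""
    g = nx.DiGraph(edges)
    return [g.number_of_nodes(),
            g.number_of_edges(),
            nx.density(g),
            sum(1 for _, d in g.out_degree() if d > 1)]   # branch points

# Hypothetical training data: CFG edge lists extracted from benign/malicious apps.
benign_cfgs  = [[("entry", "b1"), ("b1", "exit")]]
malware_cfgs = [[("entry", "b1"), ("b1", "b2"), ("b1", "b3"), ("b3", "b1")]]

X = [cfg_features(e) for e in benign_cfgs + malware_cfgs]
y = [0] * len(benign_cfgs) + [1] * len(malware_cfgs)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
```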
Biometric verification systems have a security issue regarding the storage of biometric data, in that a user's biometric features cannot be changed even when a system is compromised. To address this issue, it may be safer to store the biometric data on a reliable remote server instead of on a local device. However, this approach may raise a privacy issue. In this paper, we propose a biometric verification system where the biometric data are stored on a remote server in encrypted form and the similarity of the user input to the registered biometric data is computed in the encrypted domain using homomorphic encryption. We evaluated the performance of the proposed system through an implementation on an Android-based smartphone and an i7-based server.
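The additively homomorphic part of such a protocol can be illustrated with the `phe` Paillier library: the evaluator combines an encrypted template with a probe using only ciphertext additions and plaintext scalings, so only the key holder learns the score. This is a toy sketch with made-up feature vectors, not the paper's protocol (there the probe would also be protected).

```python
from phe import paillier

# Key holder: key generation and enrollment of an encrypted template.
pub, priv = paillier.generate_paillier_keypair()
template = [12, 7, 3]                                  # toy biometric feature vector
enc_t  = [pub.encrypt(v)     for v in template]        # E(y_i)
enc_t2 = [pub.encrypt(v * v) for v in template]        # E(y_i^2)

# Evaluator: squared Euclidean distance using only additions and plaintext scalings.
probe = [11, 8, 3]
enc_dist = sum(p * p for p in probe)                   # sum x_i^2 (plaintext constant)
for p, ey, ey2 in zip(probe, enc_t, enc_t2):
    enc_dist = enc_dist + ey2 - 2 * p * ey             # + E(y_i^2) - 2 x_i E(y_i)

# Key holder: only the private key can reveal the similarity score.
print(priv.decrypt(enc_dist))                          # (12-11)^2+(7-8)^2+(3-3)^2 = 2
```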
We present a novel type of Trojan trigger targeted at the field-programmable gate array (FPGA) design flow. Traditional triggers are based on rare events, such as rare values or sequences. While in most cases these trigger circuits are able to hide a Trojan attack, exhaustive functional simulation and testing will reveal the Trojan due to violation of the specification. Our trigger is functionally and formally equivalent to the hardware description language (HDL) specification throughout the entire FPGA design flow, until the design is written out by the place-and-route tool as a bitstream configuration file. From then on, the Trojan payload is always on. We implement the trigger signal using a 4-input lookup table (LUT), with each of the inputs connected to the same signal. This lets us directly address the least significant bit (LSB) and most significant bit (MSB) of the LUT. With the remaining 14 bits, we realize a "magic" unary operation. This way, we are able to implement 16 different triggers. We demonstrate the attack with a simple example and discuss the effectiveness of the recent detection techniques unused circuit identification (UCI), functional analysis for nearly-unused circuit identification (FANCI), and VeriTrust at revealing our trigger.
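The key observation can be checked with a few lines of Python: if all four LUT inputs carry the same signal, the LUT address is either 0b0000 or 0b1111, so the observable function depends only on INIT[0] and INIT[15] and the remaining 14 configuration bits are free to carry hidden content. This is a simplified model of the mechanism, not the authors' exact construction.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Model a 4-input LUT: `init` is its 16-bit truth table and
    (d, c, b, a) form the address into it."""
    addr = (d << 3) | (c << 2) | (b << 1) | a
    return (init >> addr) & 1

# Tie all four inputs to the same signal s: only addresses 0b0000 and 0b1111 are
# ever reachable, so the LUT behaves as a plain buffer whenever INIT[0] = 0 and
# INIT[15] = 1, no matter what the other 14 bits hold.
import random
for _ in range(1000):
    hidden = random.getrandbits(14) << 1        # 14 "free" configuration bits
    init = (1 << 15) | hidden                   # INIT[15] = 1, INIT[0] = 0
    assert all(lut4(init, s, s, s, s) == s for s in (0, 1))
# Simulation and equivalence checking against the HDL therefore cannot observe
# the hidden bits, yet they end up verbatim in the generated bitstream.
```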
The software-as-a-service (SaaS) market is growing very fast, but many clients are still concerned about the confidentiality of their data in the cloud. Motivated hackers or malicious insiders could try to steal the clients' data. Encryption is a potential solution, but supporting the necessary functionality in existing applications is difficult. In this paper, we examine encrypting analytical web applications that perform extensive number-processing operations in the database. Existing solutions for encrypting data in web applications support such workloads poorly. We employ a proxy that adjusts the encryption to the level necessary for the client's usage and also supports additively homomorphic encryption. This proxy is deployed at the client, and all encryption keys are stored and managed there, while the application runs in the cloud. Our proxy is stateless, and we only need to modify the database driver of the application. We evaluate an instantiation of our architecture on an exemplary application. We only slightly increase the average page load time, from 3.1 seconds to 4.7 seconds. However, roughly 40% of all data columns remain probabilistically encrypted. The client can set the desired security level for each column using our policy mechanism. Hence our proxy architecture offers a solution to increase the confidentiality of the data at the cloud provider at a moderate performance penalty.
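The benefit of additively homomorphic column encryption for analytics can be shown in a few lines with the `phe` Paillier library: the cloud side aggregates ciphertexts without any key, while only the client-side proxy, which holds the keys, decrypts the result. This is a toy illustration with an invented column, not the paper's proxy implementation.

```python
from functools import reduce
from operator import add
from phe import paillier

# Client-side proxy: generates and keeps the keys, encrypts an analytics column.
pub, priv = paillier.generate_paillier_keypair()
revenue = [1200, 830, 410, 990]
encrypted_column = [pub.encrypt(v) for v in revenue]

# Cloud/database side: aggregates ciphertexts without ever holding a key.
encrypted_sum = reduce(add, encrypted_column)      # homomorphic SUM(revenue)

# Proxy again: decrypts the aggregate before handing it to the web application.
assert priv.decrypt(encrypted_sum) == sum(revenue)
```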
Android malware is becoming very effective in evading detection techniques, and traditional malware detection techniques are demonstrating their weaknesses. Signature-based detection shows at least two drawbacks: first, detection is possible only after the malware has been identified; second, the time needed to produce and distribute the signature provides attackers with a window of opportunity for spreading the malware in the wild. To address this problem, different approaches have emerged that try to characterize malicious behavior through the invoked system and API calls. Unfortunately, several evasion techniques have proven effective at evading detection based on system and API calls. In this paper, we propose an approach for capturing malicious behavior in terms of device resource consumption (using a thorough set of features), which is much more difficult to camouflage. We describe a procedure, and the corresponding practical setting, for extracting those features with the aim of maximizing their discriminative power. Finally, we describe the promising results obtained by experimenting on more than 2000 applications, on which our approach exhibited an accuracy greater than 99%.
The concept of digital rights management (DRM) has become extremely important in current mobile environments. This paper shows how partial bitstream encryption can allow the secure distribution of hardware applications, resembling the mechanisms of traditional software DRM. Building on recent developments towards the secure distribution of hardware cores, the paper demonstrates a prototypical implementation of a user mobile device supporting such distribution mechanisms. The prototype extends the Android operating system with support for hardware reconfigurability and showcases the interplay of novel security concepts enabled by hardware DRM, the advantages of a design flow based on high-level synthesis, and the opportunities provided by current software-rich reconfigurable systems-on-chip. Relying on this prototype, we also collected extensive quantitative results demonstrating the limited overhead incurred by the secure distribution architecture.
Game theory is appropriate for studying cyber conflict because it allows for an intelligent and goal-driven adversary. Applications of game theory have led to a number of results regarding optimal attack and defense strategies. However, the overwhelming majority of applications explore overly simplistic games, often ones in which each participant's actions are visible to every other participant. These simplifications strip away the fundamental properties of real cyber conflicts: probabilistic alerting, hidden actions, and unknown opponent capabilities. In this paper, we demonstrate that it is possible to analyze a more realistic game, one in which different resources have different weaknesses, players have different exploits, and moves occur in secrecy but can be detected. Certainly, more advanced and complex games are possible, but the game presented here is more realistic than any other game we know of in the scientific literature. While optimal strategies can be found for simpler games using calculus, case-by-case analysis, or, for stochastic games, Q-learning, our more complex game is more naturally analyzed using the same methods used to study other complex games, such as checkers and chess. We define a simple evaluation function and employ multi-step searches to create strategies. We show that such scenarios can be analyzed, and find that in cases of extreme uncertainty it is often better to ignore one's opponent's possible moves. Furthermore, we show that a simple evaluation function in a complex game can lead to interesting and nuanced strategies whose moves are well tuned to the details of the situation and the relative probabilities of success.
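A bare-bones version of the "evaluation function plus multi-step search" approach looks like the sketch below. The game-state interface (`moves`, `apply`) and the evaluation terms are hypothetical placeholders, not the paper's actual game or scoring function.

```python
def evaluate(state) -> float:
    """Hypothetical evaluation: value of compromised resources minus an
    estimate of exposure to detection."""
    return state.resources_owned - 0.5 * state.detection_risk

def search(state, depth: int, maximizing: bool) -> float:
    """Plain minimax over the game tree, scoring leaves with `evaluate`."""
    if depth == 0 or not state.moves():
        return evaluate(state)
    values = (search(state.apply(m), depth - 1, not maximizing)
              for m in state.moves())
    return max(values) if maximizing else min(values)

def best_move(state, depth: int = 3):
    """Pick the move with the highest multi-step search value."""
    return max(state.moves(), key=lambda m: search(state.apply(m), depth - 1, False))
```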
To help establish a more scientific basis for security science, which will enable the development of fundamental theories and move the field from being primarily reactive to primarily proactive, it is important for research results to be reported in a scientifically rigorous manner. Such reporting will allow for the standard pillars of science, namely replication, meta-analysis, and theory building. In this paper we aim to establish a baseline of the state of scientific work in security through the analysis of indicators of scientific research as reported in the papers from the 2015 IEEE Symposium on Security and Privacy. To conduct this analysis, we developed a series of rubrics to determine the completeness of the papers relative to the type of evaluation used (e.g. case study, experiment, proof). Our findings showed that while papers are generally easy to read, they often do not explicitly document some key information like the research objectives, the process for choosing the cases to include in the studies, and the threats to validity. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
This paper concerns the role of human error in early risk assessment in software project management. Researchers have recently begun to focus on human error in early risk assessment in large software projects; statistics show it to be a major component of software problems, with over 80% of economic losses attributed to it. There has been comparatively little empirical research on the role of human error in this context, particularly at the organizational level, largely because of reluctance to share information and statistics on security issues in online software applications. Grounded theory was employed as the research methodology to investigate the main root causes of human error in online security risks. An open-ended question was asked of 103 information security experts around the globe, and the responses were used to develop a list of human error causes through open coding. The paper represents a contribution to our understanding of the causes of human error in information security contexts. It is also one of the first information security studies of its kind to utilize Strauss's and Glaser's grounded theory approaches together during the data collection phase to achieve the required number of participant responses, and it is a significant contribution to the field.
Information security management is time-consuming and error-prone. Apart from day-to-day operations, organizations need to comply with industry regulations or government directives. Thus, organizations are looking for security tools to automate security management tasks and daily operations. The Security Content Automation Protocol (SCAP) is a suite of specifications that help to automate security management tasks such as vulnerability measurement and policy compliance evaluation. An SCAP benchmark provides detailed guidance on setting the security configuration of network devices, operating systems, and applications, and organizations can use it to perform automated configuration compliance assessments on those systems. This paper discusses SCAP benchmark components and the development of a SCAP benchmark for automating Cisco router security configuration compliance.
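As an example of what such automation can look like in practice, the snippet below drives an SCAP scan from Python, assuming the OpenSCAP `oscap` tool and an XCCDF benchmark file are available on the system; the profile ID and file names are placeholders, and this is not the tooling discussed in the paper.

```python
import subprocess

# Evaluate a system against an XCCDF benchmark profile with OpenSCAP (assumed
# installed); results and a human-readable report are written to disk.
subprocess.run(
    ["oscap", "xccdf", "eval",
     "--profile", "xccdf_org.example_profile_baseline",  # placeholder profile ID
     "--results", "results.xml",
     "--report", "report.html",
     "benchmark-xccdf.xml"],                              # placeholder benchmark file
    check=False,  # oscap returns a non-zero exit code when some rules fail
)
```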
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering for sufficient usage of the features of the K computer, such as the hardware-aided thread barrier mechanism and the relatively high bandwidth of the memory, i.e., a 0.5 Byte/FLOP ratio. Loop optimizations and code cleaning for a reduction in memory transfer contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. We achieved good performance results, which showed efficient use of the memory throughput performance of the GPU as well as good weak scalability. A dry dynamical core experiment was carried out using 2560 GPUs, which achieved 60 TFLOPS of sustained performance.
This paper focuses on a practically very important problem of matching a real-world product photo to exactly the same item(s) in online shopping sites. The task is extremely challenging because the user photos (i.e., the queries in this scenario) are often captured in uncontrolled environments, while the product images in online shops are mostly taken by professionals with clean backgrounds and perfect lighting conditions. To tackle the problem, we study deep network architectures and training schemes, with the goal of learning a robust deep feature representation that is able to bridge the domain gap between the user photos and the online product images. Our contributions are two-fold. First, we propose an alternative of the popular contrastive loss used in siamese deep networks, namely robust contrastive loss, where we "relax" the penalty on positive pairs to alleviate over-fitting. Second, a multi-task fine-tuning approach is introduced to learn a better feature representation, which not only incorporates knowledge from the provided training photo pairs, but also explores additional information from the large ImageNet dataset to regularize the fine-tuning procedure. Experiments on two challenging real-world datasets demonstrate that both the robust contrastive loss and the multi-task fine-tuning approach are effective, leading to very promising results with a time cost suitable for real-time retrieval.
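For context, the standard contrastive loss used in siamese networks penalizes any non-zero distance between a positive pair. One natural way to "relax" that penalty, in the spirit described above, is to forgive distances below a small margin; the exact formulation in the paper may differ from this sketch.

```latex
% Standard contrastive loss for a pair (x_1, x_2) with label y (1 = same item),
% distance d = \|f(x_1) - f(x_2)\| and margin m for negative pairs:
\[
  L = y\, d^2 \;+\; (1-y)\,\max(0,\, m - d)^2 .
\]
% A "relaxed" (robust) positive term only penalizes distances above a small
% margin m_p, reducing over-fitting to hard positive pairs:
\[
  L_{\text{robust}} = y\,\max(0,\, d - m_p)^2 \;+\; (1-y)\,\max(0,\, m - d)^2 .
\]
```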
Compared to conventional general-purpose processors, accelerator-rich architectures (ARAs) can provide orders-of-magnitude performance and energy gains. In this paper we design and implement the ARAPrototyper to enable rapid design space explorations for ARAs in real silicon and reduce the tedious prototyping efforts. First, ARAPrototyper provides a reusable baseline prototype with a highly customizable memory system, including interconnect between accelerators and buffers, interconnect between buffers and last-level cache (LLC) or DRAM, coherency choice at LLC or DRAM, and address translation support. To provide more insights into performance analysis, ARAPrototyper adds several performance counters on the accelerator side and leverages existing performance counters on the CPU side. Second, ARAPrototyper provides a clean interface to quickly integrate a user's own accelerators written in high-level synthesis (HLS) code. Then, an ARA prototype can be automatically generated and mapped to a Xilinx Zynq SoC. To quickly develop applications that run seamlessly on the ARA prototype, ARAPrototyper provides a system software stack and abstracts the accelerators as software libraries for application developers. Our results demonstrate that ARAPrototyper enables a wide range of design space explorations for ARAs at manageable prototyping efforts and 4,000 to 10,000X faster evaluation time than full-system simulations. We believe that ARAPrototyper can be an attractive alternative for ARA design and evaluation.
Automatic detection of TV advertisements is of paramount importance for various media monitoring agencies. Existing works in this domain have mostly focused on news channels using news-specific features. Most commercial products use near-copy detection algorithms instead of generic advertisement classification. A generic detector needs to handle the inter-class and intra-class imbalances present in the data due to variability in content aired across channels and frequent repetition of advertisements. Imbalances present in the data make classifiers biased towards one of the classes and thus require special treatment. We propose to use a tree of perceptrons to solve this problem. The training data available for each perceptron node is balanced using cluster-based over-sampling and Tomek link cleaning as we traverse the tree downwards. The trained perceptron node then passes the original unbalanced data to its children. This process is repeated recursively until we reach the leaf nodes. We call this new algorithm the "Progressively Balanced Perceptron Tree". We have also contributed a TV advertisements dataset consisting of 250 hours of videos recorded from five non-news TV channels of different genres. Experiments on this dataset show that the proposed approach has comparatively superior and balanced performance with respect to six baseline methods. Our proposal generalizes well across channels and varying training data sizes, and achieved a top F1-score of 97% in detecting advertisements.
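The per-node balancing step can be approximated with off-the-shelf resamplers, as in the sketch below; here cluster-based over-sampling is stood in for by SMOTE from `imbalanced-learn`, which is not necessarily the exact variant used in the paper, and the node-splitting rule is a simplification.

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.linear_model import Perceptron

def train_node(X, y):
    """Balance the data reaching one tree node, then fit its perceptron.
    X, y are NumPy arrays; labels are 0 (non-ad) / 1 (ad)."""
    X_bal, y_bal = SMOTE().fit_resample(X, y)                 # over-sample minority class
    X_bal, y_bal = TomekLinks().fit_resample(X_bal, y_bal)    # remove Tomek-link pairs
    node = Perceptron().fit(X_bal, y_bal)
    # The *original* (unbalanced) data is what gets routed to the children,
    # split according to this node's predictions.
    left = node.predict(X) == 0
    return node, (X[left], y[left]), (X[~left], y[~left])
```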