Bibliography
Opportunistic Situation Identification (OSI) is a new paradigm for situation-aware systems, in which the contexts used for situation identification are sensed through whatever sensors happen to be available rather than through pre-deployed, application-specific ones. OSI extends the scale of application usage and reduces system costs. However, designing and implementing the OSI module of a situation-aware system encounters several challenges, including the uncertainty of context availability, vulnerable network connectivity, and privacy threats. This paper proposes a novel middleware framework to tackle these challenges. Its central idea is to perform situation reasoning locally on a smartphone without relying on the cloud, thus reducing the dependency on the network and better preserving privacy. To realize this idea, we propose a hybrid learning approach that maximizes reasoning accuracy under the phone's limited storage space by combining two state-of-the-art techniques. Specifically, this paper provides a genetic-algorithm-based optimization approach to determine which pre-computed models should be selected for storage under the storage constraints. Validation of the approach on an open dataset indicates that it achieves higher accuracy with a comparatively small storage cost. Further, the proposed utility function for model selection outperforms three baseline utility functions.
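As a rough illustration of the selection step described above (not the paper's actual algorithm or utility function), the following sketch applies a standard genetic algorithm to a knapsack-style model-selection problem; all utility scores, model sizes, and the storage budget are invented:

```python
import random

# Hypothetical per-model utility scores and storage sizes (in KB); the
# paper's actual utility function and model set are not reproduced here.
UTILITY = [0.9, 0.7, 0.8, 0.4, 0.6]
SIZE = [120, 80, 100, 30, 60]
BUDGET = 200  # phone-side storage constraint

def fitness(bits):
    """Total utility of the selected models; zero if over budget."""
    if sum(s for b, s in zip(bits, SIZE) if b) > BUDGET:
        return 0.0
    return sum(u for b, u in zip(bits, UTILITY) if b)

def select_models(pop_size=40, generations=100, p_mut=0.05):
    n = len(UTILITY)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)     # pick two parents
            cut = random.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([1 - g if random.random() < p_mut else g
                             for g in child])      # bit-flip mutation
        pop = survivors + children
    return max(pop, key=fitness)

print(select_models())  # bit-vector of models to keep, e.g. [0, 0, 1, 1, 1]
```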
In this work, we address the problem of designing and implementing honeypots for Industrial Control Systems (ICS). Honeypots are vulnerable systems that are set up with the intent to be probed and compromised by attackers. Analysis of those attacks then allows the defender to learn about novel attacks and the general strategy of the attacker. Honeypots for ICS need to satisfy both traditional ICT requirements, such as cost and maintainability, and more specific ICS requirements, such as time and determinism. We propose the design of a virtual, high-interaction, server-based ICS honeypot that satisfies these requirements, and the deployment of a realistic, cost-effective, and maintainable ICS honeypot. An attacker model is introduced to complete the problem statement and requirements. Based on our design and the MiniCPS framework, we implemented a honeypot mimicking a water treatment testbed. To the best of our knowledge, the presented honeypot implementation is the first academic work targeting Ethernet/IP-based ICS honeypots, the first virtual ICS honeypot that is high-interaction without using full virtualization technologies (such as a network of virtual machines), and the first ICS honeypot that can be managed with a Software-Defined Network (SDN) controller.
Defect-prediction techniques can enhance the quality assurance activities for software systems. For instance, they can be used to predict bugs in source files or functions. In the context of a software product line, such techniques could ideally be used to predict defects in features or combinations of features, which would allow developers to focus quality assurance on the error-prone ones. In this preliminary case study, we investigate how defect-prediction models can be used to identify defective features using machine-learning techniques. We adapt process metrics and evaluate and compare three classifiers using an open-source product line. Our results show that the technique can be effective. Our best scenario achieves an accuracy of 73% in predicting features as defective or clean using a Naive Bayes classifier. Based on these results, we discuss directions for future work.
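For readers unfamiliar with the setup, here is a minimal sketch of feature-level defect prediction with a Naive Bayes classifier, using scikit-learn and invented process metrics (commit count, distinct authors, lines changed); the study's actual metrics and data are not reproduced:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Invented process metrics per feature: commit count, distinct authors,
# lines changed. Labels mark features that contained a defect (1) or not (0).
X = [
    [12, 3,  450],
    [ 2, 1,   30],
    [25, 5, 1200],
    [ 4, 2,   90],
    [18, 4,  600],
    [ 1, 1,   10],
]
y = [1, 0, 1, 0, 1, 0]

scores = cross_val_score(GaussianNB(), X, y, cv=3)  # 3-fold cross-validation
print("mean accuracy: %.2f" % scores.mean())
```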
Testing a software product line such as Linux implies building the source with different configurations. Manual approaches to generating configurations that enable code of interest are doomed to fail due to the large number of variation points distributed over the feature model, the build system, and the source code. Research has proposed various approaches to generate covering configurations, but the algorithms show many drawbacks related to run time, exhaustiveness, and the number of generated configurations. Hence, analyzing an entire Linux source tree can yield more than 30 thousand configurations, far exceeding the limited budget and resources for build testing. In this paper, we present an approach to fill the gap between the systematic generation of configurations and the necessity to fully build software in order to test it. By merging previously generated configurations, we reduce the number of necessary builds and enable global variability-aware testing. We reduce the problem of merging configurations to finding maximum cliques in a graph. We evaluate the approach on the Linux kernel, compare the results to common practices in industry, and show that our implementation scales even when facing graphs with millions of edges.
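To illustrate the clique-based merging idea (a simplified reading of the approach, not the authors' implementation), the sketch below builds a compatibility graph over invented partial configurations and merges each maximal clique; a real tool would additionally check feature-model constraints:

```python
import networkx as nx

# Hypothetical partial configurations (option -> value). Two configurations
# are treated as mergeable if they agree on every option they share; real
# feature-model constraints would need an additional satisfiability check.
configs = [
    {"CONFIG_A": "y", "CONFIG_B": "n"},
    {"CONFIG_B": "n", "CONFIG_C": "y"},
    {"CONFIG_A": "n", "CONFIG_C": "y"},
    {"CONFIG_C": "y", "CONFIG_D": "m"},
]

def compatible(c1, c2):
    return all(c1[k] == c2[k] for k in c1.keys() & c2.keys())

g = nx.Graph()
g.add_nodes_from(range(len(configs)))
g.add_edges_from((i, j)
                 for i in range(len(configs))
                 for j in range(i + 1, len(configs))
                 if compatible(configs[i], configs[j]))

# Every maximal clique is a set of pairwise-compatible configurations that
# can be merged into one build, reducing the total number of builds.
for clique in nx.find_cliques(g):
    merged = {}
    for idx in clique:
        merged.update(configs[idx])
    print(sorted(clique), "->", merged)
```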
The threat of DDoS and other cyber-attacks has increased during the last decade. In addition to the radical increase in the number of attacks, they are also becoming more sophisticated, with targets ranging from ordinary users to service providers and even critical infrastructure. According to some sources, the sophistication of attacks is increasing faster than the mitigating actions against them. For example, determining the location of the attack origin is becoming impossible, as cyber-attackers routinely employ means to evade detection of the attack origin, such as proxy services and source address spoofing. The purpose of this paper is to initiate discussion about the effective Internet Protocol traceback mechanisms that are needed to overcome this problem. We propose an approach to traceback that is based on the extensive use of security metrics before (proactive) and during (reactive) the attacks.
An unsolved but vital problem in forensic analysis is accurate data type reverse engineering from memory dumps, where the target process is not specified a priori and could be any of the running processes within the system. We present ReViver, a lightweight system-wide solution that extracts data type information from a memory dump without its past execution traces. ReViver constructs the dump's accurate data structure layout through the collection of statistical information about possible past traces, forensic inspection of the present memory dump, and speculative investigation of potential future executions of the suspended process. First, ReViver analyzes a heavily instrumented set of execution paths of the same executable that end in the same state as the memory dump (the eip and call stack), and collects statistical information about the potential data structure instances in the captured dump. Second, ReViver uses the statistical information and performs a word-by-word data type forensic inspection of the captured memory dump. Finally, ReViver revives the dump's execution and explores its potential future execution paths symbolically. ReViver traces the executions, including library/system calls, for their known argument/return data types, and performs backward taint analysis to mark the dump bytes with relevant data type information. ReViver's experimental results on real-world applications are very promising (98.1% accuracy), and show that ReViver significantly improves the accuracy of past-trace-free memory forensics solutions while maintaining a negligible runtime performance overhead (1.8%).
Behavior-based tracking is an unobtrusive technique that allows observers to monitor user activities on the Internet over long periods of time – in spite of changing IP addresses. Previous work has employed supervised classifiers in order to link the sessions of individual users. However, classifiers need labeled training sessions, which are difficult to obtain for observers. In this paper we show how this limitation can be overcome with an unsupervised learning technique. We present a modified k-means algorithm and evaluate it on a realistic dataset that contains the Domain Name System (DNS) queries of 3,862 users. For this purpose, we simulate an observer that tries to track all users, and an Internet Service Provider that assigns a different IP address to every user on every day. The highest tracking accuracy is achieved within the subgroup of highly active users. Almost all sessions of 73% of the users in this subgroup can be linked over a period of 56 days. 19% of the highly active users can be traced completely, i.e., all their sessions are assigned to a single cluster. This fraction increases to 40% for shorter periods of seven days. As service providers may engage in behavior-based tracking to complement their existing profiling efforts, it constitutes a severe privacy threat for users of online services. Users can defend against behavior-based tracking by changing their IP address frequently, but this is cumbersome at the moment.
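A minimal sketch of the clustering idea follows, using plain k-means from scikit-learn over TF-IDF vectors of invented per-session domain lists; the paper's modified k-means and its real DNS dataset are not reproduced here:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented daily sessions, each a list of queried domains. The observer's
# goal is to group sessions of the same (unknown) user into one cluster.
sessions = [
    "news.example mail.example forum.example",
    "news.example mail.example blog.example",
    "game.example chat.example video.example",
    "game.example chat.example store.example",
]

X = TfidfVectorizer(token_pattern=r"\S+").fit_transform(sessions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # sessions with similar domain profiles land in one cluster
```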
The specification and implementation of network protocols are difficult tasks. This is particularly the case for wireless sensor network (WSN) nodes, which are typically implemented on low-power microcontrollers with limited processing capabilities. Such platforms do not have the resources to run a full-fledged operating system but instead are programmed using a Real-Time Operating System (RTOS) specialized in low-power wireless communications, the most popular being Contiki and TinyOS. These RTOSes support a fully compliant IPv6 stack including the 6LoWPAN adaptation layer [1], several radio duty-cycling MAC protocols [2], [3], and multiple routing protocols [4].
Firewall policies are notorious for misconfiguration errors that can defeat their intended purpose of protecting hosts in the network from malicious users. We believe this is because today's firewall policies are mostly monolithic. Inspired by ideas from modular programming and code refactoring, in this work we introduce three kinds of modules: primary, auxiliary, and template, which facilitate the refactoring of a firewall policy into smaller, reusable, comprehensible, and more manageable components. We present algorithms for generating each of the three modules for a given legacy firewall policy. We also develop ModFP, an automated tool for converting legacy firewall policies represented as access control lists to their modularized format. With the help of ModFP, when examining several real-world policies with sizes ranging from dozens to hundreds of rules, we were able to identify subtle errors.
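The flavor of such refactoring can be sketched as follows; the grouping below is a deliberately naive stand-in for ModFP's primary/auxiliary/template module generation, applied to an invented ACL:

```python
from collections import defaultdict

# Invented flat ACL: (source, destination, port, action). Grouping rules by
# the service they govern yields small, reusable, per-service components.
# Note: this toy ignores first-match rule ordering, which ModFP must preserve.
acl = [
    ("10.0.0.0/24", "192.168.1.5", 80, "accept"),
    ("10.0.1.0/24", "192.168.1.5", 80, "accept"),
    ("10.0.0.0/24", "192.168.1.9", 22, "accept"),
    ("0.0.0.0/0",   "192.168.1.5", 80, "deny"),
]

modules = defaultdict(list)
for src, dst, port, action in acl:
    modules[(dst, port)].append((src, action))  # one module per service

for (dst, port), rules in modules.items():
    print(f"module service_{dst.replace('.', '_')}_{port}:")
    for src, action in rules:
        print(f"  {action} from {src}")
```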
The threat from insiders is an ever-growing concern for organisations, and in recent years the harm that insiders pose has been widely demonstrated. This paper describes our recent work on supporting insider-threat detection when actions are taken that can be immediately determined as concerning because they fall into one of two categories: they violate a policy that is specifically crafted to describe behaviours that are highly likely to be of concern if exhibited, or they follow the pattern of a known insider-threat attack. In particular, we view these concerning actions as something we can design and implement tripwires within a system to detect. We then orchestrate these tripwires in conjunction with an anomaly detection system and present an approach to formalising tripwires of both categories. Our intention is that, by having a single framework for describing them, alongside a library of existing tripwires in use, we can provide the community of practitioners and researchers with the basis to document and evolve this common understanding of tripwires.
Malicious I/O devices might compromise the OS using DMAs. The OS therefore utilizes the IOMMU to map and unmap every target buffer right before and after its DMA is processed, thereby restricting DMAs to their designated locations. This usage model, however, is not truly secure for two reasons: (1) it provides protection at page granularity only, whereas DMA buffers can reside on the same page as other data; and (2) it delays DMA buffer unmaps due to performance considerations, creating a vulnerability window in which devices can access in-use memory. We propose that OSes utilize the IOMMU differently, in a manner that eliminates these two flaws. Our new usage model restricts device access to a set of shadow DMA buffers that are never unmapped, and it copies DMAed data to/from these buffers, thus providing sub-page protection while eliminating the aforementioned vulnerability window. Our key insight is that the cost of interacting with, and synchronizing access to, the slow IOMMU hardware (required for zero-copy protection against devices) makes copying preferable to zero-copying. We implement our model in Linux and evaluate it with standard networking benchmarks utilizing a 40 Gb/s NIC. We demonstrate that despite being more secure than the safest preexisting usage model, our approach provides up to 5x higher throughput. Additionally, whereas it is inherently less scalable than an IOMMU-less (unprotected) system, our approach incurs only 0%–25% performance degradation in comparison.
Trust is an important facilitator for successful business relationships and an important technology adoption determinant. However, trust has thus far received little attention in the context of cloud computing, resulting in a lack of understanding of the dimensions of trust in cloud services and of trust-building antecedents. Although the literature provides various conceptual models of trust for contexts related to cloud computing that may serve as a reference, in particular trust in IT outsourcing providers and trust in IT artifacts, the idiosyncrasies of trust in cloud computing require a novel conceptual model of trust. First, a cloud service has a dual nature, being both an IT artifact and a service provided by an organization. Second, cloud services are offered in impersonal cloud marketplaces and build upon a nested network of cloud services within the cloud ecosystem. In this article, we first analyze the concept of trust in cloud contexts. Next, we develop a conceptual model that describes trust in cloud services. The conceptual model incorporates the duality of trust in a cloud provider organization and trust in an IT artifact, as well as trust types for the impersonal environment and the cloud computing ecosystem. Using the conceptual model as a lens, we then review 43 empirical studies on trust in IT outsourcing and trust in IT artifacts that were identified by a structured literature search. The resulting conceptual model provides a conceptual typology of constructs for trust in cloud services, defines trust-building antecedents, and develops 19 propositions describing the relationships between trust constructs and between trust constructs and trust-building antecedents. The conceptual model contributes to research by creating grounds for future theory-building on trust in cloud contexts, integrating two previously disjoint strands in the trust literature, and identifying knowledge gaps. Based on the conceptual model, we furthermore provide practical advice for managers from service providers, platform providers, customers, and institutional authorities.
Ethernet technology dominates enterprise and home network installations and is present in datacenters as well as parts of the backbone of the Internet. Due to their wireline nature, Ethernet networks are often assumed to intrinsically protect the exchanged data against attacks carried out by eavesdroppers and malicious attackers who do not have physical access to network devices, patch panels, and network outlets. In this work, we practically evaluate the resistance of wired Ethernet installations to wireless eavesdropping attacks mounted with off-the-shelf software-defined radio platforms. Our results clearly indicate that twisted-pair network cables radiate enough electromagnetic waves to reconstruct transmitted frames with negligible bit error rates, even when the cables are not damaged at all. Since this allows an attacker to stay undetected, it underscores the need for link-layer encryption or physical-layer security to protect confidentiality.
H.264/SVC (Scalable Video Coding) codestreams, which consist of a single base layer and multiple enhancement layers, are designed for quality, spatial, and temporal scalability. They can be transmitted over networks of different bandwidths and seamlessly accessed by various terminal devices. With a huge amount of video surveillance and various devices becoming an integral part of the security infrastructure, industry is starting to use the SVC standard to process digital video for surveillance applications, so that clients with different network bandwidth connections and display capabilities can seamlessly access the various SVC surveillance (sub)codestreams. To guarantee the trustworthiness and integrity of received SVC codestreams, engineers and researchers have proposed several authentication schemes to protect video data. However, existing algorithms cannot simultaneously satisfy both efficiency and robustness for SVC surveillance codestreams. Hence, in this article, a highly efficient and robust authentication scheme, named TrustSSV (Trust Scalable Surveillance Video), is proposed. Based on the quality/spatial scalable characteristics of SVC codestreams, TrustSSV combines cryptographic and content-based authentication techniques to authenticate the base layer and the enhancement layers, respectively. Based on the temporal scalable characteristics of surveillance codestreams, TrustSSV extracts, updates, and authenticates foreground features for each access unit dynamically with background model support. Using SVC test sequences, our experimental results indicate that the scheme is able to distinguish between content-preserving and content-changing manipulations and to pinpoint tampered locations. Compared with existing schemes, the proposed scheme incurs very small computation and communication costs.
The storage and management of secrets (encryption keys, passwords, etc.) are significant open problems in the age of ephemeral, cloud-based computing infrastructure. How do we store and control access to the secrets necessary to configure and operate a range of modern technologies without sacrificing security and privacy requirements or significantly curtailing the desirable capabilities of our systems? To answer this question, we propose Tutamen: a next-generation secret-storage service. Tutamen offers a number of desirable properties not present in existing secret-storage solutions. These include the ability to operate across administrative domain boundaries and atop minimally trusted infrastructure. Tutamen also supports access control based on contextual, multi-factor, and alternate-band authentication parameters. These properties have allowed us to leverage Tutamen to support a variety of use cases not easily realizable using existing systems, including full-disk encryption on headless servers and fully featured client-side encryption for cloud-based file-storage services. In this paper, we present an overview of the secret-storage challenge, Tutamen's design and architecture, the implementation of our Tutamen prototype, and several of the applications we have built atop Tutamen. We conclude that Tutamen effectively eases the secret-storage burden and allows developers and systems administrators to achieve previously unattainable security-oriented goals while still supporting a wide range of feature-oriented requirements.
The low-level C++ programming language is ubiquitously used for its modularity and performance. Typecasting is a fundamental concept in C++ (and object-oriented programming in general) to convert a pointer from one object type into another. However, downcasting (converting a base class pointer to a derived class pointer) has critical security implications due to potentially different object memory layouts. Due to missing type safety in C++, a downcasted pointer can violate a programmer's intended pointer semantics, allowing an attacker to corrupt the underlying memory in a type-unsafe fashion. This vulnerability class is receiving increasing attention and is known as type confusion (or bad-casting). Several existing approaches detect different forms of type confusion, but these solutions are severely limited due to both high run-time performance overhead and low detection coverage. This paper presents TypeSan, a practical type-confusion detector which provides both low run-time overhead and high detection coverage. Despite improving the coverage of state-of-the-art techniques, TypeSan significantly reduces the type-confusion detection overhead compared to other solutions. TypeSan relies on an efficient per-object metadata storage service based on a compact memory shadowing scheme. Our scheme treats all the memory objects (i.e., globals, stack, heap) uniformly to eliminate extra checks on the fast path and relies on a variable compression ratio to minimize run-time performance and memory overhead. Our experimental results confirm that TypeSan is practical, even when explicitly checking almost all the relevant typecasts in a given C++ program. Compared to the state of the art, TypeSan yields orders of magnitude higher coverage at 4–10 times lower performance overhead on SPEC and 2 times on Firefox. As a result, our solution offers superior protection and is suitable for deployment in production software. Moreover, our highly efficient metadata storage back-end is potentially useful for other defenses that require memory object tracking.
In traditional programming courses, students have usually been graded at least partly using pen-and-paper exams. One problem with such exams is that they connect only partially to the practice conducted within the courses. Testing students in a more practical environment has been constrained by the limited resources needed, for example, for authentication. In this work, we study whether students in a programming course can be identified in an exam setting based solely on their typing patterns. We replicate an earlier study which indicated that keystroke analysis can be used to identify programmers. Then, we examine how a controlled machine examination setting affects identification accuracy, i.e., whether students can be identified reliably in a machine exam based on typing profiles built from data from students' programming assignments in a course. Finally, we investigate the identification accuracy in an uncontrolled machine exam, where students can complete the exam at any time using any computer they want. Our results indicate that even though the identification accuracy deteriorates when identifying students in an exam, the accuracy is high enough to reliably identify students if the identification is not required to be exact, but the top k closest matches are regarded as correct.
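A minimal sketch of top-k identification from typing profiles is given below, assuming invented per-student mean digraph latencies; the study's actual features and matching procedure are more elaborate:

```python
import numpy as np

# Invented typing profiles: mean latencies (ms) over a fixed set of digraphs,
# one profile per student, built from programming-assignment keystroke data.
profiles = {
    "alice": np.array([110.0, 95.0, 140.0]),
    "bob":   np.array([150.0, 130.0, 180.0]),
    "carol": np.array([100.0, 90.0, 120.0]),
}

def top_k(sample, k=2):
    """Rank enrolled students by distance between exam sample and profile."""
    dists = {name: float(np.linalg.norm(sample - p))
             for name, p in profiles.items()}
    return sorted(dists, key=dists.get)[:k]

exam_sample = np.array([108.0, 93.0, 135.0])  # latencies observed in the exam
print(top_k(exam_sample))  # identification counts as correct when the true
                           # student appears among the top-k matches
```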
Personalized medicine brings the promise of better diagnoses, better treatments, a higher quality of life and increased longevity. To achieve these noble goals, it exploits a number of revolutionary technologies, including genome sequencing and DNA editing, as well as wearable devices and implantable or even edible biosensors. In parallel, the popularity of "quantified self" gadgets shows the willingness of citizens to be more proactive with respect to their own health. Yet, this evolution opens the door to all kinds of abuses, notably in terms of discrimination, blackmailing, stalking, and subversion of devices. After giving a general description of this situation, in this talk we will expound on some of the main concerns, including the temptation to permanently and remotely monitor the physical (and metabolic) activity of individuals. We will describe the potential and the limitations of techniques such as cryptography (including secure multi-party computation), trusted hardware and differential privacy. We will also discuss the notion of consent in the face of the intrinsic correlations of human data. We will argue in favor of a more systematic, principled and cross-disciplinary research effort in this field and will discuss the motives of the various stakeholders.
Wake locks are widely used in Android apps to protect critical computations from being disrupted by device sleeping. Inappropriate use of wake locks often seriously impacts user experience. However, little is known about how wake locks are used in real-world Android apps and the impact of their misuse. To bridge this gap, we conducted a large-scale empirical study on 44,736 commercial and 31 open-source Android apps. By automated program analysis and manual investigation, we observed (1) common program points where wake locks are acquired and released, (2) 13 types of critical computational tasks that are often protected by wake locks, and (3) eight patterns of wake lock misuse that commonly cause functional and non-functional issues, only three of which had been studied by existing work. Based on our findings, we designed a static analysis technique, Elite, to detect the two most common patterns of wake lock misuse. Our experiments on real-world subjects showed that Elite is effective and can outperform two state-of-the-art techniques.
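To give a feel for one misuse pattern (acquire without a matching release), here is a deliberately naive, text-based check over an invented code snippet; Elite's real detector is a flow-sensitive static analysis over app code, not a textual scan:

```python
import re

# Toy check for one misuse pattern: a method that acquires a wake lock but
# never frees it on any path within that method.
SOURCE = """
void startDownload() {
    wakeLock.acquire();
    download();  // may throw; the lock is never freed in this method
}
void stopDownload() {
    wakeLock.release();
}
"""

for name, body in re.findall(r"void (\w+)\(\) \{(.*?)\n\}", SOURCE, re.S):
    if "acquire()" in body and "release()" not in body:
        print(f"possible wake lock leak in {name}()")
```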
Cyber-attacks are cheap, easy to conduct, and often pose little risk in terms of attribution, but their impact can be lasting. Attribution is low because tracing cyber-attacks is primitive in the current network architecture. Moreover, even when attribution is known, the absence of enforcement provisions in international law makes cyber-attacks tough to litigate, and hence attribution is hardly a deterrent. Rather than attributing attacks, we can view cyber-attacks as societal events associated with social, political, economic, and cultural (SPEC) motivations. Because it is possible to observe SPEC motives on the Internet, social media data could be valuable for understanding cyber-attacks. In this research, we use sentiment in Twitter posts to observe country-to-country perceptions, and Arbor Networks data to build ground truth of country-to-country DDoS cyber-attacks. Using this dataset, this research makes three important contributions: a) we evaluate the impact of heightened sentiment towards a country on the trend of cyber-attacks received by that country, finding that, for some countries, the probability of attacks increases by up to 27% while they experience negative sentiment from other nations; b) using the cyber-attack trend and the sentiment trend, we build a decision tree model to find attacks that could be related to extreme sentiments; and c) to verify our model, we describe three examples in which cyber-attacks followed increased tension between nations, as perceived in social media.
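A toy version of the decision tree step might look as follows, with invented sentiment and attack features; the paper's real Twitter and Arbor Networks data are not reproduced:

```python
from sklearn.tree import DecisionTreeClassifier

# Invented weekly features per country pair: mean sentiment score, sentiment
# trend (slope), and tweet volume; the label says whether a DDoS attack was
# observed in the following window (ground truth from attack telemetry).
X = [
    [-0.8, -0.3, 5000],
    [-0.6, -0.2, 3000],
    [ 0.4,  0.1,  800],
    [ 0.2,  0.0, 1200],
    [-0.7, -0.4, 4200],
    [ 0.5,  0.2,  600],
]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[-0.9, -0.5, 4800]]))  # [1]: an attack-prone period
```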
Voice-controlled intelligent personal assistants, such as Cortana, Google Now, Siri and Alexa, are increasingly becoming a part of users' daily lives, especially on mobile devices. They introduce a significant change in information access, not only by introducing voice control and touch gestures but also by enabling dialogues where the context is preserved. This raises the need for evaluation of their effectiveness in assisting users with their tasks. However, in order to understand which type of user interactions reflect different degrees of user satisfaction we need explicit judgements. In this paper, we describe a user study that was designed to measure user satisfaction over a range of typical scenarios of use: controlling a device, web search, and structured search dialogue. Using this data, we study how user satisfaction varied with different usage scenarios and what signals can be used for modeling satisfaction in the different scenarios. We find that the notion of satisfaction varies across different scenarios, and show that, in some scenarios (e.g. making a phone call), task completion is very important while for others (e.g. planning a night out), the amount of effort spent is key. We also study how the nature and complexity of the task at hand affects user satisfaction, and find that preserving the conversation context is essential and that overall task-level satisfaction cannot be reduced to query-level satisfaction alone. Finally, we shed light on the relative effectiveness and usefulness of voice-controlled intelligent agents, explaining their increasing popularity and uptake relative to the traditional query-response interaction.
The operating system kernel is the de facto trusted computing base for most computer systems. To secure the OS kernel, many security mechanisms, e.g., kASLR and StackGuard, have been increasingly deployed to defend against attacks such as code reuse. However, the effectiveness of these protections has been proven inadequate: there are many information-leak vulnerabilities in the kernel that leak the randomized pointer or canary, thus bypassing kASLR and StackGuard. Other sensitive data in the kernel, such as cryptographic keys and file caches, can also be leaked. According to our study, most kernel information leaks are caused by uninitialized data reads. Unfortunately, existing techniques like memory-safety enforcement and dynamic access tracking tools are not adequate or efficient enough to mitigate this threat. In this paper, we propose UniSan, a novel compiler-based approach to eliminate all information leaks caused by uninitialized reads in the OS kernel. UniSan achieves this goal using byte-level, flow-sensitive, context-sensitive, and field-sensitive initialization analysis and reachability analysis to check whether an allocation has been fully initialized when it leaves kernel space; if not, it automatically instruments the kernel to initialize the allocation. UniSan's analyses are conservative to avoid false negatives and are robust in preserving the semantics of the OS kernel. We have implemented UniSan as passes in LLVM and applied it to the latest Linux kernel (x86_64) and Android kernel (AArch64). Our evaluation showed that UniSan can successfully prevent 43 known and many new uninitialized data leak vulnerabilities. Further, 19 new vulnerabilities in the latest kernels have been confirmed by Linux and Google. Our extensive performance evaluation with LMBench, ApacheBench, Android benchmarks, and the SPEC benchmarks also showed that UniSan imposes negligible performance overhead.
UnlimitID is a method for enhancing the privacy of commodity OAuth and of applications built on it, such as OpenID Connect, using anonymous attribute-based credentials based on algebraic Message Authentication Codes (aMACs). OAuth is one of the most widely used protocols on the Web, but it exposes to the identity provider (IdP) every request that a relying party (RP) makes for a user's data. Our approach allows for the creation of multiple persistent and unlinkable pseudo-identities and requires no change in the deployed code of relying parties, only in identity providers and the client.
Maintaining and updating signature databases is a tedious task that normally requires substantial user effort. The problem becomes harder when features can be distorted by observation noise, which we call volatility. To address this issue, we propose algorithms and models to automatically generate signatures in the presence of noise, with a focus on stack fingerprinting, a research area that aims to discover the operating system (OS) of remote hosts using TCP/IP packets. Armed with this framework, we construct a database of 420 network stacks, label the signatures, develop a robust classifier for this database, and fingerprint 66 million visible web servers on the Internet.
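As a simplified illustration of signature matching under noise (not the paper's volatility model), the sketch below classifies an observed TCP/IP feature vector against a tiny invented signature database by scaled nearest-neighbor distance:

```python
import numpy as np

# Invented signature database: feature vectors extracted from TCP/IP
# responses (initial TTL, window size, options length). The paper's
# volatility-aware classifier is far more elaborate than this.
DB = {
    "linux_3x":   np.array([64, 29200, 20]),
    "windows_7":  np.array([128, 8192, 12]),
    "freebsd_10": np.array([64, 65535, 20]),
}

# Per-feature scaling absorbs some observation noise ("volatility").
SCALE = np.array([64.0, 10000.0, 10.0])

def classify(observed):
    """Return the signature nearest to the observed vector after scaling."""
    return min(DB, key=lambda k: np.linalg.norm((DB[k] - observed) / SCALE))

print(classify(np.array([64, 29000, 20])))  # -> linux_3x despite distortion
```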
Event discovery from single pictures is a challenging problem that has attracted significant interest in the last decade. During this time, a number of interesting solutions have been proposed to tackle event discovery in still images. However, a large-scale benchmarking image dataset for the evaluation and comparison of event discovery algorithms from single images is still lacking. To this end, in this paper we provide a large-scale, properly annotated and balanced dataset of 490,000 images, covering every aspect of 14 different types of social events, selected among those most shared on social networks. Such a large-scale collection of event-related images is intended to become a powerful support tool for the research community in multimedia analysis by providing a common benchmark for training, testing, validation, and comparison of existing and novel algorithms. In this paper, we provide a detailed description of how the dataset was collected and organized and how it can benefit researchers in the multimedia analysis domain. Moreover, a deep-learning-based approach is introduced for event discovery from single images as one possible application of this dataset, in the belief that deep learning can prove to be a breakthrough in this research area as well. By providing this dataset, we hope to rally the research community in the multimedia and signal processing domains to advance this application.