Bibliography
Mobile devices are part of our lives, and we store a lot of private information on them as well as use services that handle sensitive information (e.g., mobile health apps). Whenever users install an application on their smartphones, they have to decide whether to trust the application and share private and sensitive data with at least the developer-owned services. But almost all modern apps not only transmit data to the developer-owned servers but also send information to advertising, analytics, and tracking partners. This paper presents an approach for a "privacy proxy" which makes it possible to filter unwanted data traffic to third-party services without installing additional applications on the smartphone. It is based on a firewall using a blacklist of tracking and analytics networks which is automatically updated on a daily basis. The proof of concept has been implemented with open-source components on a Raspberry Pi.
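A minimal sketch of the kind of blacklist-based filtering such a privacy proxy could perform; the blacklist entries, helper name, and subdomain-matching rule are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch: domain-blacklist filtering as a privacy proxy might do it.
# The blacklist contents and the matching policy are assumptions for illustration.

BLACKLIST = {"tracker.example", "ads.example.net"}  # hypothetical entries

def is_blocked(host: str, blacklist: set[str]) -> bool:
    """Block a host if it equals, or is a subdomain of, any blacklisted domain."""
    host = host.lower().rstrip(".")
    parts = host.split(".")
    # Check the host itself and every parent domain (a.b.c -> b.c).
    for i in range(len(parts) - 1):
        if ".".join(parts[i:]) in blacklist:
            return True
    return False

assert is_blocked("cdn.tracker.example", BLACKLIST)
assert not is_blocked("api.health-app.example", BLACKLIST)
```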
A waste heat recovery system (WHRS) on a process with variable output is an example of an intermittent renewable process. A WHRS recycles waste heat into usable energy; for example, waste heat produced by refrigeration can be used to provide hot water. However, as with most intermittent renewable energy systems, the likelihood of waste heat being available at times of demand is low. For this reason, the WHRS may be coupled with a hot water reservoir (HWR) acting as the energy storage system, which aims to maintain the desired hot water temperature Td (and therefore energy) at the time of demand. The coupling of the WHRS and the HWR must be optimised to ensure higher efficiency given the intermittent mismatch of demand and heat availability. Efficiency of a WHRS can be defined in terms of multiple objectives, including minimising the need for back-up energy to achieve Td and minimising the waste heat not captured (when the reservoir volume Vres is too small). This paper investigates the application of a Multi-Objective Evolutionary Algorithm (MOEA) to optimise the parameters of the WHRS that affect its efficiency, including Vres and the depth of discharge (DoD). Results show that one of the optimum solutions obtained requires the combination of high Vres, high DoD, low water feed-in rate, low-power external back-up heater, and high excess temperature for the HWR to ensure efficiency of the WHRS.
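As a hedged illustration of the two competing objectives named above, the toy model below simulates an HWR over one day and reports the back-up energy used and the waste heat spilled; all numbers, the simple energy balance, and the parameter set are our simplifying assumptions, not the paper's model:

```python
# Toy bi-objective evaluation for one WHRS + hot-water-reservoir candidate.
# The heat/demand profiles and the energy balance are assumptions.

def evaluate(v_res_kwh: float, dod: float, waste_heat, demand):
    """Return (backup_energy, spilled_heat) in kWh for one simulated day.

    v_res_kwh : reservoir capacity expressed as storable energy (kWh)
    dod       : depth of discharge, the fraction of capacity usable (0..1)
    """
    stored = 0.0
    floor = v_res_kwh * (1.0 - dod)       # level below which we stop discharging
    backup = spilled = 0.0
    for heat_in, heat_out in zip(waste_heat, demand):
        stored += heat_in
        if stored > v_res_kwh:            # reservoir full: heat not captured
            spilled += stored - v_res_kwh
            stored = v_res_kwh
        served = min(max(0.0, stored - floor), heat_out)
        stored -= served
        backup += heat_out - served       # shortfall met by back-up heater
    return backup, spilled

# Hypothetical hourly profiles (kWh): heat arrives at night, demand peaks by day.
waste = [2.0] * 8 + [0.0] * 16
need = [0.0] * 8 + [1.0] * 16
print(evaluate(v_res_kwh=10.0, dod=0.8, waste_heat=waste, demand=need))
```

A MOEA would search over (Vres, DoD, ...) candidates, using such a pair of objective values to rank them without collapsing the two goals into one number.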
One essential functionality of a modern operating system is to accurately account for the resource usage of the underlying hardware. This is especially important for computing systems that operate on battery power, since energy management requires accurately attributing resource use to processes. However, components such as sensors, actuators, and specialized network interfaces are often used in an asynchronous fashion, which makes it difficult to conduct accurate resource accounting. For example, a process that makes a request to a sensor may not be running on the processor for the full duration of the resource usage, and current mechanisms of resource accounting fail to provide accurate accounting for such asynchronous uses. This paper proposes a new mechanism to accurately account for the asynchronous usage of resources in mobile systems. Our insight is that by accurately relating user requests to the corresponding kernel requests to devices and the resulting device responses, we can attribute resource use to the requesting process. Our prototype, implemented in Linux, demonstrates that we can accurately account for the usage of asynchronous resources such as GPS and WiFi.
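A hedged sketch of the bookkeeping idea: tag each device request with the requesting PID so the eventual asynchronous response can be billed back to that process even if it is no longer running. The data structures, request IDs, and energy numbers are illustrative assumptions, not the paper's kernel mechanism:

```python
# Illustrative attribution ledger for asynchronous device use.
# Request IDs, PIDs, and per-response energy costs are made up for the example.
from collections import defaultdict

pending: dict[int, int] = {}                 # request_id -> requesting PID
energy_by_pid: dict[int, float] = defaultdict(float)

def on_user_request(request_id: int, pid: int) -> None:
    """Record which process initiated the device request."""
    pending[request_id] = pid

def on_device_response(request_id: int, energy_mj: float) -> None:
    """Bill the measured energy to the original requester, even if it slept."""
    pid = pending.pop(request_id)
    energy_by_pid[pid] += energy_mj

on_user_request(request_id=1, pid=4242)             # app asks for a GPS fix
on_device_response(request_id=1, energy_mj=350.0)   # fix arrives much later
print(dict(energy_by_pid))                          # {4242: 350.0}
```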
Cellular towers capture logs of mobile subscribers whenever their devices connect to the network. When the logs show data traffic at a cell tower generated by a device, this reveals that the device is close to the tower. The logs can then be used to trace the locations of mobile subscribers for different applications, such as studying customer behaviour, improving location-based services, or helping urban planning. However, the logs often suffer from an oscillation phenomenon. Oscillations may happen when a device, even when not moving, does not simply connect to the nearest cell tower but instead switches unpredictably between multiple cell towers because of random noise, load balancing, or simply dynamic changes in signal strength. Detecting and removing oscillations is a challenge when analyzing location data collected from the cellular network. In this paper, we propose an algorithm called SOL (Stable, Oscillation, Leap periods) aimed at discovering and reducing oscillations in the collected logs. We apply our algorithm to real datasets containing about 18.9 TB of traffic logs generated by more than 3 million mobile subscribers, covering about 21,000 cell towers and collected during 27 days from both GSM and UMTS networks in northern China. Experimental results demonstrate the ability and effectiveness of SOL in reducing oscillations in cellular network logs.
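A minimal sketch of one common heuristic for spotting oscillation: flag ping-pong tower switches (A to B and back to A) that complete within a short time window. The window length and trace format are our assumptions; SOL's actual period-based algorithm is more elaborate:

```python
# Toy ping-pong oscillation detector over a (timestamp_s, tower_id) trace.
# The 30-second window is an assumed threshold, not SOL's parameter.

def pingpong_indices(trace, window_s=30):
    """Yield indices i where trace[i] is the middle of an A -> B -> A switch."""
    hits = []
    for i in range(1, len(trace) - 1):
        (t0, a), (t1, b), (t2, c) = trace[i - 1], trace[i], trace[i + 1]
        if a == c and a != b and (t2 - t0) <= window_s:
            hits.append(i)
    return hits

trace = [(0, "A"), (5, "B"), (9, "A"), (400, "C")]
print(pingpong_indices(trace))  # [1]: the brief visit to tower B looks spurious
```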
We propose PADA, a new power evaluation tool to measure and optimize the power use of mobile sensing applications. Our motivational study with 53 professional developers shows that they face huge challenges in meeting power requirements. The key challenges stem from the significant time and effort required for repetitive power measurements, since the power use of sensing applications needs to be evaluated under various real-world usage scenarios and sensing parameters. PADA enables developers to obtain enriched power information under diverse usage scenarios in development environments, without deploying and testing applications on real phones in real-life situations. We conducted two user studies with 19 developers to evaluate the usability of PADA. We show that developers benefit from using PADA in the implementation and power tuning of mobile sensing applications.
With the massive amounts of data available today, it is common to store and process data using multiple machines. Parallel programming platforms such as MapReduce and its variants are popular frameworks for handling such large data. We present the first provably efficient algorithms to compute, store, and query data structures for range queries and approximate nearest neighbor queries in a popular parallel computing abstraction that captures the salient features of MapReduce and other massively parallel communication (MPC) models. In particular, we describe algorithms for $kd$-trees, range trees, and BBD-trees that require only O(1) rounds of communication for both preprocessing and querying while staying competitive in terms of running time and workload with their classical counterparts. Our algorithms are randomized, but they can be made deterministic at some increase in their running time and workload while keeping the number of rounds of communication constant.
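For reference, here is a compact sequential kd-tree with an orthogonal range query, i.e., the classical counterpart the abstract compares against; this sketch is ours and says nothing about the paper's O(1)-round MPC construction:

```python
# Minimal 2-d kd-tree: build by median split, then answer a rectangle query.

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),
        "right": build(points[mid + 1:], depth + 1),
    }

def range_query(node, lo, hi, out):
    """Collect points p with lo[i] <= p[i] <= hi[i] for both coordinates."""
    if node is None:
        return
    p, axis = node["point"], node["axis"]
    if all(lo[i] <= p[i] <= hi[i] for i in (0, 1)):
        out.append(p)
    if lo[axis] <= p[axis]:          # query box may extend into the left subtree
        range_query(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:          # ... and/or into the right subtree
        range_query(node["right"], lo, hi, out)

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
found = []
range_query(tree, lo=(3, 1), hi=(8, 5), out=found)
print(found)  # the points inside the rectangle [3,8] x [1,5]
```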
Learning classifier systems (LCSs) are rule-based evolutionary algorithms uniquely suited to classification and data mining in complex, multi-factorial, and heterogeneous problems. LCS rule fitness is commonly based on accuracy, but this metric alone is not ideal for assessing global rule `value' in noisy problem domains, and thus impedes effective knowledge extraction. Multi-objective fitness functions are promising but rely on knowledge of how to weigh objective importance. Such prior knowledge would be unavailable in most real-world problems. The Pareto-front concept offers a strategy for multi-objective machine learning that is agnostic to objective importance. We propose a Pareto-inspired multi-objective rule fitness (PIMORF) for LCS, and combine it with a complementary rule-compaction approach (SRC). We implemented these strategies in ExSTraCS, a successful supervised LCS, and evaluated performance over an array of complex simulated noisy and clean problems (i.e., genetic and multiplexer) that each concurrently model pure interaction effects and heterogeneity. While evaluation over multiple performance metrics yielded mixed results, this work represents an important first step towards efficiently learning complex problem spaces without the advantage of prior problem knowledge. Overall, the results suggest that PIMORF paired with SRC improved rule set interpretability, particularly with regard to heterogeneous patterns.
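A hedged sketch of the Pareto-front idea such a fitness builds on: given two rule objectives (say, accuracy and coverage), keep the rules no other rule dominates, without weighing the objectives against each other. The objective pairs below are invented examples; PIMORF itself derives rule fitness from this front in its own way:

```python
# Non-dominated filtering for bi-objective rule "value" (both to maximize).
# The (accuracy, coverage) pairs are hypothetical example rules.

def dominates(a, b):
    """a dominates b if a is >= on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(rules):
    return [r for r in rules if not any(dominates(o, r) for o in rules)]

rules = [(0.95, 0.10), (0.80, 0.40), (0.70, 0.90), (0.60, 0.30)]
print(pareto_front(rules))  # (0.60, 0.30) is dominated by (0.80, 0.40)
```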
Side-channel analysis and fault-injection attacks are known as major threats to any cryptographic implementation. Protecting cryptographic implementations with suitable countermeasures is thus essential before they are deployed in the wild. However, countermeasures for the two threats are of completely different natures: side-channel analysis is mitigated by techniques that hide or mask key-dependent information, while resistance against fault-injection attacks can be achieved by redundancy in the computation for immediate error detection. Since the integration of even a single countermeasure in cryptographic hardware already comes with significant costs in terms of performance and area, a combination of multiple countermeasures is expensive and often associated with undesired side effects. In this work, we introduce a countermeasure for cryptographic hardware implementations that combines the concept of a provably-secure masking scheme (i.e., threshold implementation) with an error-detecting approach against fault injection. As a case study, we apply our generic construction to the lightweight LED cipher. Our LED instance achieves first-order resistance against side-channel attacks combined with a fault detection capability that is superior to that of simple duplication for most error distributions, at an increased area demand of 4.3%.
Phishing is a social engineering tactic used to trick people into revealing personal information [Zielinska, Tembe, Hong, Ge, Murphy-Hill, & Mayhorn 2014]. As phishing emails continue to infiltrate users' mailboxes, what social engineering techniques are phishers using to successfully persuade victims into releasing sensitive information? Cialdini's [2007] six principles of persuasion (authority, social proof, liking/similarity, commitment/consistency, scarcity, and reciprocation) have been linked to elements of phishing emails [Akbar 2014; Ferreira & Lenzini 2015]; however, the findings have been conflicting. Authority and scarcity were found to be the most common persuasion principles in 207 emails obtained from a Netherlands database [Akbar 2014], while liking/similarity was the most common principle in 52 personal emails available in Luxembourg and England [Ferreira et al. 2015]. The purpose of this study was to examine the persuasion principles present in emails available in the United States over a period of five years. Two reviewers assessed eight hundred eighty-seven phishing emails from Arizona State University, Brown University, and Cornell University for Cialdini's six principles of persuasion. Each email was evaluated using a questionnaire adapted from the Ferreira et al. [2015] study. There was an average agreement of 87% per item between the two raters. Spearman's rho correlations were used to compare email characteristics over time. During the five-year period under consideration (2010–2015), the persuasion principles of commitment/consistency and scarcity increased over time, while the principles of reciprocation and social proof decreased over time. Authority and liking/similarity revealed mixed results, with certain characteristics increasing and others decreasing. The commitment/consistency principle could be seen in the increase of emails referring to elements outside the email to look more reliable, such as Google Docs or Adobe Reader (rs(850) = .12
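For the trend test named above, a hedged sketch of computing Spearman's rho over time with SciPy; the year and count values below are fabricated placeholders, not the study's data:

```python
# Spearman rank correlation between year and an email characteristic's count.
# The data points below are invented solely to show the computation.
from scipy.stats import spearmanr

years = [2010, 2011, 2012, 2013, 2014, 2015]
scarcity_counts = [12, 15, 14, 21, 25, 30]    # hypothetical yearly tallies

rho, p_value = spearmanr(years, scarcity_counts)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")  # positive rho -> increasing trend
```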
The collaborative nature of content development has given rise to the novel problem of multiple ownership in access control, such that a shared resource is administrated simultaneously by co-owners who may have conflicting privacy preferences and/or sharing needs. Prior work has focused on the design of unsupervised conflict resolution mechanisms. Driven by the need for human consent in organizational settings, this paper explores interactive policy negotiation, an approach complementary to that of prior work. Specifically, we propose an extension of Relationship-Based Access Control (ReBAC) to support multiple ownership, in which a policy negotiation protocol is in place for co-owners to come up with and give consent to an access control policy in a structured manner. During negotiation, the draft policy is assessed against formally defined availability criteria, the verification of which is computationally hard, reaching up to the second level of the polynomial hierarchy. We devised two algorithms for verifying policy satisfiability, both employing a modern SAT solver for solving subproblems. The performance is found to be adequate for mid-sized organizations.
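As a hedged illustration of handing such a subproblem to a SAT solver (the abstract's approach, though its actual encoding is not given here), a tiny check with the python-sat package; the variables and clauses encode a made-up pair of co-owner constraints:

```python
# Toy satisfiability check with a SAT solver (pip install python-sat).
# Variables: 1 = "colleagues may view", 2 = "public may view" (invented encoding).
from pysat.solvers import Glucose3

with Glucose3() as solver:
    solver.add_clause([1])        # co-owner A insists colleagues may view
    solver.add_clause([-2])       # co-owner B forbids public visibility
    solver.add_clause([-1, 2])    # assumed rule: colleague access implies public
    if solver.solve():
        print("negotiable:", solver.get_model())
    else:
        print("no policy satisfies all co-owner constraints")
```

Here the three clauses conflict, so the solver reports unsatisfiability, which is exactly the signal a negotiation protocol would surface to the co-owners.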
In recent years, the number of new examples of malware has continued to increase. To create effective countermeasures, security specialists often must manually inspect vast sandbox logs produced by the dynamic analysis method. Meanwhile, antivirus vendors usually publish malware analysis reports on their websites. Because malware analysis reports and sandbox logs have no direct connection, security specialists cannot benefit from the information described in such expert reports when analyzing sandbox logs. To address this issue, we developed a system called ReGenerator that automates the generation of reports related to sandbox logs by making use of existing reports published by antivirus vendors. Our system combines several techniques, including Jaccard similarity, Natural Language Processing (NLP), and Natural Language Generation (NLG), to produce concise human-readable reports describing malicious behavior for security specialists.
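A hedged sketch of the Jaccard step: score how well a vendor report matches a sandbox log by comparing their token sets. The tokenization and example strings are our assumptions; ReGenerator's actual features are not specified in the abstract:

```python
# Jaccard similarity between token sets of a sandbox log and a vendor report.
# The texts are invented; a real system would use richer features than words.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

log_tokens = set("created registry run key and connected to remote host".split())
report_tokens = set("the sample adds a registry run key for persistence".split())
print(f"{jaccard(log_tokens, report_tokens):.2f}")  # overlap on registry/run/key
```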
In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc. Information extraction (IE) is the default technique to turn unstructured text into structured fact databases, but systematically applying IE techniques to web input requires highly complex systems, starting from focused crawlers over quality assurance methods to cope with the HTML input to long pipelines of natural language processing and IE algorithms. Although a number of tools for each of these steps exist, their seamless, flexible, and scalable combination into a web-scale end-to-end text analytics system is still a true challenge. In this paper, we report our experiences from building such a system for comparing the "web view" on health-related topics with that derived from a controlled scientific corpus, i.e., Medline. The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytic engine inside the Big Data processing system Stratosphere. We describe a practical approach to seed generation which allowed us to crawl a corpus of ~1 TB of web pages highly enriched for the biomedical domain. Pages were run through a complex pipeline of best-of-breed tools for a multitude of necessary tasks, such as HTML repair, boilerplate detection, sentence detection, linguistic annotation, parsing, and eventually named entity recognition for several types of entities. Results are compared with those from running the same pipeline (without the web-related tasks) on a corpus of 24 million scientific abstracts and a third corpus made of ~250K scientific full texts. We evaluate the scalability, quality, and robustness of the employed methods and tools. The focus of this paper is to provide a large, real-life use case to inspire future research into robust, easy-to-use, and scalable methods for domain-specific IE at web scale.
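A hedged sketch of the focus-gated crawl loop described above: only pages a shallow classifier deems on-topic are kept and have their links expanded. The keyword-voting "classifier", the fetch/extract stand-ins, and the budget are our assumptions, not the paper's components:

```python
# Sketch of a focused crawler: off-topic pages are pruned from the frontier.
from collections import deque

def is_biomedical(text: str) -> bool:
    """Stand-in shallow classifier: keyword voting instead of a trained model."""
    terms = ("protein", "clinical", "disease", "gene", "therapy")
    return sum(t in text.lower() for t in terms) >= 2

def focused_crawl(seeds, fetch, extract_links, budget=1000):
    frontier, seen, corpus = deque(seeds), set(seeds), []
    while frontier and len(corpus) < budget:
        url = frontier.popleft()
        text = fetch(url)
        if not is_biomedical(text):      # off-topic: prune this branch
            continue
        corpus.append((url, text))
        for link in extract_links(text):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return corpus

# Tiny in-memory stand-ins for fetching and link extraction.
pages = {"s": "gene therapy for disease", "x": "sports news"}
print(focused_crawl(["s", "x"], fetch=pages.get, extract_links=lambda t: []))
```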
The Controller Area Network (CAN) protocol has become the primary choice for in-vehicle communications for passenger cars and commercial vehicles. However, it is possible for malicious adversaries to cause major damage by exploiting flaws in the CAN protocol design or implementation. Researchers have shown that an attacker can remotely inject malicious messages into the CAN network in order to disrupt or alter normal vehicle behavior. Some of these attacks can lead to catastrophic consequences for both the vehicle and the driver. Although there are several defense techniques against CAN-based attacks, attack surfaces like physically and remotely controllable Electronic Control Units (ECUs) can be used to launch attacks on protocols running on top of the CAN network, such as the SAE J1939 protocol. Commercial vehicles adhere to the SAE J1939 standards, which make use of the CAN protocol for physical communication and are modeled in a manner similar to that of the ISO/OSI 7-layer protocol stack. We posit that the J1939 standards can be subjected to attacks similar to those that have been launched successfully on the OSI layer protocols. Towards this end, we demonstrate how such attacks can be performed on a test bed with three J1939-speaking ECUs connected via a single high-speed CAN bus. Our main goal is to show that the regular operations performed by the J1939-speaking ECUs can be disrupted by manipulating the packet exchange protocols and specifications made by the J1939 data-link layer standards. The list of attacks documented in this paper is not comprehensive, but given the homogeneous and ubiquitous usage of the J1939 standards in commercial vehicles, we believe these attacks, along with newer attacks introduced in the future, can cause widespread damage in the heavy vehicle industry if not mitigated proactively.
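To make the framing concrete, here is a hedged sketch of how a J1939 message maps onto a 29-bit extended CAN identifier (3-bit priority, 18-bit parameter group number, 8-bit source address), sent with the python-can library on a Linux virtual CAN interface; the PGN, addresses, payload, and channel name are arbitrary examples, not one of the paper's attacks:

```python
# Build a J1939 29-bit CAN ID (priority | PGN | source address) and send it.
# Requires python-can and a configured interface, e.g. `vcan0` on Linux.
import can

def j1939_id(priority: int, pgn: int, source_addr: int) -> int:
    """29-bit ID layout: 3-bit priority, 18-bit PGN, 8-bit source address."""
    return ((priority & 0x7) << 26) | ((pgn & 0x3FFFF) << 8) | (source_addr & 0xFF)

msg = can.Message(
    arbitration_id=j1939_id(priority=6, pgn=0xFEF1, source_addr=0x00),
    data=bytes(8),               # placeholder 8-byte payload
    is_extended_id=True,         # J1939 uses 29-bit identifiers
)
bus = can.interface.Bus(channel="vcan0", bustype="socketcan")
bus.send(msg)
```

Because any node sharing the bus can emit frames with an arbitrary identifier, a compromised ECU can speak "as" another source address, which is the kind of data-link-layer manipulation the abstract refers to.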
Digital signatures are perhaps the most important basis for authentication and trust relationships in large-scale systems. More specifically, various applications of signatures provide privacy- and anonymity-preserving mechanisms and protocols, and these, in turn, are becoming critical (due to the recently recognized need to protect individuals according to national rules and regulations). A specific type of signature called "signatures with efficient protocols", as introduced by Camenisch and Lysyanskaya (CL), efficiently accommodates various basic protocols and extensions like zero-knowledge proofs, signing committed messages, or re-randomizability. These are, in fact, typical operations associated with signatures used in typical anonymity and privacy-preserving scenarios. To date, there are no "signatures with efficient protocols" which are based on simple assumptions and truly practical. These two properties together assure a robust primitive: first, simple assumptions are needed to ensure that this basic primitive is mathematically robust and does not require special ad hoc assumptions that are more risky, imply less efficiency, are more tuned to the protocol itself, and are perhaps less trusted. In the other dimension, efficiency is a must given the anonymity applications of the protocol, since without a proper level of efficiency the future adoption of the primitives is always questionable (in spite of their need). In this work, we present a new CL-type signature scheme that is re-randomizable under a simple, well-studied, and by now standard assumption (SXDH). The signature is efficient (built on the recent QA-NIZK constructions) and is, by design, suitable for working in extended contexts that typify privacy settings (like anonymous credentials, group signatures, and offline e-cash). We demonstrate its power by presenting practical protocols based on it.
Vulnerability detection is well known to be difficult and time-consuming work, so it is both necessary and helpful to take full advantage of unlabeled data. Accordingly, this paper proposes a method to predict buffer overflows based on semi-supervised learning. We first employ ANTLR to extract an AST from C/C++ source files; then, following a taxonomy of 22 buffer overflow attributes, a 22-dimensional vector is extracted from every function in the AST; finally, the vectors are used to train a classifier to predict buffer overflow vulnerabilities. The experiments and evaluation indicate that our method is correct and efficient.
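A hedged sketch of the semi-supervised step using scikit-learn's self-training wrapper; the random 22-dimensional features and labels are synthetic stand-ins for the paper's AST-derived attribute vectors, and the paper's own learner may differ:

```python
# Self-training on partially labeled 22-dimensional function vectors.
# Unlabeled samples are marked with -1, per scikit-learn's convention.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 22))           # synthetic stand-in feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic "overflow" label
y[50:] = -1                              # pretend 150 functions are unlabeled

model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y)
print(model.predict(X[:5]))              # predicted labels for five functions
```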
Modern operating systems use hardware support to protect against control-flow hijacking attacks such as code-injection attacks. Typically, write access to executable pages is prevented and kernel-mode execution is restricted to kernel code pages only. However, current CPUs provide no protection against code-reuse attacks like ROP. ASLR is used to prevent these attacks by making all addresses unpredictable for an attacker. Hence, kernel security relies fundamentally on preventing access to address information. We introduce Prefetch Side-Channel Attacks, a new class of generic attacks exploiting major weaknesses in prefetch instructions. This allows unprivileged attackers to obtain address information and thus compromise the entire system by defeating SMAP, SMEP, and kernel ASLR. Prefetch can fetch inaccessible privileged memory into various caches on Intel x86. It also leaks the translation level for virtual addresses on both Intel x86 and ARMv8-A. We build three attacks exploiting these properties. Our first attack retrieves an exact image of the full paging hierarchy of a process, defeating both user-space and kernel-space ASLR. Our second attack resolves virtual to physical addresses to bypass SMAP on 64-bit Linux systems, enabling ret2dir attacks. We demonstrate this from unprivileged user programs on Linux and inside Amazon EC2 virtual machines. Finally, we demonstrate how to defeat kernel ASLR on Windows 10, enabling ROP attacks on kernel and driver binary code. We propose a new form of strong kernel isolation to protect commodity systems, incurring an overhead of only 0.06-5.09%.
Vehicular users are expected to consume large amounts of data, for both entertainment and navigation purposes. This will put a strain on cellular networks, which will be able to cope with such a load only if proper caching is in place; this in turn begs the question of which caching architecture is best suited to deal with vehicular content consumption. In this paper, we leverage a large-scale, crowd-sourced trace to (i) characterize the vehicular traffic demand, in terms of overall magnitude and content breakup; (ii) assess how different caching approaches perform against such a real-world load; (iii) study the effect of recommendation systems and local content items. We define a price-of-fog metric, expressing the additional caching capacity to deploy when moving from traditional, centralized caching architectures to a "fog computing" approach, where caches are closer to the network edge. We find that for location-specific items, such as the ones that vehicular users are most likely to request, this price almost disappears. Vehicular networks thus make a strong case for the adoption of mobile-edge caching, as we are able to reap its benefits, including a reduction in the distance travelled by data within the core network, with little or none of the associated disadvantages.
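A hedged reading of the price-of-fog metric: the extra cache capacity an edge deployment needs, relative to a single central cache, to serve the same demand. The concrete definition below (total fog capacity divided by centralized capacity at equal hit ratio) and the Zipf popularity model are our assumptions based on the abstract's wording:

```python
# Toy price-of-fog: capacity needed at N edge caches vs. one central cache
# to reach the same hit ratio, under an assumed Zipf content popularity.
import numpy as np

def capacity_for_hit_ratio(popularity, target):
    """Smallest number of cached items whose popularity mass reaches target."""
    mass = np.cumsum(np.sort(popularity)[::-1])
    return int(np.searchsorted(mass, target) + 1)

items = 10_000
zipf = 1.0 / np.arange(1, items + 1)        # Zipf(1) popularity, unnormalized
zipf /= zipf.sum()

central = capacity_for_hit_ratio(zipf, target=0.5)
n_edges = 20
# Pessimistic case: every edge sees the same global popularity, so each edge
# needs the central capacity; location-specific demand shrinks this sharply.
fog = n_edges * capacity_for_hit_ratio(zipf, target=0.5)
print("price of fog =", fog / central)      # 20.0 in this pessimistic toy
```

The abstract's finding is the opposite extreme: when items are location-specific, each edge only needs the items its own area requests, so the ratio approaches one.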
Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure's utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.
In recent times, we have seen a proliferation of personal data. This can be attributed not just to a larger proportion of our lives moving online, but also to the rise of ubiquitous sensing through mobile and IoT devices. Alongside this surge, concerns over privacy, trust, and security are expressed more and more as different parties attempt to take advantage of this rich assortment of data. The Databox seeks to enable all the advantages of personal data analytics while at the same time enforcing **accountability** and **control** in order to protect a user's privacy. In this work, we propose and delineate a personal networked device that allows users to **collate**, **curate**, and **mediate** their personal data.
Internet of Things (IoT) systems are designed and developed either as standalone applications from the ground up or with the help of IoT middleware platforms. They are designed to support different kinds of scenarios, such as smart homes and smart cities. Thus far, privacy concerns have not been explicitly considered by IoT applications and middleware platforms. This is partly due to the lack of systematic methods for designing privacy that can guide the software development process in IoT. In this paper, we propose a set of guidelines, a privacy-by-design framework, that can be used to assess the privacy capabilities and gaps of existing IoT applications as well as middleware platforms. We have evaluated two open-source IoT middleware platforms, namely OpenIoT and Eclipse SmartHome, to demonstrate how our framework can be used in this way.