Biblio
Geo-distributed Situation Awareness applications are large in scale and are characterized by 24/7 data generation from mobile and stationary sensors (such as cameras and GPS devices); latency-sensitivity for converting sensed data to actionable knowledge; and elastic and bursty needs for computational resources. Fog computing [7] envisions providing computational resources close to the edge of the network, consequently reducing the latency for the sense-process-actuate cycle that exists in these applications. We propose Foglets, a programming infrastructure for the geo-distributed computational continuum represented by fog nodes and the cloud. Foglets provides APIs for a spatio-temporal data abstraction for storing and retrieving application generated data on the local nodes, and primitives for communication among the resources in the computational continuum. Foglets manages the application components on the Fog nodes. Algorithms are presented for launching application components and handling the migration of these components between Fog nodes, based on the mobility pattern of the sensors and the dynamic computational needs of the application. Evaluation results are presented for a Fog network consisting of 16 nodes using a simulated vehicular network as the workload. We show that the discovery and deployment protocol can be executed in 0.93 secs, and joining an already deployed application can be as quick as 65 ms. Also, QoS-sensitive proactive migration can be accomplished in 6 ms.
The Internet of Things (IoT) is slowly, but steadily, changing the way we interact with our surroundings. Smart cities, smart environments, and smart buildings are just a few macroscopic examples of how smart ecosystems are increasingly involved in our daily life, each one offering a different set of information. The decentralization and scattering of this information can be exploited to optimize the on-demand information retrieval process of mobile nodes. We propose an approach focused on defining competence domains in smart systems, where the responsibility for providing a specific piece of information to a mobile node is defined by spatial constraints. By exploiting the interplay and duality of Cloud Computing and Fog Computing, we introduce an approach that exploits the spatial allocation of data in smart systems to optimize the information retrieval of mobile nodes.
Smart Transportation applications are by nature examples of Vehicular Ad-hoc Network (VANET) applications, where mobile vehicles, roadside units, and transportation infrastructure interplay with one another to provide value-added services. While abundant research has focused on the communication aspect of such Mobile Ad-hoc Networks, few bodies of research target the development of VANET applications. Among the popular VANET applications, a dominant direction is to leverage Cloud infrastructure to execute and deliver applications and services. Recent studies showed that Cloud Computing is not sufficient for many VANET applications due to the mobility of vehicles and the latency-sensitive requirements they impose. To this end, Fog Computing has been proposed to leverage computation infrastructure closer to the network edge to complement Cloud Computing in providing latency-sensitive applications and services. However, application development in Fog environments is much more challenging than in the Cloud due to the distributed nature of Fog systems. In this paper, we investigate how Smart Transportation applications are developed following the Fog Computing approach, their challenges, and possible mitigations from the state of the art.
The notion of edge computing introduces new computing functions away from centralized locations and closer to the network edge, thus facilitating new applications and services. This enhanced computing paradigm provides application developers with new opportunities not available otherwise. In this talk, I will discuss why placing computation functions at the extreme edge of our network infrastructure, i.e., in wireless Access Points and home set-top boxes, is particularly beneficial for a large class of emerging applications. I will discuss a specific approach, called ParaDrop, to implement such edge computing functionalities, and use examples from different domains – smarter homes, sustainability, and intelligent transportation – to illustrate the new opportunities around this concept.
Wearable devices, which are small electronic devices worn on the human body, are equipped with limited processing and storage capacities and offer certain integrated functionalities. Wearable devices have recently become increasingly popular, and various kinds are being launched in the market; however, they require a powerful local-hub, most often a smartphone, to supplement their processing and storage capacities for advanced functionalities. Sometimes it may be inconvenient to carry the local-hub (smartphone); thus, many wearable devices are equipped with a Wi-Fi interface, enabling them to exchange data with the local-hub through the Internet when the local-hub is not nearby. However, this results in long response times and restricted functionalities. In this paper, we present a virtual local-hub solution, which utilizes nearby network equipment (e.g., Wi-Fi APs) as the local-hub. Since migrating all applications serving the wearable devices individually would consume too many networking and storage resources, the proposed solution deploys function modules to multiple network nodes and enables remote function module sharing among different users and applications. To reduce the impact of the solution on network bandwidth, we propose a heuristic algorithm for function module allocation with the objective of minimizing total bandwidth consumption. We conduct a series of experiments, and the results show that the proposed solution can reduce bandwidth consumption by up to half while still serving all requests given a large number of service requests.
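As a rough illustration of the allocation problem described above, the following Python sketch greedily places shared function modules on candidate network nodes so as to keep total bandwidth low. It is a simplified heuristic under assumed inputs (topology, traffic figures, capacities), not the allocation algorithm proposed in the paper.

```python
# Illustrative greedy placement of shared function modules on network nodes.
# This is a simplified sketch, not the paper's allocation algorithm: the
# topology, traffic figures, and cost model below are assumptions.

def greedy_allocate(modules, nodes, demand, hop_cost, capacity):
    """
    modules:  {module_id: [user_ids that request it]}
    nodes:    list of candidate network nodes (e.g., Wi-Fi APs)
    demand:   {(user_id, module_id): traffic volume in Mbps}
    hop_cost: {(user_id, node): per-unit bandwidth cost of routing user traffic to node}
    capacity: {node: max number of modules the node can host}
    Returns {module_id: node} placement.
    """
    placement = {}
    load = {n: 0 for n in nodes}
    # Place the most traffic-heavy modules first so they get the cheapest nodes.
    by_traffic = sorted(modules, key=lambda m: -sum(demand[(u, m)] for u in modules[m]))
    for m in by_traffic:
        best_node, best_cost = None, float("inf")
        for n in nodes:
            if load[n] >= capacity[n]:
                continue
            # One shared instance of m on node n serves every requesting user.
            cost = sum(demand[(u, m)] * hop_cost[(u, n)] for u in modules[m])
            if cost < best_cost:
                best_node, best_cost = n, cost
        if best_node is None:
            raise RuntimeError(f"no node has capacity left for module {m}")
        placement[m] = best_node
        load[best_node] += 1
    return placement


if __name__ == "__main__":
    modules = {"hr_monitor": ["u1", "u2"], "speech": ["u2"]}
    nodes = ["ap1", "ap2"]
    demand = {("u1", "hr_monitor"): 0.2, ("u2", "hr_monitor"): 0.2, ("u2", "speech"): 1.5}
    hop_cost = {("u1", "ap1"): 1, ("u1", "ap2"): 3, ("u2", "ap1"): 2, ("u2", "ap2"): 1}
    capacity = {"ap1": 1, "ap2": 1}
    print(greedy_allocate(modules, nodes, demand, hop_cost, capacity))
```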
When focusing on the Internet of Things (IoT), communicating and coordinating sensor–actuator data via the cloud involves inefficient overheads and reduces autonomous behavior. The Fog Computing paradigm essentially moves the compute nodes closer to sensing entities by exploiting peers and intermediary network devices. This reduces centralized communication with the cloud and entails increased coordination between sensing entities and (possibly available) smart network gateway devices. In this paper, we analyze the utility of offloading computation among peers in fog-based deployments. It is important to study the trade-offs involved in such computation offloading, as we deal with resource-limited (energy, computation capacity) devices. Devices computing in a distributed environment may choose to locally compute part of their data and communicate the remainder to their peers. An optimization formulation is presented and applied to various deployment scenarios, taking the computation and communication overheads into account. Our technique is demonstrated on a network of robotic sensor–actuators developed on the ROS (Robot Operating System) platform, which coordinate over the fog to complete a task. We demonstrate 77.8% latency and 54% battery usage improvements on large computation tasks by applying this optimal offloading.
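The following toy Python sketch illustrates the kind of local-versus-offload trade-off behind such a formulation: it splits a job between a device and a peer so that both shares finish at the same time. It assumes the two shares run in parallel and ignores the energy term; all rates are made-up parameters, and this is not the paper's optimization model.

```python
# Toy sketch of the local-vs-offload split behind peer offloading decisions.
# Assumes the local and offloaded shares execute in parallel and ignores the
# energy term of a full formulation; all numbers are made up.

def completion_time(alpha, work, local_rate, peer_rate, link_rate):
    """Finish time when a fraction alpha of `work` is computed locally."""
    t_local = alpha * work / local_rate                    # seconds of local compute
    t_remote = (1 - alpha) * work / link_rate \
             + (1 - alpha) * work / peer_rate              # ship the rest, compute remotely
    return max(t_local, t_remote)

def optimal_split(work, local_rate, peer_rate, link_rate):
    """Balance both branches: alpha* makes t_local equal to t_remote."""
    remote_per_unit = work / link_rate + work / peer_rate
    local_per_unit = work / local_rate
    return remote_per_unit / (local_per_unit + remote_per_unit)

if __name__ == "__main__":
    work, local_rate, peer_rate, link_rate = 100.0, 5.0, 50.0, 100.0  # arbitrary units
    a = optimal_split(work, local_rate, peer_rate, link_rate)
    t = completion_time(a, work, local_rate, peer_rate, link_rate)
    print(f"compute {a:.0%} locally, offload the rest; finish in {t:.2f} s")
```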
Early Internet of Things (IoT) applications have been built around cloud-centric architectures, where information generated at the edge by the "things" is conveyed to and processed in a cloud infrastructure. These architectures centralise processing and decision-making in the data-centre, assuming sufficient connectivity, bandwidth, and latency. As applications of the Internet of Things extend to industrial and more demanding consumer applications, the assumptions underlying cloud-centric architectures start to be violated, as for several of these applications connectivity, bandwidth, and latency to the data-centre are a challenge. Fog and Mist computing have emerged as forms of "Cloud Computing" closer to the "Edge" and to the "Things" that should alleviate the connectivity, bandwidth, and latency challenges faced by Industrial and extremely demanding Consumer Internet of Things applications. This keynote will (1) introduce Cloud, Fog and Mist Computing architectures for the Internet of Things, (2) motivate their need and explain their applicability with real-world use cases, and (3) assess their technological maturity and highlight the areas that require further academic and industrial research.
Offloading the computationally expensive Simultaneous Localization and Mapping (SLAM) task for mobile robots has attracted significant attention during the last few years. The lack of powerful on-board compute capability in these energy-constrained mobile robots and rapid advancement in compute-cloud access technologies laid the foundation for the development of several Cloud Robotics platforms that enabled parallel execution of computationally expensive robotic algorithms, especially those involving multiple robots. In this work, the Cloud Robotics concept is extended to include the current emphasis on computing at network edge nodes along with the Cloud. The requirements and advantages of using edge nodes for computation offloading over a remote cloud or local robot clusters are discussed with reference to the ETSI 'Mobile-Edge Computing' initiative and the OpenFog Consortium's 'OpenFog Architecture'. A Particle Filter algorithm for SLAM is modified and implemented for offloading in a multi-tier edge+cloud setup. Additionally, a model is proposed for the offloading decision in such a setup, with experiments and results demonstrating the efficacy of the proposed dynamic offloading scheme over static offloading strategies.
Accelerated Processing Unit (APU) is a heterogeneous multicore processor that contains general-purpose CPU cores and a GPU in a single chip. It also supports Heterogeneous System Architecture (HSA) that provides coherent physically-shared memory between the CPU and the GPU. In this paper, we present the design and implementation of a high-performance IPsec gateway using a low-cost commodity embedded APU. The HSA supported by the APUs eliminates the data copy overhead between the CPU and the GPU, which is unavoidable in the previous discrete GPU approaches. The gateway is implemented in OpenCL to exploit the GPU and uses zero-copy packet I/O APIs in DPDK. The IPsec gateway handles the real-world network traffic where each packet has a different workload. The proposed packet scheduling algorithm significantly improves GPU utilization for such traffic. It works not only for APUs but also for discrete GPUs. With three CPU cores and one GPU in the APU, the IPsec gateway achieves a throughput of 10.36 Gbps with an average latency of 2.79 ms to perform AES-CBC+HMAC-SHA1 for incoming packets of 1024 bytes.
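The sketch below only illustrates the general idea of workload-aware batching for GPU offload: packets of varying size are grouped so each batch carries roughly equal work. It is not the paper's scheduler and uses no DPDK or OpenCL; approximating per-packet work by payload length is an assumption.

```python
# Simplified sketch of workload-aware batching for GPU crypto offload.
# A real gateway would use DPDK and OpenCL; here batches are formed purely in
# Python and per-packet "work" is approximated by payload length (an assumption).

def batches_by_work(packets, work_budget):
    """Group packets so every batch carries roughly `work_budget` bytes of payload."""
    batch, work = [], 0
    for pkt in packets:
        batch.append(pkt)
        work += len(pkt)
        if work >= work_budget:
            yield batch          # dispatch this batch to the GPU kernel
            batch, work = [], 0
    if batch:
        yield batch              # flush the tail batch

if __name__ == "__main__":
    import random
    random.seed(0)
    traffic = [bytes(random.randint(64, 1500)) for _ in range(20)]  # mixed packet sizes
    for i, b in enumerate(batches_by_work(traffic, work_budget=8192)):
        print(f"batch {i}: {len(b)} packets, {sum(len(p) for p in b)} bytes")
```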
Developers and academics are constantly seeking to increase the speed and security of operating systems. Unfortunately, an increase in either one often comes at the cost of the other. In this paper, we present an operating system design that challenges a long-held tenet of multicore operating systems in order to produce an alternative architecture that has the potential to deliver both increased security and faster performance. In particular, we propose decoupling the operating system kernel from user processes by running each on completely separate processor cores instead of at different privilege levels within shared cores. Without using the hardware's privilege modes, virtualization and virtual memory contexts enforce the security policies necessary to maintain process isolation and protection. Our new kernel design paradigm offers the opportunity to simultaneously increase both performance and security; utilizing the hardware facilities for inter-core communication in place of those for privilege mode switching offers the opportunity for increased system call performance, while the hard separation between user processes and the kernel provides several strong security properties.
We introduce a system-level Simulation and Analysis Engine (SAE) framework based on dynamic binary instrumentation for fine-grained and customizable instruction-level introspection of everything that executes on the processor. SAE can instrument the BIOS, kernel, drivers, and user processes. It can also instrument multiple systems simultaneously using a single instrumentation interface, which is essential for studying scale-out applications. SAE is an x86 instruction set simulator designed specifically to enable rapid prototyping, evaluation, and validation of architectural extensions and program analysis tools using its flexible APIs. It is fast enough to execute full platform workloads (a modern operating system can boot in a few minutes), thus enabling research, evaluation, and validation of complex functionalities related to multicore configurations, virtualization, security, and more. To reach high speeds, SAE couples tightly with a virtual platform and employs both a just-in-time (JIT) compiler that helps simulate simple instructions efficiently and a fast interpreter for simulating new or complex instructions. We describe SAE's architecture and instrumentation engine design and show the framework's usefulness for single- and multi-system architectural and program analysis studies.
Modern multicore processors feature easily accessible temperature sensors that provide useful information for dynamic thermal management. These sensors were recently shown to be a potential security threat, since otherwise isolated applications can exploit them to establish a thermal covert channel and leak restricted information. Previous research showed experiments that document the feasibility of (low-rate) communication over this channel, but did not further analyze its fundamental characteristics. For this reason, the important questions of quantifying the channel capacity and achievable rates remain unanswered. To address these questions, we devise and exploit a new methodology that leverages both theoretical results from information theory and experimental data to study these thermal covert channels on modern multicores. We use spectral techniques to analyze data from two representative platforms and estimate the capacity of the channels from a source application to temperature sensors on the same or different cores. We estimate the capacity to be in the order of 300 bits per second (bps) for the same-core channel, i.e., when reading the temperature on the same core where the source application runs, and in the order of 50 bps for the 1-hop channel, i.e., when reading the temperature of the core physically next to the one where the source application runs. Moreover, we show a communication scheme that achieves rates of more than 45 bps on the same-core channel and more than 5 bps on the 1-hop channel, with less than 1% error probability. The highest rate shown in previous work was 1.33 bps on the 1-hop channel with 11% error probability.
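To make the setting concrete, the following toy Python sketch on-off-keys bits onto CPU load and decodes them by sampling a Linux thermal sensor. The sysfs path, bit period, and threshold are assumptions that need per-platform calibration, and this naive scheme runs far below the capacities and rates reported in the paper.

```python
# Toy on-off-keyed thermal covert channel on Linux, to make the setting concrete.
# The sysfs path, bit period, and threshold are assumptions requiring per-platform
# calibration; this sketch operates far below the rates analyzed in the paper.

import time

SENSOR = "/sys/class/thermal/thermal_zone0/temp"   # millidegrees C on many Linux systems
BIT_PERIOD = 2.0                                   # seconds per bit (very conservative)

def read_temp():
    with open(SENSOR) as f:
        return int(f.read().strip()) / 1000.0

def send(bits):
    """Source app: heat the core for a '1', stay idle for a '0'."""
    for b in bits:
        end = time.time() + BIT_PERIOD
        if b == "1":
            while time.time() < end:
                pass                               # busy loop raises core temperature
        else:
            time.sleep(BIT_PERIOD)

def receive(n_bits, threshold):
    """Sink app: sample the sensor once per bit period and threshold it."""
    out = []
    for _ in range(n_bits):
        time.sleep(BIT_PERIOD)
        out.append("1" if read_temp() > threshold else "0")
    return "".join(out)
```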
Computed Tomography (CT) Image Reconstruction is an important technique used in a wide range of applications, ranging from explosive detection, medical imaging to scientific imaging. Among available reconstruction methods, Model Based Iterative Reconstruction (MBIR) produces higher quality images and allows for the use of more general CT scanner geometries than is possible with more commonly used methods. The high computational cost of MBIR, however, often makes it impractical in applications for which it would otherwise be ideal. This paper describes a new MBIR implementation that significantly reduces the computational cost of MBIR while retaining its benefits. It describes a novel organization of the scanner data into super-voxels (SV) that, combined with a super-voxel buffer (SVB), dramatically increase locality and prefetching, enable parallelism across SVs and lead to an average speedup of 187 on 20 cores.
Defense-in-depth is an important security architecture principle that has significant application to industrial control systems (ICS), cloud services, storehouses of sensitive data, and many other areas. We claim that an ideal defense-in-depth posture is 'deep', containing many layers of security, and 'narrow', meaning that the number of node-independent attack paths is minimized. Unfortunately, accurately calculating both depth and width is difficult using standard graph algorithms because of a lack of independence between multiple vulnerability instances (i.e., if an attacker can penetrate a particular vulnerability on one host, then they can likely penetrate the same vulnerability on another host). To address this, we represent known weaknesses and vulnerabilities as a type of colored attack graph. We measure depth and width by solving the shortest color path and minimum color cut problems. We prove both of these to be NP-Hard, and thus our solution provides a suite of greedy heuristics. We then empirically apply our approach to large randomly generated networks as well as to ICS networks generated from a published ICS attack template. Lastly, we discuss how to use these results to help guide improvements to defense-in-depth postures.
The Physical Web is a project announced by Google's Chrome team that essentially provides a framework to discover "smart" physical objects (e.g., vending machines, classrooms, conference rooms, cafeterias) and interact with specific, contextual content without having to resort to downloading a dedicated app. A common app, such as the open-source and freely available Physical Web app on the Google Play Store or the BKON Browser on the Apple App Store, can access nearby beacons. A current work-in-progress at the University of Maui College is developing a campus-wide prototype of beacon technology using the Eddystone-URL and EID protocols with beacons from various vendors.
Cloud storage services such as Dropbox [1] and Google Drive [2] are becoming more and more popular. On the one hand, they provide users with mobility, scalability, and convenience. On the other hand, privacy issues arise when the storage is no longer fully controlled by users. Although modern encryption schemes are effective at protecting the content of data, the encryption-before-outsourcing approach has two drawbacks: first, one kind of sensitive information, the access pattern of the data, is left unprotected; moreover, encryption usually makes the data difficult to use. In this paper, we propose AIS (Access Indistinguishable Storage), the first client-side system that can partially conceal the access pattern of cloud storage in constant time. Besides data content, AIS can conceal information about the number of initial files and the length of each initial file. When it comes to the access phase after initiation, AIS can effectively conceal the behavior (read or write) and the target file of the current access. Moreover, the existence and length of each file will remain confidential as long as there is no access after initiation. One application of AIS is SSE (Searchable Symmetric Encryption), which makes encrypted data searchable. Based on AIS, we propose SBA (SSE Built on AIS). To the best of our knowledge, SBA is safer than any other SSE system of the same complexity, and SBA is the first to conceal whether the current keyword was queried before, the first to conceal whether the current operation is an addition or deletion, and the first to support direct modification of files.
In this work, we give a lattice attack on the ECDSA implementation in the latest version of OpenSSL, which implements scalar multiplication using the windowed Non-Adjacent Form method. We propose a totally different but more efficient method of extracting and utilizing information from the side-channel results, remarkably improving on previous attacks. First, we develop a new efficient method which can extract almost all information from the side-channel results, obtaining 105.8 bits of information per signature on average for 256-bit ECDSA. Then, in order to make the utmost of our extracted information, we translate the problem of recovering the secret key to the Extended Hidden Number Problem, which can be solved by lattice reduction algorithms. Finally, we introduce the methods of elimination, merging, most-significant-digit recovering, and enumeration to improve the attack. Our attack is mounted on the secp256k1 curve, and the result shows that only 4 signatures would be enough to recover the secret key if the Flush+Reload attack is implemented perfectly without any error, which is much better than the best known result needing at least 13 signatures.
Evolutionary Computation (EC) has been used with great success on various real-world problems. One domain abundant with numerous difficult problems is cryptology. Cryptology can be divided into cryptography, which, informally speaking, considers methods to ensure secrecy (but also authenticity, privacy, etc.), and cryptanalysis, which deals with methods to break cryptographic systems. Although not always in an obvious way, EC can be applied to problems from both domains. This tutorial will first give a brief introduction to cryptology intended for a general audience (therefore omitting proofs and the mathematics behind many concepts). Afterwards, we concentrate on several topics from cryptography that have been successfully tackled with EC so far and discuss why those topics are suitable for applying EC. However, care must be taken, since there exist a number of problems that seem to be impossible to solve with EC, and one needs to realize the limitations of the heuristics. We will discuss the choice of appropriate EC techniques (GA, GP, CGP, ES, multi-objective optimization, etc.) for various problems and evaluate the importance of that choice. Furthermore, we will discuss the gap between the cryptographic community and the EC community and what that means for the results. In doing so, we will place special emphasis on the perspective that cryptography presents a source of benchmark problems for the EC community. To conclude, we will present a number of topics we consider to be strong research choices that can have a real-world impact. In that part, we give special attention to cryptographic problems where the cryptographic community has successfully applied EC, but where those problems remained out of the focus of the EC community. This tutorial will also present some live demos of EC in action when dealing with cryptographic problems. We will present several problems, ways of encoding solutions, and the impact of the choice of algorithm, and finally we will run some experiments to show the results and discuss how to assess them from a cryptographic perspective.
A program subject to a Return-Oriented Programming (ROP) attack usually presents an execution trace with a high frequency of indirect branches. From this observation, several researchers have proposed to monitor the density of these instructions to detect ROP attacks. These techniques use universal thresholds: the density of indirect branches that characterizes an attack is the same for every application. This paper shows that universal thresholds are easy to circumvent. As an alternative, we introduce an inter-procedural, semi-context-sensitive static code analysis that estimates the maximum density of indirect branches possible for a program. This analysis determines detection thresholds for each application, thus making it more difficult for attackers to compromise programs via ROP. We have used an implementation of our technique in LLVM to find specific thresholds for the programs in SPEC CPU2006. By comparing these thresholds against actual execution traces of the corresponding programs, we demonstrate the accuracy of our approach. Furthermore, our algorithm is practical: it finds an approximate solution to a theoretically undecidable problem, and handles programs with up to 700 thousand assembly instructions in 25 minutes.
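The sketch below illustrates only the runtime metric involved: the peak fraction of indirect branches in a sliding window of an instruction trace, compared against a per-program threshold. The static analysis that derives that threshold, which is the paper's contribution, is not reproduced; the mnemonic set, window size, and threshold value here are arbitrary.

```python
# Sketch of the runtime side of density-based ROP detection: compute the peak
# fraction of indirect branches in any sliding window of the instruction trace
# and compare it with a per-program threshold. The static analysis that derives
# that threshold (the paper's contribution) is not reproduced here.

from collections import deque

INDIRECT = {"callq*", "jmpq*", "retq"}   # simplified x86-64 mnemonics (assumption)

def peak_indirect_density(trace, window=64):
    win = deque(maxlen=window)
    peak = 0.0
    for insn in trace:
        win.append(1 if insn in INDIRECT else 0)
        if len(win) == window:
            peak = max(peak, sum(win) / window)
    return peak

def looks_like_rop(trace, per_program_threshold, window=64):
    return peak_indirect_density(trace, window) > per_program_threshold

if __name__ == "__main__":
    benign = ["mov", "add", "callq*"] * 100       # ~1/3 indirect branches
    gadget_chain = ["pop", "retq"] * 100          # 1/2 indirect branches (ret after ret)
    print(looks_like_rop(benign, 0.4))            # False
    print(looks_like_rop(gadget_chain, 0.4))      # True
```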
We study the problem of estimating distinct elements in the data stream model, which has a central role in traffic monitoring, query optimization, data mining and data integration. Different from all previous work, we study the problem in the noisy data setting, where two different looking items in the stream may reference the same entity (determined by a distance function and a threshold value), and the goal is to estimate the number of distinct entities in the stream. In this paper, we formalize the problem of robust distinct elements, and develop space and time-efficient streaming algorithms for datasets in the Euclidean space, using a novel technique we call bucket sampling. We also extend our algorithmic framework to other metric spaces by establishing a connection between bucket sampling and the theory of locality sensitive hashing. Moreover, we formally prove that our algorithms are still effective under small distinct elements ambiguity. Our experiments demonstrate the practicality of our algorithms.
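As a rough illustration of the LSH connection (not the paper's bucket-sampling algorithm), the Python sketch below collapses noisy near-duplicates into random-hyperplane LSH buckets and counts distinct bucket keys; the number of hyperplanes and the noise scale are arbitrary assumptions.

```python
# A simplified illustration of the LSH connection: near-duplicate (noisy) points
# tend to share a random-hyperplane LSH key, so counting distinct keys roughly
# counts distinct entities. This is not the paper's bucket-sampling algorithm,
# and the number of hyperplanes below is an arbitrary choice.

import numpy as np

def lsh_key(x, hyperplanes):
    """Sign pattern of x against random hyperplanes (cosine LSH)."""
    return tuple((hyperplanes @ x > 0).astype(int))

def approx_distinct(stream, dim, n_planes=16, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, dim))
    return len({lsh_key(x, planes) for x in stream})

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    entities = rng.standard_normal((50, 8)) * 10           # 50 true entities
    # Each entity appears several times with small noise (the "dirty" stream).
    stream = [e + rng.standard_normal(8) * 0.01 for e in entities for _ in range(5)]
    print(approx_distinct(stream, dim=8))                   # close to 50
```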
The use of flashSSDs has increased rapidly in a wide range of areas due to their superior energy efficiency, shorter access time, and higher bandwidth compared to HDDs. The internal parallelism created by the multiple flash memory packages embedded in a flashSSD is one of its unique features. Many new DBMS technologies have been developed for flashSSDs, but query processing for flashSSDs has drawn less attention than other DBMS technologies. Hash partitioning is popularly used in query processing algorithms to materialize their intermediate results in an efficient manner. In this paper, we propose a novel hash partitioning algorithm that exploits the internal parallelism of flashSSDs. The devised hash partitioning method outperforms the traditional hash partitioning technique regardless of the amount of available main memory and independently of the buffer management strategy (blocked I/O vs. page-sized I/O). We implemented our method based on the source code of the PostgreSQL storage manager. PostgreSQL relation files created by the TPC-H workload were employed in the experiments. Our method was found to be up to 3.55 times faster than the traditional method with blocked I/O, and 2.36 times faster than the traditional method with page-sized I/O.
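For orientation, the sketch below shows textbook hash partitioning with block-sized output buffers. The paper's contribution, issuing those buffered writes in parallel to exploit a flashSSD's internal channels, is not reproduced; the file names and the 8 KB block size are arbitrary.

```python
# Textbook hash partitioning with block-sized output buffers, to fix ideas.
# The paper's flashSSD-specific optimization (parallel buffered writes across the
# device's internal channels) is not reproduced here; file names and the 8 KB
# block size are arbitrary.

import os

def hash_partition(tuples, key, n_partitions, out_dir, block_size=8192):
    """Spill tuples into n_partitions files, flushing one block at a time."""
    os.makedirs(out_dir, exist_ok=True)
    buffers = [bytearray() for _ in range(n_partitions)]
    files = [open(os.path.join(out_dir, f"part_{i}.bin"), "wb") for i in range(n_partitions)]
    try:
        for t in tuples:
            p = hash(key(t)) % n_partitions
            buffers[p] += repr(t).encode() + b"\n"
            if len(buffers[p]) >= block_size:        # blocked I/O: write a full block
                files[p].write(buffers[p])
                buffers[p].clear()
        for p in range(n_partitions):                 # flush the remainders
            if buffers[p]:
                files[p].write(buffers[p])
    finally:
        for f in files:
            f.close()

if __name__ == "__main__":
    rows = [(i, f"customer-{i % 1000}") for i in range(100_000)]
    hash_partition(rows, key=lambda r: r[1], n_partitions=8, out_dir="parts")
```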
We propose a novel, scalable, and principled graph sketching technique based on minwise hashing of local neighborhoods. For an n-node graph with e edges (e >> n), we incrementally maintain in real time a minwise neighbor-sampled subgraph using k hash functions in O(n x k) memory, with the limit being user-configurable via the parameter k. Symmetrization and similarity-based techniques can recover from these data structures a significant portion of the original graph. We present a theoretical analysis of the minwise sampling strategy and also derive unbiased estimators for important graph properties such as triangle count and neighborhood overlap. We perform an extensive empirical evaluation of our graph sketch and its derivatives on a wide variety of real-world graph data sets drawn from different application domains, using important large-network analysis algorithms: local and global clustering coefficient, PageRank, and local graph sparsification. With bounded memory, the quality of results using the sketch representation is competitive against baselines which use the full graph, and the computational performance is often better. Our framework is flexible and configurable to be leveraged by numerous other graph analytics algorithms, potentially reducing the information mining time on large streamed graphs for a variety of applications.
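A minimal Python sketch of the underlying primitive, k-wise minhashing of neighbor sets and the resulting neighborhood-Jaccard estimate, is given below; the paper's triangle-count and sparsification estimators are not reproduced, and the hash construction is a simple stand-in.

```python
# Minimal sketch of k-wise minhash over node neighborhoods: keep, per node, the
# minimum of each of k hash functions over its neighbor set, and estimate the
# Jaccard similarity of two neighborhoods by the fraction of matching minima.
# The paper's triangle-count and sparsification estimators are not shown.

import random

def make_hashes(k, seed=0):
    rng = random.Random(seed)
    masks = [rng.getrandbits(64) for _ in range(k)]
    return [lambda v, m=m: hash(v) ^ m for m in masks]

def neighbor_sketch(adj, k=64):
    """adj: {node: set(neighbors)} -> {node: tuple of k minhash values}"""
    hs = make_hashes(k)
    return {u: tuple(min(h(v) for v in nbrs) for h in hs)
            for u, nbrs in adj.items() if nbrs}

def est_jaccard(sk_u, sk_v):
    return sum(a == b for a, b in zip(sk_u, sk_v)) / len(sk_u)

if __name__ == "__main__":
    adj = {
        "a": {"x", "y", "z", "w"},
        "b": {"x", "y", "z", "q"},   # true neighborhood Jaccard = 3/5 = 0.6
    }
    sk = neighbor_sketch(adj, k=256)
    print(round(est_jaccard(sk["a"], sk["b"]), 2))
```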
Recently, multimodal hashing techniques have received considerable attention due to their low storage cost and fast query speed for multimodal data retrieval. Many methods have been proposed; however, there are still some problems that need to be further considered. For example, some of these methods just use a similarity matrix for learning hash functions, which discards some useful information contained in the original data; some of them relax the binary constraints or separate the learning of hash functions and binary codes into two independent stages to bypass the obstacle of handling the discrete constraints on binary codes during optimization, which may generate large quantization error; and some of them are not robust to noise. All these problems may degrade the performance of a model. To address these problems, in this paper we propose a novel supervised hashing framework for cross-modal retrieval, i.e., Supervised Robust Discrete Multimodal Hashing (SRDMH). Specifically, SRDMH tries to make the final binary codes preserve the same label information as the original data, so that it can leverage more label information to supervise the binary code learning. In addition, it learns the hashing functions and binary codes directly, instead of relaxing the binary constraints, so as to avoid the large quantization error problem. Moreover, to make it robust and easy to solve, we further integrate a flexible l2,p loss with nonlinear kernel embedding and an intermediate representation of each instance. Finally, an alternating algorithm is proposed to solve the optimization problem in SRDMH. Extensive experiments are conducted on three benchmark data sets. The results demonstrate that the proposed method (SRDMH) outperforms or is comparable to several state-of-the-art methods for the cross-modal retrieval task.
Hashing based methods have attracted considerable attention for efficient cross-modal retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to effectively integrate heterogeneous features from different modalities to learn hash functions using available supervising information, e.g., class labels. Existing hashing based methods generally project heterogeneous features to a common space for hash codes generation, and the supervising information is incrementally used for improving performance. However, these methods may produce ineffective hash codes, due to the failure to explore the discriminative property of supervising information and to effectively bridge the semantic gap between different modalities. To address these challenges, we propose a novel hashing based method in a linear classification framework, in which the proposed method learns modality-specific hash functions for generating unified binary codes, and these binary codes are viewed as representative features for discriminative classification with class labels. An effective optimization algorithm is developed for the proposed method to jointly learn the modality-specific hash function, the unified binary codes and a linear classifier. Extensive experiments on three benchmark datasets highlight the advantage of the proposed method and show that it achieves the state-of-the-art performance.
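The sketch below captures only the shape of such a framework: fit one linear projection per modality toward shared target codes, binarize with sign, and retrieve across modalities by Hamming distance. The joint optimization with a linear classifier proposed in the paper is not reproduced; the synthetic data, random class codewords, and ridge-regression fit are assumptions made for illustration.

```python
# Generic sketch of the framework's shape: fit one linear projection per modality
# toward a shared set of target codes, binarize with sign(), and retrieve across
# modalities by Hamming distance. The paper's joint optimization with the linear
# classifier is not reproduced; target codes here are just random class codewords.

import numpy as np

def fit_hash_fn(X, target_codes, reg=1e-2):
    """Ridge regression from features X (n x d) to target codes (n x b)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ target_codes)

def encode(X, W):
    return np.sign(X @ W)            # unified binary codes in {-1, +1}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_img, d_txt, bits, classes = 600, 64, 32, 16, 10
    labels = rng.integers(0, classes, n)
    class_codes = np.sign(rng.standard_normal((classes, bits)))   # one codeword per class
    targets = class_codes[labels]
    # Synthetic paired data whose features correlate with the class codeword.
    X_img = targets @ rng.standard_normal((bits, d_img)) + 0.5 * rng.standard_normal((n, d_img))
    X_txt = targets @ rng.standard_normal((bits, d_txt)) + 0.5 * rng.standard_normal((n, d_txt))
    W_img, W_txt = fit_hash_fn(X_img, targets), fit_hash_fn(X_txt, targets)
    B_img, B_txt = encode(X_img, W_img), encode(X_txt, W_txt)
    # Image query 0 against all text codes: smaller Hamming distance = better match.
    ham = np.count_nonzero(B_txt != B_img[0], axis=1)
    print("query class:", labels[0], "top-10 retrieved classes:", labels[np.argsort(ham)[:10]])
```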
General sparse matrix-matrix multiplication (SpGEMM) is a core component of many algorithms. A number of recent works have used high throughput graphics processing units (GPUs) to accelerate SpGEMM. However, exploiting the power of GPUs for SpGEMM requires addressing a number of challenges, including highly imbalanced workloads and large numbers of inefficient random global memory accesses. This paper presents a SpGEMM algorithm which uses several novel techniques to overcome these problems. We first propose two low cost methods to achieve perfect load balancing during the most expensive step in SpGEMM. Next, we show how to eliminate nearly all random global memory accesses using shared memory based hash tables. To optimize the performance of the hash tables, we propose a lightweight method to estimate the number of nonzeros in the output matrix. We compared our algorithm to the CUSP, CUSPARSE and the state-of-the-art BHSPARSE GPU SpGEMM algorithms, and show that it performs 5.6x, 2.4x and 1.5x better on average, and up to 11.8x, 9.5x and 2.5x better in the best case, respectively. Furthermore, we show that our algorithm performs especially well on highly imbalanced and unstructured matrices.
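The following CPU-side Python sketch shows the hash-accumulator idea at the core of such algorithms: for each output row of Gustavson's SpGEMM, a small hash table accumulates partial products. The GPU shared-memory tables, load balancing, and nonzero estimation described in the paper are not reproduced.

```python
# CPU illustration of the hash-accumulator idea behind hash-based SpGEMM: for
# each output row of Gustavson's algorithm, a small hash table (a dict here)
# accumulates the partial products. The GPU shared-memory tables, load
# balancing, and nnz estimation from the paper are not reproduced.

import numpy as np
import scipy.sparse as sp

def spgemm_hash(A: sp.csr_matrix, B: sp.csr_matrix) -> sp.csr_matrix:
    indptr, indices, data = [0], [], []
    for i in range(A.shape[0]):
        acc = {}                                         # hash table for output row i
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            j, a_ij = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[j], B.indptr[j + 1]):
                k = B.indices[kk]
                acc[k] = acc.get(k, 0.0) + a_ij * B.data[kk]
        for k in sorted(acc):
            indices.append(k)
            data.append(acc[k])
        indptr.append(len(indices))
    return sp.csr_matrix((data, indices, indptr), shape=(A.shape[0], B.shape[1]))

if __name__ == "__main__":
    A = sp.random(200, 300, density=0.02, random_state=0, format="csr")
    B = sp.random(300, 150, density=0.02, random_state=1, format="csr")
    C = spgemm_hash(A, B)
    assert abs(C - A @ B).max() < 1e-12                  # matches SciPy's own SpGEMM
    print("nnz(C) =", C.nnz)
```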

