Biblio
In cloud computing environments with many virtual machines, containers, and other systems, an epidemic of malware can be crippling and highly threatening to business processes. In this vision paper, we introduce a hierarchical approach to performing malware detection and analysis using several recent advances in machine learning on graphs, hypergraphs, and natural language. We analyze individual systems and their logs, inspecting and understanding their behavior with attentional sequence models. Given a feature representation of each system's logs using this procedure, we construct an attributed network of the cloud with systems and other components as vertices and propose an analysis of malware with inductive graph and hypergraph learning models. With this foundation, we consider the multicloud case, in which multiple clouds with differing privacy requirements cooperate against the spread of malware, proposing the use of federated learning to perform inference and training while preserving privacy. Finally, we discuss several open problems that remain in defending cloud computing environments against malware related to designing robust ecosystems, identifying cloud-specific optimization problems for response strategy, action spaces for malware containment and eradication, and developing priors and transfer learning tasks for machine learning models in this area.
Malware is pervasive and poses serious threats to normal operation of business processes in cloud. Cloud computing environments typically have hundreds of hosts that are connected to each other, often with high risk trust assumptions and/or protection mechanisms that are not difficult to break. Malware often exploits such weaknesses, as its immediate goal is often to spread itself to as many hosts as possible. Detecting this propagation is often difficult to address because the malware may reside in multiple components across the software or hardware stack. In this scenario, it is usually best to contain the malware to the smallest possible number of hosts, and it's also critical for system administration to resolve the issue in a timely manner. Furthermore, resolution often requires that several participants across different organizational teams scramble together to address the intrusion. In this vision paper, we define this problem in detail. We then present our vision of decentralized malware containment and the challenges and issues associated with this vision. The approach of containment involves detection and response using graph analytics coupled with a blockchain framework. We propose the use of a dominance frontier for profile nodes which must be involved in the containment process. Smart contracts are used to obtain consensus amongst the involved parties. The paper presents a basic implementation of this proposal. We have further discussed some open problems related to our vision.
With the development of IoT and 5G networks, the demand for the next-generation intelligent transportation system has been growing at a rapid pace. Dynamic mapping has been considered one of the key technologies to reduce traffic accidents and congestion in the intelligent transportation system. However, as the number of vehicles keeps growing, a huge volume of mapping traffic may overload the central cloud, leading to serious performance degradation. In this paper, we propose and prototype a CUPS (control and user plane separation)-based edge computing architecture for the dynamic mapping and quantify its benefits by prototyping. There are a couple of merits of our proposal: (i) we can mitigate the overhead of the networks and central cloud because we only need to abstract and send global dynamic mapping information from the edge servers to the central cloud; (ii) we can reduce the response latency since the dynamic mapping traffic can be isolated from other data traffic by being generated and distributed from a local edge server that is deployed closer to the vehicles than the central server in cloud. The capabilities of our system have been quantified. The experimental results have shown our system achieves throughput improvement by more than four times, and response latency reduction by 67.8% compared to the conventional central cloud-based approach. Although these results are still obtained from the preliminary evaluations using our prototype system, we believe that our proposed architecture gives insight into how we utilize CUPS and edge computing to enable efficient dynamic mapping applications.
With the development of Internet of Things, numerous IoT devices have been brought into our daily lives. Bluetooth Low Energy (BLE), due to the low energy consumption and generic service stack, has become one of the most popular wireless communication technologies for IoT. However, because of the short communication range and exclusive connection pattern, a BLE-equipped device can only be used by a single user near the device. To fully explore the benefits of BLE and make BLE-equipped devices truly accessible over the Internet as IoT devices, in this paper, we propose a cloud-based software framework that can enable multiple users to interact with various BLE IoT devices over the Internet. This framework includes an agent program, a suite of services hosting in cloud, and a set of RESTful APIs exposed to Internet users. Given the availability of this framework, the access to BLE devices can be extended from local to the Internet scale without any software or hardware changes to BLE devices, and more importantly, shared usage of remote BLE devices over the Internet is also made available.
Increasing number of Internet-scale applications, such as video streaming, incur huge amount of wide area traffic. Such traffic over the unreliable Internet without bandwidth guarantee suffers unpredictable network performance. This result, however, is unappealing to the application providers. Fortunately, Internet giants like Google and Microsoft are increasingly deploying their private wide area networks (WANs) to connect their global datacenters. Such high-speed private WANs are reliable, and can provide predictable network performance. In this paper, we propose a new type of service-inter-datacenter network as a service (iDaaS), where traditional application providers can reserve bandwidth from those Internet giants to guarantee their wide area traffic. Specifically, we design a bandwidth trading market among multiple iDaaS providers and application providers, and concentrate on the essential bandwidth pricing problem. The involved challenging issue is that the bandwidth price of each iDaaS provider is not only influenced by other iDaaS providers, but also affected by the application providers. To address this issue, we characterize the interaction between iDaaS providers and application providers using a Stackelberg game model, and analyze the existence and uniqueness of the equilibrium. We further present an efficient bandwidth pricing algorithm by blending the advantage of a geometrical Nash bargaining solution and the demand segmentation method. For comparison, we present two bandwidth reservation algorithms, where each iDaaS provider's bandwidth is reserved in a weighted fair manner and a max-min fair manner, respectively. Finally, we conduct comprehensive trace-driven experiments. The evaluation results show that our proposed algorithms not only ensure the revenue of iDaaS providers, but also provide bandwidth guarantee for application providers with lower bandwidth price per unit.
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
We consider the scenario where a cloud service provider (CSP) operates multiple geo-distributed datacenters to provide Internet-scale service. Our objective is to minimize the total electricity and bandwidth cost by jointly optimizing electricity procurement from wholesale markets and geographical load balancing (GLB), i.e., dynamically routing workloads to locations with cheaper electricity. Under the ideal setting where exact values of market prices and workloads are given, this problem reduces to a simple linear programming and is easy to solve. However, under the realistic setting where only distributions of these variables are available, the problem unfolds into a non-convex infinite-dimensional one and is challenging to solve. One of our main contributions is to develop an algorithm that is proven to solve the challenging problem optimally, by exploring the full design space of strategic bidding. Trace-driven evaluations corroborate our theoretical results, demonstrate fast convergence of our algorithm, and show that it can reduce the cost for the CSP by up to 20% as compared with baseline alternatives. This paper highlights the intriguing role of uncertainty in workloads and market prices, measured by their variances. While uncertainty in workloads deteriorates the cost-saving performance of joint electricity procurement and GLB, counter-intuitively, uncertainty in market prices can be exploited to achieve a cost reduction even larger than the setting without price uncertainty.
The steady decline of IP transit prices in the past two decades has helped fuel the growth of traffic demands in the Internet ecosystem. Despite the declining unit pricing, bandwidth costs remain significant due to ever-increasing scale and reach of the Internet, combined with the price disparity between the Internet's core hubs versus remote regions. In the meantime, cloud providers have been auctioning underutilized computing resources in their marketplace as spot instances for a much lower price, compared to their on-demand instances. This state of affairs has led the networking community to devote extensive efforts to cloud-assisted networks - the idea of offloading network functionality to cloud platforms, ultimately leading to more flexible and highly composable network service chains.We initiate a critical discussion on the economic and technological aspects of leveraging cloud-assisted networks for Internet-scale interconnections and data transfers. Namely, we investigate the prospect of constructing a large-scale virtualized network provider that does not own any fixed or dedicated resources and runs atop several spot instances. We construct a cloud-assisted overlay as a virtual network provider, by leveraging third-party cloud spot instances. We identify three use case scenarios where such approach will not only be economically and technologically viable but also provide performance benefits compared to current commercial offerings of connectivity and transit providers.
Existing systems allow manufacturers to acquire factory floor data and perform analysis with cloud applications for machine health monitoring, product quality prediction, fault diagnosis and prognosis etc. However, they do not provide capabilities to perform testing of machine tools and associated components remotely, which is often crucial to identify causes of failure. This paper presents a fault diagnosis system in a cyber-physical manufacturing cloud (CPMC) that allows manufacturers to perform diagnosis and maintenance of manufacturing machine tools through remote monitoring and online testing using Machine Tool Communication (MTComm). MTComm is an Internet scale communication method that enables both monitoring and operation of heterogeneous machine tools through RESTful web services over the Internet. It allows manufacturers to perform testing operations from cloud applications at both machine and component level for regular maintenance and fault diagnosis. This paper describes different components of the system and their functionalities in CPMC and techniques used for anomaly detection and remote online testing using MTComm. It also presents the development of a prototype of the proposed system in a CPMC testbed. Experiments were conducted to evaluate its performance to diagnose faults and test machine tools remotely during various manufacturing scenarios. The results demonstrated excellent feasibility to detect anomaly during manufacturing operations and perform testing operations remotely from cloud applications using MTComm.
Digital-Twins simulate physical world objects by creating 'as-is' virtual images in a cyberspace. In order to create a well synchronized digital-twin simulator in manufacturing, information and activities of a physical machine need to be virtualized. Many existing digital-twins stream read-only data of machine sensors and do not incorporate operations of manufacturing machines through Internet. In this paper, a new method of virtualization is proposed to integrate machining data and operations into the digital-twins using Internet scale machine tool communication method. A fully functional digital-twin is implemented in CPMC testbed using MTComm and several manufacturing application scenarios are developed to evaluate the proposed method and system. Performance analysis shows that it is capable of providing data-driven visual monitoring of a manufacturing process and performing manufacturing operations through digital twins over the Internet. Results of the experiments also shows that the MTComm based digital twins have an excellent efficiency.