Bibliography
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The heterogeneity of workload and server characteristics exhibited in Cloud-scale or Internet-scale systems adds further complexity and new challenges to the problem. Compared with existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to further improve the efficiency of resource management in large-scale systems. In this paper we describe and discuss how ML could be used to understand both workloads and environments automatically, and to help cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing applications' QoS, and mitigating tail-latency stragglers. We introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an ML-based method will help to achieve architectural optimization and efficiency improvement.
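The abstract leaves the classification method open; as a hedged illustration only, the Python sketch below clusters nodes by simple performance metrics (hypothetical scores, a toy 2-means, not the paper's model) to flag likely stragglers, which is one plausible reading of performance-centric node classification:

```python
import numpy as np

def classify_nodes(metrics, iters=20):
    """Toy 2-means clustering over per-node performance vectors.

    metrics: (n_nodes, n_features) array, e.g. [cpu_score, io_score].
    Returns a boolean mask marking the slower cluster (straggler risk).
    """
    rng = np.random.default_rng(0)
    centers = metrics[rng.choice(len(metrics), size=2, replace=False)]
    for _ in range(iters):
        # Assign each node to its nearest center.
        dists = np.linalg.norm(metrics[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = metrics[labels == k].mean(axis=0)
    # The cluster whose center has the lower total score is the risk set.
    slow = centers.sum(axis=1).argmin()
    return labels == slow

# Hypothetical per-node [cpu_score, io_score] measurements.
nodes = np.array([[0.90, 0.80], [0.85, 0.90], [0.30, 0.40],
                  [0.88, 0.82], [0.35, 0.30]])
print(classify_nodes(nodes))  # flags the two low-scoring nodes
```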
We consider the scenario where a cloud service provider (CSP) operates multiple geo-distributed datacenters to provide Internet-scale service. Our objective is to minimize the total electricity and bandwidth cost by jointly optimizing electricity procurement from wholesale markets and geographical load balancing (GLB), i.e., dynamically routing workloads to locations with cheaper electricity. Under the ideal setting where exact values of market prices and workloads are given, this problem reduces to a simple linear program and is easy to solve. However, under the realistic setting where only distributions of these variables are available, the problem unfolds into a non-convex infinite-dimensional one and is challenging to solve. One of our main contributions is to develop an algorithm that is proven to solve the challenging problem optimally, by exploring the full design space of strategic bidding. Trace-driven evaluations corroborate our theoretical results, demonstrate fast convergence of our algorithm, and show that it can reduce the cost for the CSP by up to 20% as compared with baseline alternatives. This paper highlights the intriguing role of uncertainty in workloads and market prices, measured by their variances. While uncertainty in workloads deteriorates the cost-saving performance of joint electricity procurement and GLB, counter-intuitively, uncertainty in market prices can be exploited to achieve a cost reduction even larger than in the setting without price uncertainty.
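To make the ideal setting concrete: with known prices and workloads, the routing-plus-procurement decision is a linear program. The sketch below (all prices, demands, and capacities are made-up numbers, and the formulation is a plausible simplification rather than the paper's exact model) solves such an instance with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 2 front-end regions, 3 datacenters.
price    = np.array([30.0, 45.0, 25.0])        # $/MWh at each datacenter
bw_cost  = np.array([[2.0, 1.0, 3.0],
                     [1.5, 2.5, 1.0]])         # $ per unit of routed workload
demand   = np.array([100.0, 80.0])             # workload per region
capacity = np.array([90.0, 90.0, 90.0])        # per-datacenter limit
mwh_per_unit = 0.5                             # energy per unit of workload

# Decision variable x[i, j]: workload from region i routed to datacenter j.
c = (bw_cost + mwh_per_unit * price[None, :]).ravel()

# Each region's demand must be fully routed (equality constraints).
A_eq = np.kron(np.eye(2), np.ones(3))
# Each datacenter's capacity must be respected (inequality constraints).
A_ub = np.tile(np.eye(3), 2)

res = linprog(c, A_ub=A_ub, b_ub=capacity, A_eq=A_eq, b_eq=demand,
              bounds=(0, None))
print(res.x.reshape(2, 3))  # optimal routing matrix
print(res.fun)              # minimum total cost
```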
Through time inference attacks, adversaries fingerprint SDN controllers, estimate switches' flow-table sizes, and perform flow-state reconnaissance. In fact, timing an SDN and analyzing the results can expose information that later empowers SDN resource-consumption or saturation attacks. In the real world, however, launching such attacks is not easy, owing to several challenges attackers may encounter in an actual SDN deployment. These challenges, which are not addressed adequately in the related literature, are investigated in this paper, and practical solutions to mitigate such attacks are proposed. The discussed challenges are clarified by means of extensive experiments on an actual cloud data center testbed. Moreover, the mitigation schemes have been implemented and examined in detail. Experimental results show that the proposed countermeasures effectively block time inference attacks.
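A common way such timing measurements are taken (not necessarily the paper's exact procedure) is to compare the first-packet delay, which in a reactive SDN may include a controller round trip for flow-rule installation, against subsequent-packet delays. A rough probe sketch with a placeholder target:

```python
import socket, time

def connect_ms(host, port, timeout=2.0):
    """Return TCP connect latency to host:port in milliseconds."""
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - t0) * 1000

# In a reactive SDN, the first connection may include a controller round
# trip for flow-rule installation; the second should hit the installed rule.
# A consistently large gap between the two is the timing signal.
host, port = "192.0.2.10", 80   # placeholder target (TEST-NET-1 address)
first = connect_ms(host, port)
second = connect_ms(host, port)
print(f"first={first:.2f} ms  second={second:.2f} ms  gap={first - second:.2f} ms")
```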
In recent times it has become crucial for data center networks (DCNs) to broaden their system limits to meet the increasing needs of cloud-based applications. A good DCN topology must possess numerous properties, such as low diameter, high bisection bandwidth, and ease of organization. In addition, a DCN topology should be apt in failure resiliency, scalability, construction, and routing. In this paper, we introduce a new data center network topology termed LevelTree, built from several modules that grow as a tree topology, where each module is constructed from a complete graph. LevelTree demonstrates great topological properties, and it beats important topologies like Jellyfish, VolvoxDC, and Fattree by providing a superior design with greater capacity.
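The abstract describes LevelTree only at a high level; the sketch below is one interpretation under stated assumptions (module size, fan-out, and the single inter-module link are guesses), building a tree of complete-graph modules with networkx and reporting basic topological properties:

```python
import networkx as nx

def level_tree(levels, fanout, module_size):
    """Toy LevelTree-style construction: a tree of complete-graph modules.

    This is an interpretation of the abstract's description, not the
    paper's exact algorithm: each tree node becomes a K_m module, and a
    child module attaches to its parent module by a single link.
    """
    g = nx.complete_graph(module_size)      # root module: K_m on nodes 0..m-1
    frontier, next_id = [0], 1              # module ids
    for _ in range(levels - 1):
        new_frontier = []
        for parent in frontier:
            for _ in range(fanout):
                offset = next_id * module_size
                child = nx.relabel_nodes(nx.complete_graph(module_size),
                                         lambda v, o=offset: v + o)
                g.update(child)
                # Single inter-module link: parent's first node to child's.
                g.add_edge(parent * module_size, offset)
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return g

g = level_tree(levels=3, fanout=2, module_size=4)
print(g.number_of_nodes(), g.number_of_edges(), nx.diameter(g))
```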
As an information hinge for various trades and professions in the era of big data, the cloud data center bears the responsibility of providing uninterrupted service. To cope with the impact of failures and interruptions during operation on the Quality of Service (QoS), it is important to guarantee the resilience of the cloud data center. Thus, different resilience actions are conducted over its life cycle, collectively forming a resilience strategy. To measure the effect of a resilience strategy on system resilience, this paper proposes a new approach to model and evaluate resilience strategies for cloud data centers, focusing on the core part of service provision, the IT architecture. A comprehensive resilience metric based on resilience loss is put forward, considering the characteristics of cloud data centers. Furthermore, a mapping model between system resilience and resilience strategy is built. Then, based on a hierarchical colored generalized stochastic Petri net (HCGSPN) model depicting how the system processes service requests, simulation is conducted to evaluate the resilience strategy through the metric calculation. With a case study of a company's cloud data center, the applicability and correctness of the approach are demonstrated.
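The paper's resilience-loss metric is not spelled out in the abstract; a common generic formulation, used here purely as an assumption, integrates the shortfall between target and delivered performance over the disruption-and-recovery window:

```python
def resilience_loss(perf, target=1.0, dt=1.0):
    """Area between target and delivered performance over a disruption window.

    perf: sampled service performance (e.g., fraction of requests served),
    dt: sample interval. A generic resilience-loss formulation, used here
    as an assumption; the paper's exact metric may differ.
    """
    return sum(max(target - p, 0.0) * dt for p in perf)

# Hypothetical trace: failure at t=2, gradual recovery, restored at t=7.
trace = [1.0, 1.0, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]
print(resilience_loss(trace))  # ~1.7; lower loss means higher resilience
```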
Software-Defined Networking (SDN) is a novel architecture created to address the issues of traditional, vertically integrated networks. To increase cost-effectiveness and enable logical control, SDN provides high programmability and a centralized view of the network by separating network traffic delivery (the "data plane") from network configuration (the "control plane"). SDN controllers and related protocols are rapidly evolving to address the demands of scaling in complex enterprise networks. Because of this rapid evolution, production networks employing SDN are prone to several security vulnerabilities, and the rate at which SDN frameworks evolve continues to outpace attempts to address their security issues. According to our study, existing defense mechanisms, particularly SDN-based firewalls, face new and SDN-specific challenges in successfully enforcing security policies in the underlying network. In this paper, we identify problems associated with SDN-based firewalls, such as ambiguous flow-path calculations and poor scalability in large networks. We survey existing SDN-based firewall designs and their shortcomings in protecting a dynamically scaling network like a data center. We extend our study by evaluating one such SDN-specific security solution, FlowGuard, and identifying new attack vectors and vulnerabilities. We also present corresponding threat detection techniques and respective mitigation strategies.
Software Defined Networking (SDN) stands to transform our modern networks and data centers, opening them up into highly agile frameworks that can be reconfigured on demand. Denial of Service (DoS) attacks are considered among the most destructive attacks: a DoS attack can exhaust the available bandwidth, leaving the network unavailable for legitimate traffic. This paper is about DoS attack detection and mitigation using SDN. To address the problem, the concept of performance-aware Software Defined Networking is used, which involves real-time network monitoring with sFlow as a visibility protocol. OpenFlow, together with sFlow, is thus used as an application to fight DoS attacks. Our analysis and results demonstrate that DoS attacks are successfully defended against using this technique, implying that SDN has promising potential to detect and mitigate DoS attacks.
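The detect-and-block loop the abstract describes can be illustrated abstractly. The sketch below uses hypothetical thresholds and a callback standing in for the controller API (it does not use the real sFlow or OpenFlow interfaces): sources whose sampled packet rate exceeds a cutoff get a drop rule:

```python
from collections import Counter

PPS_THRESHOLD = 10_000   # hypothetical packets/s cutoff for an attack source

def detect_and_block(samples, window_s, install_drop_rule):
    """Threshold detection over sFlow-style packet samples.

    samples: iterable of source-IP strings observed in the window.
    install_drop_rule: callback standing in for an OpenFlow rule install.
    """
    rates = Counter(samples)
    for src, count in rates.items():
        if count / window_s > PPS_THRESHOLD:
            # A real deployment would push an OpenFlow flow-mod with
            # match=src, action=drop to the switch via the controller.
            install_drop_rule(src)

detect_and_block(["10.0.0.9"] * 120_000 + ["10.0.0.2"] * 50,
                 window_s=10,
                 install_drop_rule=lambda ip: print("drop", ip))
```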
This paper addresses the need for standard communication protocols for IoT devices with limited power and computational capabilities. The world is rapidly changing with the proliferation and deployment of IoT devices, which brings new communication challenges as these devices are connected to the Internet and need to communicate with each other in real time. The paper provides an overview of IoT system architecture and the forthcoming challenges it will bring. There is an urgent need to establish standards for communication in the IoT world. With the recent development of new protocols like CoAP, 6LoWPAN, IEEE 802.15.4, and Thread at different layers of the OSI model, additional challenges also present themselves. Performance and data management are becoming more critical than ever before due to the complexity of connecting a rapidly growing number of IoT devices. Systems dealing with IoT devices will require appropriate capacity planning and the associated development of data centers. Finally, the paper also presents some reasonable approaches to address these issues in the IoT world.
As the amount of spatial data grows, organizations have realized that it is cheaper and more flexible to keep their data in the Cloud rather than to establish and maintain huge in-house data centers. Though this saves a lot in IT costs, organizations are still concerned about the privacy and security of their data. Encrypting the whole database before uploading it to the Cloud solves the security issue, but querying the database then requires downloading and decrypting the data set, which is impractical. In this paper, we propose a new scheme for protecting the privacy and integrity of spatial data stored in the Cloud while still being able to execute range queries efficiently. The proposed technique suggests a new index structure, based on the Z-curve, to support answering range queries over encrypted data sets. The paper describes a distributed algorithm for answering range queries over spatial data stored in the Cloud. We carried out many simulation experiments to measure the performance of the proposed scheme. The experimental results show that the proposed scheme outperforms the most recent schemes by Kim et al. in terms of data redundancy.
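The Z-curve (Morton order) underlying the proposed index maps 2-D coordinates to a 1-D key by interleaving coordinate bits, so a 2-D range query can be answered as a small set of 1-D key-range scans. A minimal encoder:

```python
def z_encode(x, y, bits=16):
    """Morton (Z-order) code: interleave the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits at even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits at odd positions
    return z

# Nearby points get nearby keys, so a spatial range maps to few key ranges.
print(z_encode(3, 5))  # 39
print(z_encode(3, 6))  # 45
```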
Cloud computing, often referred to simply as “the cloud,” is the delivery of on-demand computing resources, everything from applications to data centers, over the Internet. The cloud is used not only for storing data; the stored data can also be shared by multiple users. Because of this, the integrity of cloud data is subject to doubt. It is not always possible for a user to download all the data to verify its integrity, so the proposed system contains a Third Party Auditor (TPA) to verify the integrity of shared data. During auditing, the shared data is kept private from public verifiers, who are able to verify shared-data integrity without downloading or retrieving the entire data file. A group signature is used to preserve the identity privacy of group members from the third-party auditor. Privacy preservation ensures that the TPA cannot derive a user's data content from the information collected during the auditing process.
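The paper's construction relies on group signatures and privacy-preserving verification; the toy sketch below deliberately swaps that machinery for plain keyed hashes (so, unlike the real scheme, the auditor here would need the owner's key) just to show the spot-check idea of verifying sampled blocks without retrieving the whole file:

```python
import hashlib, random

def tag(block: bytes, key: bytes) -> bytes:
    """Keyed per-block tag (a stand-in for a homomorphic authenticator)."""
    return hashlib.sha256(key + block).digest()

def audit(storage, tags, key, sample=3):
    """TPA-style spot check: verify a random sample of blocks, not the file."""
    challenged = random.sample(range(len(storage)), sample)
    return all(tag(storage[i], key) == tags[i] for i in challenged)

key = b"owner-secret"
blocks = [f"block-{i}".encode() for i in range(10)]
tags = [tag(b, key) for b in blocks]
print(audit(blocks, tags, key))             # True
blocks[4] = b"tampered"
print(audit(blocks, tags, key, sample=10))  # False once block 4 is sampled
```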
Aadhaar, India's Unique Identity Project, has become the largest biometric identity system in the world, already covering more than 920 million people. Building such a massive system required significant design thinking, alignment with the core strategy, and a technology platform that can scale to meet the project's objective. The entire technology architecture behind Aadhaar is based on principles of openness, linear scalability, strong security, and, most importantly, vendor neutrality. All application components are built using open-source components and open standards. The Aadhaar system currently runs across two data centers within India managed by UIDAI, handles 1 million enrollments a day, and at peak performs about 900 trillion biometric matches a day. The current system holds about 8 PB (8,000 terabytes) of raw data. The Aadhaar Authentication service, which requires sub-second response time, is already live and can handle more than 100 million authentications a day. In this talk, the speaker, who has been the Chief Architect of Aadhaar since its inception, shares his experience of building the system.
Services in modern data center networks pose growing performance demands. However, widely existing special traffic patterns, such as micro-bursts, highly concurrent flows, and on-off patterns of flow transmission, degrade the performance of transport protocols. In this work, a clean-slate explicit transport control mechanism, called Token Flow Control (TFC), is proposed for data center networks to achieve high link utilization, ultra-low latency, fast convergence, and rare packet drops. TFC uses tokens to represent the link bandwidth resource and defines the concept of effective flows to stand for consumers. The total tokens are explicitly allocated to the consumers every time slot. TFC excludes in-network buffer space from the flow pipeline and thus achieves zero queueing. Besides, a packet delay function is added at switches to prevent packet drops under highly concurrent flows. The performance of TFC is evaluated using both experiments on a small real testbed and large-scale simulations. The results show that TFC achieves high throughput, fast convergence, near-zero queuing, and rare packet loss in various scenarios.
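TFC's token abstraction can be illustrated with a simple per-slot allocator (all parameters hypothetical): one slot's worth of link capacity is converted into packet tokens and divided among the effective flows, so no more traffic is admitted than the link can drain, which is the intuition behind zero-queueing:

```python
def allocate_tokens(link_capacity_bps, slot_s, effective_flows, pkt_bytes=1500):
    """Split one slot's worth of link bandwidth into per-flow packet tokens.

    A token represents the right to transmit one packet; capping the total
    at the link's per-slot capacity leaves nothing to queue in-network.
    """
    total_tokens = int(link_capacity_bps * slot_s / (8 * pkt_bytes))
    share, rest = divmod(total_tokens, max(effective_flows, 1))
    # Every flow gets an equal share; leftovers go to the first `rest` flows.
    return [share + (1 if i < rest else 0) for i in range(effective_flows)]

# 10 Gbps link, 100 microsecond slot, 7 effective flows.
print(allocate_tokens(10_000_000_000, 0.0001, effective_flows=7))
```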
In modern parallel storage systems (e.g., cloud storage and data centers), it is important to provide data availability guarantees against disk (or storage node) failures via redundancy coding schemes. One such scheme is X-code, which is double-fault tolerant while achieving optimal update complexity. When a disk/node fails, recovery must be carried out to reduce the possibility of data unavailability. We propose an X-code-based optimal recovery scheme called minimum-disk-read-recovery (MDRR), which minimizes the number of disk reads for single-disk failure recovery. We make several contributions. First, we show that MDRR provides optimal single-disk failure recovery and reduces disk reads by about 25 percent compared to the conventional recovery approach. Second, we prove that any optimal recovery scheme for X-code cannot balance disk reads among different disks within a single stripe in general cases. Third, we propose an efficient logical encoding scheme that issues balanced disk reads across a group of stripes for any recovery algorithm (including MDRR). Finally, we implement our proposed recovery schemes and conduct extensive testbed experiments on a networked storage system prototype. Experiments indicate that MDRR reduces recovery time by around 20 percent compared to the conventional approach, showing that our theoretical findings are applicable in practice.
Cloud technologies are increasingly important to IT departments, allowing them to concentrate on strategy rather than maintaining data centers; one of the biggest advantages of the cloud is the ability to share computing resources among multiple providers, especially in hybrid clouds, to overcome infrastructure limitations. User identity federation is considered the second major risk in the cloud, and since business organizations use multiple cloud service providers, IT departments face a range of constraints. Multiple attempts to solve this problem have been suggested, such as federated identity, which has a number of advantages despite suffering from challenges common to new technologies. This paper tackles federated identity, its components, advantages, and disadvantages, and then proposes a number of useful scenarios for managing identity in hybrid cloud infrastructures.