Intelligent Resource Scheduling at Scale: A Machine Learning Perspective
Title | Intelligent Resource Scheduling at Scale: A Machine Learning Perspective |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Yang, R., Ouyang, X., Chen, Y., Townend, P., Xu, J. |
Conference Name | 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE) |
Date Published | March 2018 |
Publisher | IEEE |
ISBN Number | 978-1-5386-5207-7 |
Keywords | ad-hoc heuristics, cloud computing, cloud-scale, Collaboration, composability, data centers, exhibited heterogeneity, Human Behavior, human factors, intelligent resource scheduling, Internet-scale Computing Security, Internet-scale systems, large-scale resource scheduling, Large-scale systems, learning (artificial intelligence), machine learning, Metrics, ML, multidimensional resource requirements, nonfunctional constraints, performance-centric node classification, Policy Based Governance, Processor scheduling, pubcrawl, quality of service, resilience, Resiliency, resource allocation, Resource management, Resource Scheduling, Scalability, scheduling, server characteristics, Servers, straggler, straggler mitigation, Task Analysis, workload |
Abstract | Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an ML-based method will help to achieve architectural optimization and efficiency improvement. |
URL | https://ieeexplore.ieee.org/document/8359158 |
DOI | 10.1109/SOSE.2018.00025 |
Citation Key | yang_intelligent_2018 |