A Fine-Grained Scheduling Algorithm for Heterogeneous Supercomputing Clusters Based on Graph Convolutional Networks and Proximal Policy Optimization

Authors: Jiahao Zhou, Lei Wang

Abstract:

In heterogeneous supercomputing clusters, an efficient scheduling strategy is pivotal for improving both energy efficiency and workflow execution performance, and dynamic allocation and reclamation of computing resources are critical to improving resource utilization. However, existing approaches often allocate a fixed set of resources to each job before execution and hold those resources until the job completes. This static scheduling paradigm fails to account for the dynamic nature of job execution, leading to suboptimal performance. To address these challenges, this paper introduces the Heterogeneous Hierarchical Fine-grained Scheduling algorithm (HeHiFiS), which combines Graph Convolutional Networks (GCN) and Proximal Policy Optimization (PPO). The algorithm aims to shorten workflow completion times and improve resource utilization in heterogeneous supercomputing environments. Specifically, GCNs extract task dependency features, which are integrated into the state representation, while the PPO reinforcement learning algorithm is used to train a scheduling policy. This policy adjusts scheduling decisions in real time based on the evolving states of tasks and resources. To evaluate HeHiFiS, a heterogeneous scheduling simulation platform was developed. Experimental results show that HeHiFiS, through its resource inheritance and intra-task parallelism mechanisms, significantly improves resource utilization. Compared with existing scheduling algorithms, HeHiFiS achieves an improvement of over 50% in both job completion time and response performance metrics, demonstrating its efficacy in dynamic and heterogeneous computing environments.

Keywords: Heterogeneous, Dynamic Scheduling, Graph Convolutional Networks, Proximal Policy Optimization.
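
The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gcn_layer` and `ppo_clip_objective`, the symmetric (undirected) treatment of task dependencies, and the clipping range `eps=0.2` are assumptions made here for illustration. The first function shows one standard graph-convolution step that mixes each task's features with those of its dependent tasks; the second shows the standard PPO clipped surrogate objective.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : n x n symmetric 0/1 adjacency of the task graph (assumption:
             dependency edges are treated as undirected)
    feats  : n x f per-task feature rows (H)
    weight : f x o weight matrix (W), learned in a real system
    """
    n = len(adj)
    # Add self-loops so every task keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    f = len(feats[0])
    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    o = len(weight[0])
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][j] for c in range(f)))
             for j in range(o)]
            for i in range(n)]


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)].

    ratios     : new-policy / old-policy action probability ratios
    advantages : advantage estimates for the same actions
    eps        : clipping range (0.2 is an assumed, commonly used value)
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


# Hypothetical three-task chain 0 - 1 - 2 with one feature per task.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
embeddings = gcn_layer(chain, [[1.0], [2.0], [3.0]], [[1.0]])
objective = ppo_clip_objective([1.5, 0.5], [1.0, -1.0])
```

In this sketch the embeddings would feed the state representation and the objective would be maximized by gradient ascent on the policy parameters. Both steps require an automatic-differentiation framework in practice and are omitted here.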

