Generic Workload Management System Using Condor-Based Pilot Factory in PanDA Framework

Authors: Po-Hsiang Chiu, Torre Wenaus

Abstract:

In the current Grid environment, efficient workload management remains a significant challenge, complicated by an excessive number of de facto standards covering resource discovery, brokerage, and data transfer, among others. In addition, real-time resource status, which is essential for an optimal resource allocation strategy, is often not readily accessible. To address these issues and provide a cleaner abstraction of the Grid, with the potential to generalize to arbitrary resource-sharing environments, this paper proposes PanDA-PF WMS, a new Condor-based pilot mechanism applied in the PanDA architecture, with the goal of providing a more generic yet efficient resource allocation strategy. In this architecture, the PanDA server primarily acts as a repository of user jobs and responds to pilot requests from distributed, remote resources. Scheduling decisions are then made according to the real-time resource information reported by the pilots. Pilot Factory is a Condor-inspired solution for scalable pilot dissemination; it effectively functions as a resource provisioning mechanism through which the user-job server, PanDA, reaches out to candidate resources only on demand.
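
The pull-based pattern described above, in which a job server answers pilot requests and schedules against the status each pilot reports, can be illustrated with a minimal sketch. This is not the PanDA or Condor API: the names `Job`, `PilotReport`, `JobServer`, and `request_job`, as well as the matching rule (the highest-priority job that fits the reported free cores and memory), are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    job_id: int
    priority: int        # higher value is dispatched first (assumed policy)
    min_cores: int
    min_memory_mb: int


@dataclass
class PilotReport:
    """Resource status a pilot sends when it asks for work (hypothetical fields)."""
    site: str
    free_cores: int
    free_memory_mb: int


@dataclass
class JobServer:
    """Toy stand-in for a job repository; it never pushes work to resources."""
    queue: list = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def request_job(self, report: PilotReport) -> Optional[Job]:
        # Consider only jobs that fit the resources the pilot just reported.
        fitting = [
            j for j in self.queue
            if j.min_cores <= report.free_cores
            and j.min_memory_mb <= report.free_memory_mb
        ]
        if not fitting:
            return None  # nothing suitable; the pilot can simply exit
        best = max(fitting, key=lambda j: j.priority)
        self.queue.remove(best)
        return best


server = JobServer()
server.submit(Job(1, priority=5, min_cores=8, min_memory_mb=16000))
server.submit(Job(2, priority=3, min_cores=1, min_memory_mb=2000))

# Three pilots on different (hypothetical) sites ask for work in turn.
small = server.request_job(PilotReport("site-a", free_cores=2, free_memory_mb=4000))
large = server.request_job(PilotReport("site-b", free_cores=16, free_memory_mb=32000))
idle = server.request_job(PilotReport("site-c", free_cores=4, free_memory_mb=8000))
print(small.job_id, large.job_id, idle)
```

In this sketch the small pilot receives the low-priority job because the high-priority one does not fit its reported resources, which is the kind of decision that depends on status known only at request time.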

Keywords: Condor, glidein, PanDA, Pilot, Pilot Factory.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1071934

