SWARM: A Meta-Scheduler to Minimize Job Queuing Times on Computational Grids
Authors: Jean-Alain Grunchec, Jules Hernández-Sánchez, Sara Knott
Abstract:
Some meta-schedulers query the information system of individual supercomputers in order to submit jobs to the least busy supercomputer on a computational Grid. However, this information can become outdated by the time a job starts due to changes in scheduling priorities. The MSR scheme is based on Multiple Simultaneous Requests and can take advantage of opportunities resulting from these priority changes. This paper presents the SWARM meta-scheduler, which can speed up the execution of large sets of tasks by minimizing the job queuing time through the submission of multiple requests. Performance tests have shown that this new meta-scheduler is faster than an implementation of the MSR scheme and the gLite meta-scheduler. SWARM has been used through the GridQTL project beta-testing portal during the past year. Usage statistics are provided and demonstrate its capacity to reliably achieve a substantial reduction in execution time under production conditions.
Keywords: Grid computing, multiple simultaneous requests, fault tolerance, GridQTL.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1327764
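The idea of multiple simultaneous requests mentioned in the abstract can be illustrated with a minimal sketch. This is not the SWARM implementation or the MSR scheme as published; it is a toy model under stated assumptions. The `Site` class, the site names, and all wait times are hypothetical, and the model assumes that the first request to start is kept while the remaining requests are cancelled.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical grid site (names and numbers are made up for illustration)."""
    name: str
    advertised_wait: float  # queue wait reported by the information system at submission
    actual_wait: float      # queue wait the job really experiences


def single_submission(sites):
    """Submit once, to the site that looks least busy at submission time."""
    chosen = min(sites, key=lambda s: s.advertised_wait)
    return chosen.name, chosen.actual_wait


def multiple_simultaneous_requests(sites):
    """Submit to every site; keep the request that starts first, cancel the rest."""
    winner = min(sites, key=lambda s: s.actual_wait)
    cancelled = [s.name for s in sites if s is not winner]
    return winner.name, winner.actual_wait, cancelled


# Assumed scenario: site_a looks idle, but a priority change delays the job there.
sites = [
    Site("site_a", advertised_wait=5.0, actual_wait=40.0),
    Site("site_b", advertised_wait=20.0, actual_wait=8.0),
    Site("site_c", advertised_wait=30.0, actual_wait=25.0),
]

print(single_submission(sites))
print(multiple_simultaneous_requests(sites))
```

In this assumed scenario the single submission follows the outdated information to `site_a`, whereas the multiple-request approach starts on whichever site frees up first. A real meta-scheduler also has to handle the cost of the redundant requests it cancels, which this sketch ignores.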