Implementation of Watch Dog Timer for Fault Tolerant Computing on Cluster Server

Authors: Meenakshi Bheevgade, Rajendra M. Patrikar

Abstract:

In today's technology era, clusters have become a necessity for modern computing and data applications, since many applications take a long time (even days or months) to compute. Although parallelization speeds up computation, many applications still need a long running time. The reliability of the cluster therefore becomes a very important issue, and a fault-tolerant mechanism becomes essential. The difficulty of designing a fault-tolerant cluster system grows with the variety and severity of the failures it must handle. The most important requirement is that an algorithm which handles a simple failure in a system must also tolerate more severe failures. In this paper, we implement the watchdog timer concept in a parallel environment to handle failures. Implementing a simple algorithm in our project lets us handle different types of failures, and as a result we found that the reliability of the cluster improves.
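The abstract does not describe the algorithm itself. As an illustration only, a software watchdog for a cluster can be sketched as one timer per node that each node must periodically reset ("kick"); a node whose timer runs out is reported as suspected failed so that a recovery action can run. The class name `ClusterWatchdog`, its `register`/`kick`/`check` methods, the timeout value, and the injectable clock are all hypothetical assumptions and are not taken from the paper.

```python
import time
from typing import Callable, Dict, List


class ClusterWatchdog:
    """Hypothetical watchdog tracking one software timer per cluster node.

    Each node is expected to kick (reset) its timer periodically. A node that
    does not kick within `timeout` seconds is reported as suspected failed.
    """

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_kick: Dict[str, float] = {}

    def register(self, node: str) -> None:
        # Start the node's timer as if it had just kicked.
        self.last_kick[node] = self.clock()

    def kick(self, node: str) -> None:
        # A healthy node calls this before its timer expires.
        if node not in self.last_kick:
            raise KeyError(f"unknown node: {node}")
        self.last_kick[node] = self.clock()

    def expired(self) -> List[str]:
        # Nodes whose timers have run out, in sorted order.
        now = self.clock()
        return sorted(n for n, t in self.last_kick.items()
                      if now - t > self.timeout)

    def check(self, on_failure: Callable[[str], None]) -> None:
        # Call the recovery hook for each expired node, then stop tracking it
        # so the same failure is not reported twice.
        for node in self.expired():
            on_failure(node)
            del self.last_kick[node]


if __name__ == "__main__":
    fake_now = [0.0]
    wd = ClusterWatchdog(timeout=5.0, clock=lambda: fake_now[0])
    for n in ("node1", "node2", "node3"):
        wd.register(n)
    fake_now[0] = 4.0
    wd.kick("node1")
    wd.kick("node2")
    fake_now[0] = 7.0  # node3 last kicked at t=0, so its timer has expired
    wd.check(lambda n: print(f"{n} timed out; reassigning its work"))
```

With the simulated clock above, only `node3` is reported, because it has not kicked for 7 seconds against a 5-second timeout. Using `time.monotonic` as the default clock avoids false timeouts when the wall-clock time is adjusted; what `on_failure` actually does (restarting a process, migrating work, or degrading gracefully) is left open here.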

Keywords: Cluster, Fault tolerant, Grid, Grid Computing System, Meta-computing.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1334061
