Hierarchical Checkpoint Protocol in Data Grids
Abstract: Grids of computing nodes have emerged as a representative means of connecting distributed computers and resources scattered all over the world for computing and distributed storage. Because resource availability in a decentralized grid environment makes fault tolerance complex, fault tolerance can be used in conjunction with replication in data grids. The objective of our work is to present a fault tolerance protocol for data grids that builds on a clustering-based, data replication-driven model. The performance of the protocol is evaluated with the OMNeT++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of processes involved in rollbacks.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1129648