On Mobile Checkpointing using Index and Time Together
Authors: Awadhesh Kumar Singh
Abstract:
Checkpointing is one of the commonly used techniques to provide fault-tolerance in distributed systems so that the system can operate even if one or more components have failed. However, mobile computing systems are constrained by low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life. Hence, checkpointing protocols having lesser number of synchronization messages and fewer checkpoints are preferred in mobile environment. There are two different approaches, although not orthogonal, to checkpoint mobile computing systems namely, time-based and index-based. Our protocol is a fusion of these two approaches, though not first of its kind. In the present exposition, an index-based checkpointing protocol has been developed, which uses time to indirectly coordinate the creation of consistent global checkpoints for mobile computing systems. The proposed algorithm is non-blocking, adaptive, and does not use any control message. Compared to other contemporary checkpointing algorithms, it is computationally more efficient because it takes lesser number of checkpoints and does not need to compute dependency relationships. A brief account of important and relevant works in both the fields, time-based and index-based, has also been included in the presentation.
Keywords: Checkpointing, forced checkpoint, mobile computing, recovery, time-coordinated.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080456
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1491References:
[1] K. M. Chandy and L. Lamport, "Distributed snapshots: determining global states of distributed systems," ACM Transactions on Computer Systems, 1985, February, vol. 3, no. 1, pp. 63-75.
[2] B. Gupta, S. Rahimi, R. Rias, and G. Bangalore, "A low-overhead nonblock checkpointing algorithm for mobile computing environment," in Lecture Notes in Computer Science 3947, GPC 2006, Springer-Verlag, 2006, pp. 597-608.
[3] N. Neves and W. Fuchs, "Adaptive recovery for mobile environments," in Proc. IEEE High-Assurance Systems Engineering Workshop, October 21-22, 1996, pp.134-141. Also in Communications of the ACM, 1997, January, vol. 40, no. 1, pp. 68-74.
[4] B. Randell, "System structure for software fault-tolerance," IEEE Transactions on Software Engineering, 1975, June, vol. 1, no. 2, pp. 220-232.
[5] J. Helary, A. Mostefaoui, R. Netzer, and M. Raynal, "Communicationbased prevention of useless checkpoints in distributed computations," Distributed Computing, 2000, January, vol. 13, no. 1, pp. 29-43.
[6] J. Tsai, "An efficient index-based checkpointing protocol with constantsize control information on messages," IEEE Transactions on Dependable and Secure Computing, 2005, Oct-Dec, vol. 2, no. 4, pp. 278-296.
[7] R. Baldoni, F. Quaglia, and P. Fornara, "An index-based checkpointing algorithm for autonomous distributed systems," in Proc. 16th IEEE Symposium on Reliable Distributed Systems, October 22-24, 1997, pp.27-34. Also in IEEE Transactions on Parallel and Distributed Systems, 1999, February, vol. 10, no. 2, pp. 181-192.
[8] L. Lamport, "Time, clocks and the ordering of events in a distributed system," Communications of the ACM, 1978, July, vol. 21, no. 7, pp. 558-565.
[9] L. Alvisi, E. Elnozahy, S. Rao, S. A. Husain, and A. D. Mel, "An analysis of communication-induced checkpointing," in Proc. IEEE Fault-Tolerant Computing Symposium, June 15-18, 1999, pp. 242-249.
[10] N. Neves, "Time-based coordinated checkpointing," Ph.D. dissertation, UIUCDCS-R-98-2054, University of Illinois at Urbana-Champaign, 1998.
[11] N. Vaidya, "Another two-level failure recovery scheme: performance impact of checkpoint placement and checkpoint latency," Tech. Rep. 94- 068 (revised), Texas A&M University, January, 1995.
[12] N. Vaidya, "On checkpoint latency," in Proc. Pacific Rim International Symposium on Fault-Tolerant Systems, December, 1995, pp. 60-65.
[13] M. Chaoguang, Z. Yunlong, and Y. Wenbin, "A two-phase time-based consistent checkpointing strategy," in Proc. ITNG-06 3rd IEEE International Conference on Information Technology: New Generations, April 10-12, 2006, pp. 518-523.
[14] Z. Tong, R. Kain, and W. Tsai, "A low overhead checkpointing and rollback recovery scheme for distributed systems," in Proc. 8th IEEE Symposium on Reliable distributed systems, October 10-12, 1989, pp. 12-20.
[15] F. Cristian and F. Jahanian, "A timestamp-based checkpointing protocol for long-lived distributed computation," in Proc. 10th IEEE Symposium on Reliable Distributed Systems, 30 Sep-02 Oct, 1991, pp. 12-20.
[16] N. Neves and W. Fuchs, "Using time to improve the performance of coordinated checkpointing," in Proc. 2nd IEEE International Computer Performance and Dependability Symposium, September 4-6, 1996, pp. 282-291.
[17] S. George, I. Chen, and Y. Jin, "Movement-based checkpointing and logging for recovery in mobile computing systems," in Proc. MobiDE-06 5th ACM International Workshop on Data Engineering for Wireless and Mobile Access, June, 2006, pp. 51-58.
[18] J. Ahn and C. Hwang, "Low-cost fault-tolerance for mobile nodes in mobile IP based systems," in Proc. IEEE International Conference on Distributed Computing Systems Workshop, April 16-19, 2001, pp. 508- 513.
[19] J. Ahn, S. Min, and C. Hwang, "A causal message logging protocol for mobile nodes in mobile computing systems," Future Generation Computer Systems, 2004, May, vol. 20, no. 4, pp. 663-686.
[20] S. Neogy, A. Sinha, and P. Das, "Checkpoint processing in distributed systems software using synchronized clocks," in Proc. ITCC 2001 IEEE International Conference on Information Technology: Coding and Computing, April 2-4, 2001, pp. 555-559.
[21] S. Neogy, A. Sinha, and P. Das, "Distributed checkpointing using synchronized clocks," in Proc. COMPSAC-02 26th IEEE Annual International Computer Software and Applications Conference, 2002, pp. 199-206.
[22] C. Lin, S. Wang, S. Kuo, and I. Chen, "A low overhead checkpointing protocol for mobile computing systems," in Proc. PRDC-02 Pacific Rim International Symposium on Dependable Computing, December 16-18, 2002, pp. 37-44.
[23] C. Lin, S. Wang, and S. Kuo, "An efficient time-based checkpointing protocol for mobile computing systems over wide area networks," in Lecture Notes in Computer Science 2400, Euro-Par 2002, Springer- Verlag, 2002, pp. 978-982. Also in Mobile Networks and Applications, 2003, vo. 8, no. 6, pp. 687-697.
[24] Z. Tong, R. Kain, and W. Tsai, "Rollback recovery in distributed systems using loosely synchronized clocks," IEEE Transactions on Parallel and Distributed Systems, 1992, March, vol. 3, no. 2, pp, 246- 251.
[25] N. Neves and W. Fuchs, "Coordinated checkpointing without direct coordination," in Proc. IPDS-98 IEEE International Computer Performance and Dependability Symposium, September 7-9, 1998, pp. 23-31.
[26] D. Briatico, A. Ciuffoletti, and L. Simoncini, "A distributed dominoeffect free recovery algorithm," in Proc. 4th IEEE International Symposium on Reliability in Distributed Software and Database Systems, October, 1984, pp. 207-215.
[27] D. Manivannan and M. Singhal, "A low-overhead recovery technique using quasi synchronous checkpointing," in Proc. 16th IEEE International Conference on Distributed Computing Systems, May, 1996, pp. 100-107.
[28] Y. Luo, Y. Min, and D. Zhang, "An improved scheme of index-based checkpointing," in Proc. 11th IEEE Pacific Rim International Symposium on Dependable Computing, December 12-14, 2005, Page(s): 5 pp.
[29] G. Cao and M. Singhal, "Mutable checkpoints: a new checkpointing approach for mobile computing systems," IEEE Transactions on Parallel and Distributed Systems, 2001, February, vol. 12, no. 2, pp. 157-172.