A State Aggregation Approach to Singularly Perturbed Markov Reward Processes
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32771
A State Aggregation Approach to Singularly Perturbed Markov Reward Processes

Authors: Dali Zhang, Baoqun Yin, Hongsheng Xi

Abstract:

In this paper, we propose a single sample path based algorithm with state aggregation to optimize the average rewards of singularly perturbed Markov reward processes (SPMRPs) with a large scale state spaces. It is assumed that such a reward process depend on a set of parameters. Differing from the other kinds of Markov chain, SPMRPs have their own hierarchical structure. Based on this special structure, our algorithm can alleviate the load in the optimization for performance. Moreover, our method can be applied on line because of its evolution with the sample path simulated. Compared with the original algorithm applied on these problems of general MRPs, a new gradient formula for average reward performance metric in SPMRPs is brought in, which will be proved in Appendix, and then based on these gradients, the schedule of the iteration algorithm is presented, which is based on a single sample path, and eventually a special case in which parameters only dominate the disturbance matrices will be analyzed, and a precise comparison with be displayed between our algorithm with the old ones which is aim to solve these problems in general Markov reward processes. When applied in SPMRPs, our method will approach a fast pace in these cases. Furthermore, to illustrate the practical value of SPMRPs, a simple example in multiple programming in computer systems will be listed and simulated. Corresponding to some practical model, physical meanings of SPMRPs in networks of queues will be clarified.

Keywords: Singularly perturbed Markov processes, Gradient of average reward, Differential reward, State aggregation, Perturbed close network.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1327714

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577

References:


[1] P. Marbach and J.N. Tsitsiklis "Simulation-based optimization of Markov reward processes, " IEEE Transactions on Automatic Control, Vol. 46, No.2, February, 2001, pp.191-209.
[2] G. Yin and Q. Zhang "Singularly perturbed discrete-Time Markov chains," SIAM Journal Appl. Math. Vol.61, No.3, 2 000, pp 834-854.
[3] M. Abbad and J.A. Filar "Perturbation and Stability Theory for Markov Control Problems," IEEE Transactions on Automatic Control, Vol.37, No.9, September, 1992, pp 1415-1420.
[4] M. Abbad and J. A. Filar "Algorithms for singularly perturbed limiting average Markov Control Problem," IEEE Transactions on Automatic Control, Vol.37, No.9, September, 1992, pp 1421-1425.
[5] G. Yin Q. Zhang and G. Badowski "Discrete-time singularly perturbed Markov chain: Aggregation, Occupation measures, and switching diffusion limit," Adv. Appl. Prob.Vol.35, 2003, pp 449-476.
[6] F. Delebecque "A reduction process for perturbed Markov chains," SIAM Journal Appl. Math. Vol. 43, No.2, April 1983.
[7] R.H. Liu, Q. Zhang and G. Yin "Singularly perturbed Markov decision processes in discrete time," Decision and Control, 2001. Proceedings of the 40th IEEE Conference on. Vol. 3, 4-7 December. 2001, pp 2119 - 2124.
[8] M. Abbad, J.A. Filar, and T.R. Bielecki, "Singularly perturbed Markov control problem: limiting average cost," Decision and Control, 1989. Proceedings of the 28th IEEE Conference on. Vol. 2, 4-7 December. 1989- pp. 1263 - 1266.
[9] Xi-ren Cao, "The potential structure of sample paths and performance sensitivities of Markov systems The potential structure of sample paths and performance sensitivities of Markov systems," IEEE Transactions on Automatic Control, Vol. 49, Issue,12 , December. 2004 pp. 2129 - 2142.
[10] H. Fang and Xi-ren Cao "Potential-based online policy iteration algorithms for Markov decision processes," IEEE Transactions on Automatic Control, Vol. 49 , Issue, 4 , April. 2004, pp. 493 - 505.
[11] H. Tang, H. Xi and B. Yin. "Performance optimization of continuous time Markov control processes based on performance potentials,". Int. Journal of Systems Science, 2003, Vol. 34(1), pp, 63-71.
[12] D. Zhang, H. Xi and B. Yin "Simulation-based optimization for singularly perturbed Markov reward process with aggregated states," Int. Conference of Intelligence Computing 2005. Accepted.
[13] J. Ledoux "On weak lump ability of denumerable Markov chains," Statistics & Probability Letters, Vol. 25, 1995, pp, 329-339.
[14] J.R. Jackson. "Job shop-like queuing systems," Man. Sci. Vol. 9 October, 1963, pp, 131-142.
[15] P.J. Courtois. "On the near-complete-decomposability of networks of queues and of stochastic models of multiprogramming computing system". Scientific Rep. CMU-CS-72-11, Carnegie-Mellon U, November, 1971