A Modular On-line Profit Sharing Approach in Multiagent Domains
Authors: Pucheng Zhou, Bingrong Hong
Abstract:
Coordinating the behaviors of agents through learning is a challenging problem in multi-agent domains. Because of its complexity, recent work has focused on how coordinated strategies can be learned. Here we are interested in using reinforcement learning techniques to learn the coordinated actions of a group of agents without requiring explicit communication among them. However, traditional reinforcement learning methods assume that the environment can be modeled as a Markov Decision Process, an assumption that usually cannot be satisfied when multiple agents coexist in the same environment. Moreover, to coordinate each agent's behavior effectively so as to achieve the goal, it is necessary to augment the state of each agent with information about the other agents. Yet as the number of agents in a multiagent environment increases, the state space of each agent grows exponentially, which causes a combinatorial explosion. Profit sharing is a reinforcement learning method that allows agents to learn effective behaviors from their experiences even in non-Markovian environments. In this paper, to remedy the drawback of the original profit sharing approach, which needs much memory to store each state-action pair during the learning process, we first present an on-line rational profit sharing algorithm. We then integrate the advantages of a modular learning architecture with the on-line rational profit sharing algorithm and propose a new modular reinforcement learning model. The effectiveness of the technique is demonstrated using the pursuit problem.
Keywords: Multi-agent learning; reinforcement learning; rational profit sharing; modular architecture.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333911
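The credit assignment behind profit sharing, as described in the abstract, can be illustrated with a minimal sketch. It shows the episodic form, in which every state-action pair of an episode is stored and the reward is shared backward along the episode once the goal is reached. This is not the on-line or modular algorithm proposed in the paper. The function name `profit_sharing_update`, the geometrically decreasing credit function, the decay value of 0.25, and the toy episode are all assumptions made for illustration.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay):
    """Share `reward` backward over the (state, action) pairs of one episode.

    The pair taken immediately before the reward receives the full reward;
    each earlier pair receives `decay` times the credit of the pair after it.
    A geometrically decreasing credit function is assumed here.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay
    return weights


# Hypothetical three-step episode that ends with a reward of 1.0.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
profit_sharing_update(weights, episode, reward=1.0, decay=0.25)

for pair in episode:
    print(pair, weights[pair])
```

The sketch makes the memory cost visible: the whole `episode` list has to be kept until the reward arrives, which is the drawback the abstract says the on-line variant is meant to remedy.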