Using Multi-Arm Bandits to Optimize Game Play Metrics and Effective Game Design

Authors: Kenny Raharjo, Ramon Lawrence


Game designers face the challenging task of building games that engage players to spend their time and money. The space of game variations and design choices is effectively infinite, which makes it hard to systematically determine which choices will produce positive player experiences. In this work, we demonstrate how multi-arm bandits can be used to automatically explore game design variations to achieve improved player metrics. The advantage of multi-arm bandits is that they allow for continuous experimentation and variation, intrinsically converge to the best solution, and require no special infrastructure beyond the ability to deploy minor game variations to users for evaluation. A user study confirms that multi-arm bandits successfully identified the preferred game variation, the one with the highest play time metrics, and that they can be a useful technique in a game designer's toolkit.
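
To make the idea concrete, the following is a minimal sketch of how a bandit might allocate play sessions across game variations. It is not the authors' implementation: the abstract does not state which bandit algorithm was used, so epsilon-greedy is assumed here purely for illustration. The class name `EpsilonGreedyBandit`, the `simulate` helper, the noise model, and the mean play times are all hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy bandit; each arm is one game variation."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon          # probability of exploring a random arm
        self.counts = [0] * n_arms      # sessions served per variation
        self.means = [0.0] * n_arms     # running mean reward per variation
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_mean_play_time, sessions=5000, seed=0):
    """Serve simulated sessions; the reward is a noisy play time (hypothetical model)."""
    bandit = EpsilonGreedyBandit(len(true_mean_play_time), epsilon=0.1, seed=seed)
    noise = random.Random(seed + 1)
    for _ in range(sessions):
        arm = bandit.select_arm()
        # Assumed reward model: Gaussian noise around the variation's mean, floored at 0.
        play_time = max(0.0, noise.gauss(true_mean_play_time[arm], 1.0))
        bandit.update(arm, play_time)
    return bandit


if __name__ == "__main__":
    # Three hypothetical variations with made-up mean play times (minutes).
    result = simulate([3.0, 5.0, 4.0])
    print("sessions per variation:", result.counts)
    print("estimated mean play time:", [round(m, 2) for m in result.means])
```

In this sketch most sessions end up on the variation with the highest simulated play time, while a small fraction keeps sampling the alternatives. That mirrors the continuous experimentation and convergence described above, under the stated assumptions.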

Keywords: Game design, multi-arm bandit, design exploration and data mining, player metric optimization and analytics.


