Deep Reinforcement Learning for Optimal Decision-making in Supply Chains

Authors: Nitin Singh, Meng Ling, Talha Ahmed, Tianxia Zhao, Reinier van de Pol


We propose Reinforcement Learning (RL) as a viable alternative for optimizing supply chain management, particularly in scenarios with stochastic product demand. RL’s adaptability to changing conditions and its demonstrated success in diverse sequential decision-making domains make it a promising candidate for supply chain problems. We investigate the impact of demand fluctuations in a multi-product supply chain system and develop RL agents that learn generalizable policies. We provide experimental details for training the RL agents and a statistical analysis of the results. We study the generalization ability of RL agents under different demand uncertainty scenarios and observe superior performance compared to agents trained with fixed demand curves. The proposed methodology has the potential to reduce costs and increase profit for companies dealing with frequent inventory movement between supply and demand nodes.

Keywords: Inventory Management, Reinforcement Learning, Supply Chain Optimization, Uncertainty.
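
To make the problem setting concrete, the following is a minimal sketch of a multi-product inventory simulation with stochastic demand. It is not the authors' environment: the class `InventoryEnv`, the uniform demand noise, the cost parameters, and the base-stock heuristic are all assumptions chosen for illustration. Setting `noise=0.0` corresponds to a fixed demand curve, while `noise>0` perturbs demand at every step; a learned RL policy would take the place of the heuristic passed to `run_episode`.

```python
import random
from dataclasses import dataclass, field


@dataclass
class InventoryEnv:
    """Toy multi-product inventory simulator (hypothetical, for illustration only)."""
    base_demand: list            # mean demand per product per step (assumed values)
    noise: float = 0.2           # relative half-width of the uniform demand perturbation
    capacity: float = 100.0      # per-product storage limit
    price: float = 5.0           # revenue per unit sold
    order_cost: float = 2.0      # cost per unit ordered
    holding_cost: float = 0.1    # cost per unit left in stock after sales
    stockout_cost: float = 1.0   # penalty per unit of unmet demand
    horizon: int = 30            # number of steps per episode
    seed: int = 0
    stock: list = field(default_factory=list)
    t: int = 0

    def reset(self):
        # Re-seed so that every episode with the same seed sees the same demand.
        self.rng = random.Random(self.seed)
        self.stock = [0.0] * len(self.base_demand)
        self.t = 0
        return list(self.stock)

    def step(self, orders):
        reward = 0.0
        for i, mean in enumerate(self.base_demand):
            # Clip the order to the non-negative range that still fits in storage.
            order = max(0.0, min(orders[i], self.capacity - self.stock[i]))
            self.stock[i] += order
            # Stochastic demand: mean scaled by a uniform relative perturbation.
            demand = mean * (1.0 + self.rng.uniform(-self.noise, self.noise))
            sold = min(self.stock[i], demand)
            self.stock[i] -= sold
            reward += (self.price * sold
                       - self.order_cost * order
                       - self.holding_cost * self.stock[i]
                       - self.stockout_cost * (demand - sold))
        self.t += 1
        return list(self.stock), reward, self.t >= self.horizon


def base_stock_policy(target):
    """Heuristic baseline: order up to a fixed target level for every product."""
    return lambda stock: [max(0.0, target - s) for s in stock]


def run_episode(env, policy):
    """Roll out one episode and return the total profit."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


# Compare the same heuristic under a fixed and a fluctuating demand curve.
fixed = InventoryEnv(base_demand=[10.0, 20.0], noise=0.0)
noisy = InventoryEnv(base_demand=[10.0, 20.0], noise=0.3)
print(run_episode(fixed, base_stock_policy(25.0)))
print(run_episode(noisy, base_stock_policy(25.0)))
```

In this sketch the state is simply the per-product stock level and the action is the per-product order quantity. Evaluating one policy across several `noise` and `seed` settings is one simple way to probe how well it generalizes across demand scenarios.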


