Robot Movement Using Trust Region Policy Optimization

Authors: Romisaa Ali

Abstract:

The Policy Gradient approach is a subset of Deep Reinforcement Learning (DRL) that combines Deep Neural Networks (DNN) with Reinforcement Learning (RL). This approach finds an optimal policy for robot movement based on the experience the robot gains from interacting with its environment. Unlike earlier policy gradient algorithms, which could not handle the two types of error introduced by the DNN model, namely variance and bias caused by over- or underestimation, the algorithm studied here handles both. This article discusses the state-of-the-art (SOTA) policy gradient technique Trust Region Policy Optimization (TRPO), applies it in various environments, and compares it with another policy gradient method, Proximal Policy Optimization (PPO), to explain their robust optimization. The method is used to gather experience data during different training phases after observing the impact of hyper-parameters on neural network performance.

Keywords: Deep neural networks, deep reinforcement learning, Proximal Policy Optimization, state-of-the-art, trust region policy optimization.
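
The following is a minimal sketch, in PyTorch, contrasting the surrogate objectives that TRPO and PPO optimize; it is illustrative only and not taken from the paper's implementation, and the function names, tensor shapes, and the clipping parameter default are assumptions.

```python
# Illustrative sketch of the TRPO and PPO surrogate objectives (assumed PyTorch).
import torch

def probability_ratio(new_log_probs, old_log_probs):
    # r(theta) = pi_theta(a|s) / pi_theta_old(a|s), computed from log-probabilities.
    return torch.exp(new_log_probs - old_log_probs)

def trpo_surrogate(new_log_probs, old_log_probs, advantages):
    # TRPO maximizes E[r(theta) * A] subject to a trust-region constraint
    # E[KL(pi_old || pi_theta)] <= delta; the constraint is enforced separately
    # (e.g., conjugate gradient plus line search) and is not shown here.
    return (probability_ratio(new_log_probs, old_log_probs) * advantages).mean()

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # PPO replaces the hard KL constraint with a clipped ratio, which bounds
    # how far a single gradient update can move the policy.
    ratio = probability_ratio(new_log_probs, old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated so the objective can be minimized with a standard optimizer.
    return -torch.min(unclipped, clipped).mean()
```

Both objectives use experience collected under the old policy; the difference lies in how each limits the size of the policy update, which is the "robust optimization" behavior the abstract refers to.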

