Robot Exploration and Navigation in Unseen Environments Using Deep Reinforcement Learning

Authors: Romisaa Ali

Abstract:

This paper compares the Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC) reinforcement learning algorithms for training robust navigation policies for Jackal robots. Leveraging an open-source framework and custom motion-control environments, the study evaluates the performance, robustness, and transferability of the trained policies across a range of scenarios. The experiments focus on the training process, the adaptability of each algorithm, and the robot's ability to navigate previously unseen environments. The paper also examines how varying environment complexity affects the learning process and the generalization capabilities of the resulting policies. The findings are intended to inform and guide the development of more efficient and practical reinforcement-learning-based navigation policies for Jackal robots in real-world scenarios.
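
To make the comparison concrete, the following minimal sketch (not the authors' implementation) trains TD3 and SAC side by side with Stable-Baselines3 and Gymnasium and evaluates each policy over a few episodes. The study itself uses custom Jackal motion-control environments built on an open-source ROS framework; here the off-the-shelf Pendulum-v1 task stands in as a placeholder continuous-control environment so the snippet runs without ROS or Gazebo.

# Minimal sketch, assuming Stable-Baselines3 and Gymnasium are installed.
# Pendulum-v1 is only a placeholder for the custom Jackal environments.
import gymnasium as gym
from stable_baselines3 import SAC, TD3

def train_and_evaluate(algo_cls, env_id="Pendulum-v1", steps=50_000, seed=0):
    """Train one off-policy agent and return its mean evaluation return."""
    env = gym.make(env_id)
    model = algo_cls("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=steps)

    # Simple deterministic evaluation over a handful of episodes.
    eval_env = gym.make(env_id)
    returns = []
    for _ in range(5):
        obs, _ = eval_env.reset()
        done, ep_ret = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = eval_env.step(action)
            ep_ret += reward
            done = terminated or truncated
        returns.append(ep_ret)
    return sum(returns) / len(returns)

if __name__ == "__main__":
    for algo in (TD3, SAC):
        print(algo.__name__, train_and_evaluate(algo))

Because TD3 and SAC are both off-policy actor-critic methods, the same training and evaluation loop applies to each; only the algorithm class changes, which mirrors how the two policies are compared under identical conditions in the study.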

Keywords: Jackal robot environments, reinforcement learning, TD3, SAC, robust navigation, transferability, custom environments.

