A Robust Visual SLAM for Indoor Dynamic Environment
Authors: Xiang Zhang, Daohong Yang, Ziyuan Wu, Lei Li, Wanting Zhou
Abstract:
Visual Simultaneous Localization and Mapping (VSLAM) uses cameras to perceive unknown environments, localizing the sensor and mapping the environment at the same time. The technology has a wide range of applications in autonomous driving, virtual reality, and related fields. Current VSLAM systems maintain high accuracy in static environments, but in dynamic environments moving objects reduce the stability of the system, leading to inaccurate localization and mapping or even outright failure. In this paper, we propose a robust VSLAM method that addresses these challenges with a dynamic region removal scheme based on a semantic segmentation neural network and geometric constraints. First, the semantic segmentation network partitions the scene into a prior active motion region, a prior static region, and a prior passive motion region. Then, a lightweight frame tracking module initializes the relative pose between the previous and current frames using only the prior static region. Next, a motion consistency detection module based on multi-view geometry and scene flow divides the environment into static and dynamic regions, so that dynamic object regions are eliminated. Finally, only the static regions are passed to the tracking thread. Our system builds on ORB-SLAM3, one of the most effective VSLAM systems available. We evaluated our method on the TUM RGB-D benchmark; the results show that it improves the accuracy of the original ORB-SLAM3 by 70% to 98.5% in highly dynamic environments.
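To make the motion consistency step concrete, the following Python sketch thresholds the 3-D scene flow of matched keypoints between two RGB-D frames. It is a minimal illustration under our own assumptions, not the paper's implementation: the function names and the 5 cm threshold are hypothetical, and the per-point depths and initial camera-to-world poses are assumed to come from the RGB-D input and the lightweight frame tracking module described above.

import numpy as np

def backproject(pts, depths, K):
    # Back-project pixel coordinates (N, 2) with depths (N,) into
    # camera-frame 3-D points (N, 3): X_c = d * K^{-1} [u, v, 1]^T.
    ones = np.ones((pts.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pts, ones]).T).T
    return rays * depths[:, None]

def to_world(X_cam, T_wc):
    # Lift camera-frame points (N, 3) into the world frame using a
    # 4x4 camera-to-world pose T_wc (e.g. from the lightweight tracker).
    X_h = np.hstack([X_cam, np.ones((X_cam.shape[0], 1))])
    return (T_wc @ X_h.T).T[:, :3]

def static_mask(pts_prev, pts_curr, d_prev, d_curr,
                K, T_wc_prev, T_wc_curr, flow_thresh=0.05):
    # Scene flow = world-frame displacement of a matched point once
    # camera motion is compensated by the two poses. Static points
    # should barely move, so a large residual flags a dynamic point.
    # The 5 cm threshold is an assumed value, not taken from the paper.
    Xw_prev = to_world(backproject(pts_prev, d_prev, K), T_wc_prev)
    Xw_curr = to_world(backproject(pts_curr, d_curr, K), T_wc_curr)
    scene_flow = np.linalg.norm(Xw_curr - Xw_prev, axis=1)
    return scene_flow < flow_thresh  # True = consistent with a static scene

In the full pipeline, such per-point verdicts would be aggregated per semantic region (e.g. a prior passive motion region is retained only if most of its points pass the check), and only features in the surviving static regions are handed to the tracking thread.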
Keywords: Dynamic scene, dynamic visual SLAM, semantic segmentation, scene flow, VSLAM.
References:
[1] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
[2] T. Qin, P. Li, and S. Shen, “VINS-Mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[3] C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semi-direct monocular visual odometry,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 15–22.
[4] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.
[5] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, “FastSLAM: A factored solution to the simultaneous localization and mapping problem,” in Proceedings of the AAAI National Conference on Artificial Intelligence, 2002, pp. 593–598.
[6] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[7] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 834–849.
[8] M. Labbé and F. Michaud, “RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation,” Journal of Field Robotics, vol. 36, no. 2, pp. 416–446, 2019. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.21831
[9] Y. Sun, M. Liu, and M. Q.-H. Meng, “Motion removal for reliable RGB-D SLAM in dynamic environments,” Robotics and Autonomous Systems, vol. 108, pp. 115–128, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0921889018300691
[10] S. Li and D. Lee, “RGB-D SLAM in dynamic environments using static point weighting,” IEEE Robotics and Automation Letters, vol. 2, no. 4, pp. 2263–2270, 2017.
[11] B. Bescos, J. M. Fácil, J. Civera, and J. Neira, “DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4076–4083, 2018.
[12] G. Singh, M. Wu, and S.-K. Lam, “Fusing semantics and motion state detection for robust visual SLAM,” in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 2753–2762.
[13] T. Ran, L. Yuan, J. Zhang, D. Tang, and L. He, “RS-SLAM: A robust semantic SLAM in dynamic environments based on RGB-D sensor,” IEEE Sensors Journal, vol. 21, no. 18, pp. 20657–20664, 2021.
[14] C. Yu, Z. Liu, X.-J. Liu, F. Xie, Y. Yang, Q. Wei, and Q. Fei, “DS-SLAM: A semantic visual SLAM towards dynamic environments,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1168–1174.
[15] Y. Liu and J. Miura, “RDMO-SLAM: Real-time visual SLAM for dynamic environments using semantic label prediction with optical flow,” IEEE Access, vol. 9, pp. 106981–106997, 2021.
[16] Y. Liu and J. Miura, “RDS-SLAM: Real-time dynamic SLAM using semantic segmentation methods,” IEEE Access, vol. 9, pp. 23772–23785, 2021.
[17] D. Li, X. Shi, Q. Long, S. Liu, W. Yang, F. Wang, Q. Wei, and F. Qiao, “DXSLAM: A robust and efficient visual SLAM system with deep features,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 4958–4965.
[18] W. Wu, L. Guo, H. Gao, Z. You, Y. Liu, and Z. Chen, “YOLO-SLAM: A semantic SLAM system towards dynamic environment with geometric constraint,” Neural Computing and Applications, vol. 34, no. 8, pp. 6011–6026, Apr. 2022. [Online]. Available: https://doi.org/10.1007/s00521-021-06764-3
[19] P. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[20] K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-squares fitting of two 3-D point sets,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-9, no. 5, pp. 698–700, 1987.
[21] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” CoRR, vol. abs/1612.01105, 2016. [Online]. Available: http://arxiv.org/abs/1612.01105
[22] C. Wang, Y. Zhang, and X. Li, “PMDS-SLAM: Probability mesh enhanced semantic SLAM in dynamic environments,” in 2020 5th International Conference on Control, Robotics and Cybernetics (CRC), 2020, pp. 40–44.
[23] M. Grupp, “evo: Python package for the evaluation of odometry and SLAM,” https://github.com/MichaelGrupp/evo, 2017.