Hands-off Parking: Deep Learning Gesture-Based System for Individuals with Mobility Needs

Javier Romera; Alberto Justo; Ignacio Fidalgo; Javier Araluce; Joshué Pérez

Abstract:

Individuals with mobility needs face a significant challenge when parking vehicles. In many cases, after parking, they find insufficient space to exit, which leads to one of two undesired outcomes: avoiding that parking spot or leaving the vehicle improperly placed. To address this issue, this paper presents a parking control system based on gestural teleoperation. The system comprises three main phases: capturing body markers, interpreting gestures, and transmitting commands to the vehicle. The first phase is built around the MediaPipe framework, a versatile tool optimized for real-time gesture recognition. MediaPipe detects and tracks body markers, with particular emphasis on hands; each hand is represented by 21 reference points. After data capture, a Multilayer Perceptron (MLP) classifies the gestures. Combining MediaPipe's landmark extraction with MLP classification allows human gestures to be translated into actionable commands with high precision. The system has been trained and validated on a built-in dataset. To demonstrate domain adaptation, a framework based on the Robot Operating System 2 (ROS2) as the communication backbone, together with the CARLA Simulator, is used. Following successful simulations, the system was transferred to a real-world platform, marking a significant milestone in the project. This real-vehicle implementation verifies the practicality and efficiency of the system beyond theoretical constructs.
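The three phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gesture labels, layer sizes, normalization scheme, confidence threshold, and randomly initialized weights are all assumptions made for illustration. Only the 21 reference points per hand and the use of an MLP classifier come from the abstract. Landmark capture (MediaPipe) and command transmission (ROS2) are deliberately left out, so the example runs with the Python standard library alone.

```python
import math
import random

NUM_LANDMARKS = 21  # the abstract states 21 reference points per hand
# Hypothetical command set; the abstract does not list the actual gestures.
COMMANDS = ["stop", "forward", "backward", "steer_left", "steer_right"]


def normalize_landmarks(landmarks):
    """Make (x, y) landmarks wrist-relative and scale them into [-1, 1].

    Assumed preprocessing: it removes the dependence on where the hand
    appears in the image and on how large it appears.
    """
    wrist_x, wrist_y = landmarks[0]
    rel = [(x - wrist_x, y - wrist_y) for x, y in landmarks]
    scale = max(max(abs(x), abs(y)) for x, y in rel) or 1.0
    return [c / scale for point in rel for c in point]  # flat list of 42 values


def init_mlp(sizes, seed=0):
    """Create untrained layers as (weights, biases) pairs with random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(layers, x):
    """Forward pass: ReLU on hidden layers, softmax on the output layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = softmax(z) if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return x


def classify(layers, landmarks, threshold=0.0):
    """Map one hand's landmarks to a command.

    Falls back to "stop" when the top probability is below `threshold`
    (an assumed safety default, not something the abstract describes).
    """
    probs = mlp_forward(layers, normalize_landmarks(landmarks))
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "stop", probs
    return COMMANDS[best], probs


if __name__ == "__main__":
    rng = random.Random(1)
    fake_hand = [(rng.random(), rng.random()) for _ in range(NUM_LANDMARKS)]
    mlp = init_mlp([2 * NUM_LANDMARKS, 16, len(COMMANDS)])
    command, probabilities = classify(mlp, fake_hand, threshold=0.5)
    print(command, [round(p, 3) for p in probabilities])
```

In a complete system, `fake_hand` would be replaced by landmarks reported by MediaPipe for each camera frame, the weights would come from training on labeled gesture data, and the resulting command would be published to the vehicle over ROS2.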

Keywords: Gesture detection, MediaPipe, Multilayer Perceptron, Robot Operating System.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.11398343
