RV-YOLOX: Object Detection on Inland Waterways Based on Optimized YOLOX through Fusion of Vision and 3+1D Millimeter Wave Radar
Authors: Zixian Zhang, Shanliang Yao, Zile Huang, Zhaodong Wu, Xiaohui Zhu, Yong Yue, Jieming Ma
Abstract:
Unmanned Surface Vehicles (USVs) hold significant value for their capacity to undertake hazardous and labor-intensive operations in aquatic environments, and object detection is central to these applications. Nonetheless, the efficacy of USVs in object detection is impeded by several intrinsic challenges, including the intricate distribution of obstacles, reflections from coastal structures, and fog over the water surface. To address these problems, this paper proposes a fusion method that enables USVs to detect objects effectively in inland surface environments using vision sensors and 3+1D millimeter-wave (MMW) radar. The MMW radar complements the vision sensor by providing reliable environmental measurements. The approach converts the radar's 3D point cloud into a 2D radar pseudo-image by leveraging a point transformer, thereby unifying the data format of the radar and vision modalities. Furthermore, this paper develops a multi-source object detection network, named RV-YOLOX, which integrates radar and vision features and is tailored to inland waterway environments. Performance is evaluated on our self-recorded inland waterway dataset. Compared with the baseline YOLOX network, the fusion network significantly improves detection accuracy, especially for objects under poor lighting conditions.
Keywords: Inland waterways, object detection, YOLO, sensor fusion, self-attention, deep learning.
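To make the radar-to-pseudo-image step concrete, the following is a minimal sketch of projecting a 3+1D radar point cloud into the camera plane to form a 2D pseudo-image. The channel layout (depth, radial velocity, RCS), the function name, and the calibration inputs are illustrative assumptions, not the paper's exact specification, which additionally employs a point transformer on the point cloud.

```python
import numpy as np

def radar_to_pseudo_image(points, K, T_radar_to_cam, image_shape):
    """Project 3+1D radar points into the camera plane to build a 2D pseudo-image.

    points: (N, 5) array of [x, y, z, doppler, rcs] in the radar frame
            (channel layout is an illustrative assumption).
    K: (3, 3) camera intrinsic matrix.
    T_radar_to_cam: (4, 4) extrinsic transform from radar to camera frame.
    image_shape: (H, W) of the camera image.
    """
    H, W = image_shape
    pseudo = np.zeros((3, H, W), dtype=np.float32)  # channels: depth, doppler, rcs

    # Transform radar points into the camera coordinate frame
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])
    cam = (T_radar_to_cam @ xyz1.T).T[:, :3]

    # Keep points in front of the camera and project with the pinhole model
    valid = cam[:, 2] > 0
    cam, extra = cam[valid], points[valid, 3:]
    uvw = (K @ cam.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)

    # Rasterize the projected points that fall inside the image bounds
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    pseudo[0, v[inside], u[inside]] = cam[inside, 2]    # depth
    pseudo[1, v[inside], u[inside]] = extra[inside, 0]  # radial velocity
    pseudo[2, v[inside], u[inside]] = extra[inside, 1]  # RCS
    return pseudo
```

The resulting pseudo-image shares the camera's pixel grid, so it can be concatenated or attended to alongside image features in a YOLOX-style detector, which is the role it plays in the proposed radar-vision fusion.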