Deep Learning Application for Object Image Recognition and Robot Automatic Grasping

Authors: Shiuh-Jer Huang, Chen-Zon Yan, C. K. Huang, Chun-Chien Ting


Demand for vision systems that support autonomous operation in industrial environments is strong, which makes image recognition an important research topic. In this work, a deep learning algorithm is used in a vision system to recognize industrial objects, and the system is integrated with a 7A6 Series manipulator for automatic object gripping. A PC and a graphics processing unit (GPU) form the 3D vision recognition system. A depth camera (Intel RealSense SR300) captures the images used for object recognition and coordinate derivation. The YOLOv2 scheme is adopted in a convolutional neural network (CNN) structure for object classification and center point prediction. In addition, an image processing strategy finds the object contour, from which the object orientation angle is calculated. The location and orientation of the specified object are then sent to the robot controller. Finally, the six-axis manipulator grasps the specified object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 detects the object location and category with confidence near 0.9 and a 3D position error of less than 0.4 mm. The approach is useful for future intelligent robotic applications in Industry 4.0 environments.

Keywords: Image processing, deep learning, convolutional neural network, YOLOv2
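
The abstract describes two geometric steps between detection and grasping: deriving a 3D coordinate from the detected center point and its depth reading, and computing an orientation angle from the object contour. The following is a minimal sketch of both, not the authors' implementation. It assumes a standard pinhole camera model for the depth camera and uses second-order central moments of the contour points as a stand-in for the paper's unspecified angle calculation. The function names and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
import math


def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth in metres to camera-frame (X, Y, Z).

    Assumes an ideal pinhole model with no lens distortion; fx, fy, cx, cy
    are hypothetical intrinsics, not values taken from the paper.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def orientation_angle_deg(contour):
    """Principal-axis angle of a contour, in degrees within (-90, 90].

    Uses second-order central moments of the contour points. The angle is
    measured in image coordinates, where the y axis points down.
    """
    n = len(contour)
    mean_x = sum(p[0] for p in contour) / n
    mean_y = sum(p[1] for p in contour) / n
    mu20 = sum((p[0] - mean_x) ** 2 for p in contour) / n
    mu02 = sum((p[1] - mean_y) ** 2 for p in contour) / n
    mu11 = sum((p[0] - mean_x) * (p[1] - mean_y) for p in contour) / n
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))


# Hypothetical detection: center pixel (420, 240) at 0.5 m depth.
center_xyz = deproject_pixel(420, 240, 0.5, fx=100.0, fy=100.0, cx=320.0, cy=240.0)

# Hypothetical contour: corners of an axis-aligned 4 x 1 rectangle.
angle = orientation_angle_deg([(0, 0), (4, 0), (4, 1), (0, 1)])
```

In a pipeline like the one described, the resulting camera-frame point would still need a hand-eye calibration transform before it could be sent to the robot controller. That transform is outside the scope of this sketch.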


