Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Ankit Sinha; Soham Banerjee; Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application of computer vision and pattern recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores, given an image of the rack and information about the query (product) images. We improve upon existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising a Faster-RCNN-based object localizer, which detects the object regions in the rack image, and a ResNet-18-based image encoder, which classifies the detected regions into the appropriate classes. Each model is fine-tuned on an appropriate data set for better prediction, and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function with online hard-negative mining for improved prediction. The proposed models are lightweight and can be connected end to end during deployment to automatically identify each product in a rack image. Extensive experiments on the Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.

Keywords: Retail stores, Faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition.
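
As an illustration of the training objective named in the abstract, the following is a minimal NumPy sketch of a triplet loss with online hard-negative mining. The abstract does not give the exact formulation, so the Euclidean distance, the margin value, the use of every anchor-positive pair in the batch, and the function name `hard_negative_triplet_loss` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def hard_negative_triplet_loss(embeddings, labels, margin=0.5):
    """Mean triplet loss over all anchor-positive pairs in a batch, pairing
    each anchor with its hardest (closest) negative in the same batch.

    Hypothetical helper: distance metric and margin are assumed values.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same_class = labels[:, None] == labels[None, :]
    pos_mask = same_class & ~np.eye(len(labels), dtype=bool)  # exclude self-pairs
    neg_mask = ~same_class

    # Online hard-negative mining: closest other-class sample per anchor.
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    # Hinge on d(anchor, positive) - d(anchor, hardest negative) + margin.
    pair_losses = np.maximum(dist - hardest_neg[:, None] + margin, 0.0)

    # Keep only anchor-positive pairs whose anchor has at least one negative.
    valid = pos_mask & neg_mask.any(axis=1)[:, None]
    return float(pair_losses[valid].mean()) if valid.any() else 0.0

# Two classes whose embeddings overlap along one axis.
batch = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
classes = [0, 0, 1, 1]
loss = hard_negative_triplet_loss(batch, classes, margin=0.5)
```

Mining the negative inside each batch ("online") avoids a separate pass over the data set to pre-select triplets. In a real training loop the same computation would be written in an automatic-differentiation framework so that gradients flow back into the encoder.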
