Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion

Authors: Adrià Arbués-Sangüesa, Coloma Ballester, Gloria Haro

Abstract:

Tracking sports players is a particularly challenging task, especially in single-feed videos recorded on tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features for detecting and tracking basketball players. An ablation study is carried out and then used to show that a robust tracker can be built from deep learning features alone, without extracting contextual features such as proximity or color similarity and without applying camera stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep learning model to estimate the players' poses, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA (Multiple Object Tracking Accuracy) over a basketball dataset with more than 10k instances.
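To make the two-step pipeline more concrete, the following is a minimal Python sketch of the tracking step, using PyTorch/torchvision and SciPy. It assumes player bounding boxes have already been obtained from pose keypoints (e.g., via OpenPose in the detection step), describes each player crop with the output of an intermediate VGG-19 convolutional layer, and associates detections across frames by cosine similarity. The crop size, the truncation depth of the network, and the use of a Hungarian assignment are illustrative assumptions, not the authors' exact configuration.

# Sketch of step (2): semantic feature extraction and frame-to-frame matching.
# Boxes are assumed to come from a pose-based detection step (e.g., OpenPose);
# pose keypoints could additionally supply geometric features, omitted here.
import torch
import torchvision.models as models
import torchvision.transforms as T
from scipy.optimize import linear_sum_assignment

# VGG-19 backbone truncated at an intermediate conv layer (depth is an assumption).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:18].eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((64, 64)),          # crop size is an illustrative choice
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def crop_features(frame, boxes):
    """One L2-normalized VGG descriptor per player crop.

    frame: H x W x 3 uint8 NumPy array; boxes: list of (x0, y0, x1, y1) ints.
    """
    feats = []
    with torch.no_grad():
        for (x0, y0, x1, y1) in boxes:
            crop = preprocess(frame[y0:y1, x0:x1]).unsqueeze(0)
            f = vgg(crop).flatten()
            feats.append(f / f.norm())
    return torch.stack(feats)

def match(prev_feats, curr_feats):
    """Associate current detections with previous tracks.

    Uses cosine distance plus a Hungarian assignment; the assignment strategy
    stands in for the paper's matching step and is an assumption.
    """
    cost = 1.0 - prev_feats @ curr_feats.T   # cosine distance matrix
    rows, cols = linear_sum_assignment(cost.numpy())
    return list(zip(rows.tolist(), cols.tolist()))

L2-normalizing the descriptors makes the dot product a cosine similarity, so the assignment cost is bounded and insensitive to overall activation magnitude, which helps when lighting or player scale changes between frames.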

Keywords: Basketball, deep learning, feature extraction, single-camera, tracking.

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.3346759

