FSM-based Recognition of Dynamic Hand Gestures via Gesture Summarization Using Key Video Object Planes
Authors: M. K. Bhuyan
Abstract:
The use of the human hand as a natural interface for human-computer interaction (HCI) motivates research in hand gesture recognition. Vision-based hand gesture recognition involves visual analysis of hand shape, position and/or movement. In this paper, we use the concept of object-based video abstraction to segment the frames into video object planes (VOPs), as used in MPEG-4, with each VOP corresponding to one semantically meaningful hand position. Next, the key VOPs are selected on the basis of the amount of change in hand shape: for a given key frame in the sequence, the next key frame is the one in which the hand shape has changed significantly. Thus, an entire video clip is reduced to a small number of representative frames that are sufficient to represent a gesture sequence. Subsequently, we model a particular gesture as a sequence of key frames, each bearing information about its duration. These constitute a finite state machine (FSM). For recognition, the states of the incoming gesture sequence are matched against the states of all the FSMs contained in the gesture vocabulary database. The core idea of our proposed representation is that the redundant frames of a gesture video sequence bear only the temporal information of the gesture and can hence be discarded for computational efficiency. Experimental results demonstrate the effectiveness of our proposed scheme for key frame extraction, subsequent gesture summarization and, finally, gesture recognition.
Keywords: Hand gesture, MPEG-4, Hausdorff distance, finite state machine.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1076894
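The pipeline outlined in the abstract (key frame selection by shape change, then matching against per-gesture FSMs) can be illustrated with a minimal sketch. This is not the paper's implementation: the representation of a hand shape as a small set of 2-D points, the fixed thresholds, the per-state duration ranges, and all function and class names below are assumptions made for illustration. The Hausdorff distance is used as the shape-change measure because it appears in the keywords; how the paper applies it may differ.

```python
from dataclasses import dataclass
from math import hypot

Point = tuple[float, float]  # assumed shape representation: a set of 2-D contour points


def directed_hausdorff(a: list[Point], b: list[Point]) -> float:
    """Largest distance from any point of a to its nearest point in b."""
    return max(min(hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)


def hausdorff(a: list[Point], b: list[Point]) -> float:
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def select_key_frames(shapes: list[list[Point]], threshold: float) -> list[tuple[int, int]]:
    """Return (frame index, duration in frames) for each key frame.

    A new key frame starts at the first frame whose shape differs from the
    current key frame by more than `threshold` (a hypothetical parameter).
    """
    if not shapes:
        return []
    keys = []
    key_idx = 0
    for i in range(1, len(shapes)):
        if hausdorff(shapes[key_idx], shapes[i]) > threshold:
            keys.append((key_idx, i - key_idx))
            key_idx = i
    keys.append((key_idx, len(shapes) - key_idx))
    return keys


@dataclass
class State:
    """One FSM state: a reference shape plus an allowed duration range (assumed)."""
    shape: list[Point]
    min_frames: int
    max_frames: int


def matches(fsm: list[State], shapes, keys, shape_tol: float) -> bool:
    """True if every key frame fits the corresponding state in shape and duration."""
    if len(keys) != len(fsm):
        return False
    for state, (idx, duration) in zip(fsm, keys):
        if hausdorff(state.shape, shapes[idx]) > shape_tol:
            return False
        if not state.min_frames <= duration <= state.max_frames:
            return False
    return True


def recognize(vocabulary: dict[str, list[State]], shapes, threshold: float, shape_tol: float):
    """Summarize the sequence into key frames, then test each gesture FSM in turn."""
    keys = select_key_frames(shapes, threshold)
    for name, fsm in vocabulary.items():
        if matches(fsm, shapes, keys, shape_tol):
            return name
    return None


# Toy data: two made-up hand shapes and a five-frame sequence.
open_hand = [(0.0, 0.0), (0.0, 4.0)]
fist = [(0.0, 0.0), (0.0, 1.0)]
sequence = [open_hand, open_hand, open_hand, fist, fist]
vocabulary = {
    "grab": [State(open_hand, 2, 5), State(fist, 1, 4)],
    "release": [State(fist, 1, 4), State(open_hand, 2, 5)],
}
result = recognize(vocabulary, sequence, threshold=1.0, shape_tol=0.5)
```

In this toy run the five frames collapse to two key frames, so each candidate FSM is checked against two states rather than five frames. The frames that were dropped contribute only to the stored durations, which reflects the abstract's point that redundant frames carry only temporal information.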