Commenced in January 2007
Frequency: Monthly
Edition: International
SIFT Accordion: A Space-Time Descriptor Applied to Human Action Recognition

Authors: Olfa Ben Ahmed, Mahmoud Mejdoub, Chokri Ben Amar


Recognizing human actions in videos is an active field of research in computer vision and pattern recognition. Human activity recognition has many potential applications, such as video surveillance, human-machine interaction, sports video retrieval, and robot navigation. Currently, local descriptors and bag-of-visual-words models achieve state-of-the-art performance for human action recognition. The main challenge in feature description is how to represent local motion information efficiently. Most previous work focuses on extending 2D local descriptors to 3D in order to describe the local information around each interest point. In this paper, we propose a new spatio-temporal descriptor based on a space-time description of moving points. Our description relies on an Accordion representation of video, which is well suited to recognizing human actions from 2D local descriptors without the need for 3D extensions. We use the bag-of-words approach to represent videos. We quantize 2D local descriptors that describe both temporal and spatial features, with a good compromise between computational complexity and action recognition rate. We have reached impressive results on a publicly available action data set.
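The abstract names two building blocks without detailing them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The function `accordion` assumes the Accordion representation unfolds a group of frames into one 2D image by placing the same column of every frame side by side, with the frame order reversed on odd columns; that alternation is an assumption made here to keep temporally adjacent pixels adjacent. The function `bow_histogram` assumes a codebook of visual words is already available (for example from k-means clustering) and assigns each 2D descriptor to its nearest word. All names are hypothetical.

```python
def accordion(frames):
    """Unfold T frames of size H x W into one H x (W * T) image.

    Assumption: column j of every frame is placed in a contiguous block,
    and the frame order is reversed on odd columns.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    image = [[] for _ in range(height)]
    for col in range(width):
        indices = list(range(len(frames)))
        if col % 2 == 1:
            indices.reverse()  # assumed "accordion" folding direction
        for t in indices:
            for row in range(height):
                image[row].append(frames[t][row][col])
    return image


def bow_histogram(descriptors, codebook):
    """Return the normalized histogram of nearest visual words.

    Each descriptor votes for the codeword with the smallest squared
    Euclidean distance; the counts are divided by the number of votes.
    """
    counts = [0] * len(codebook)
    for desc in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

In such a pipeline, 2D descriptors such as SIFT would be extracted from the image returned by `accordion`, and `bow_histogram` would turn them into a fixed-length vector that a classifier can consume. The descriptor extraction step itself is left out of this sketch.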

Keywords: Video, motion, SIFT, human action, Accordion, bag of features, moving point, space-time descriptor



