Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features

Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova


Emotion recognition remains an open problem for both intelligent systems and psychology. In this paper, both voice features and facial features are used to build an emotion recognition system. Support Vector Machine classifiers are built using raw data from video recordings. The emotion recognition results are reported, together with a discussion of the validity and expressiveness of different emotions. Classifiers built from facial data only, from voice data only, and from the combination of both are compared. The need for a better way of combining the information from facial expressions and voice data is argued.
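The comparison described above (face only, voice only, and both combined) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the task is reduced to two classes, the features are placeholders, and the classifier is a plain linear SVM trained with a Pegasos-style subgradient method. All function names (`make_synthetic`, `train_linear_svm`, `compare_modalities`) are hypothetical, and the combination shown is simple feature concatenation, which is only one possible way to fuse the two modalities.

```python
import random


def make_synthetic(n, seed=0):
    """Generate toy two-class data (assumption: NOT the paper's data).
    Each sample has 2 'facial' and 2 'voice' features; label is -1 or +1."""
    rng = random.Random(seed)
    face, voice, labels = [], [], []
    for _ in range(n):
        y = rng.choice([-1, 1])
        # One informative and one pure-noise feature per modality.
        face.append([rng.gauss(0.8 * y, 1.0), rng.gauss(0.0, 1.0)])
        voice.append([rng.gauss(0.8 * y, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return face, voice, labels


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    regularised hinge loss. A constant 1.0 is appended as a bias feature."""
    rng = random.Random(seed)
    Xa = [x + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            # Shrink step from the L2 regulariser.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if predict(w, xi) == yi)
    return hits / len(y)


def compare_modalities(n_train=200, n_test=200):
    """Train one classifier per feature set and return test accuracies."""
    face_tr, voice_tr, y_tr = make_synthetic(n_train, seed=1)
    face_te, voice_te, y_te = make_synthetic(n_test, seed=2)
    # Feature-level fusion: concatenate facial and voice features.
    fused_tr = [f + v for f, v in zip(face_tr, voice_tr)]
    fused_te = [f + v for f, v in zip(face_te, voice_te)]
    feature_sets = {
        "face": (face_tr, face_te),
        "voice": (voice_tr, voice_te),
        "face+voice": (fused_tr, fused_te),
    }
    results = {}
    for name, (X_tr, X_te) in feature_sets.items():
        w = train_linear_svm(X_tr, y_tr)
        results[name] = accuracy(w, X_te, y_te)
    return results


if __name__ == "__main__":
    for name, acc in compare_modalities().items():
        print(f"{name}: test accuracy = {acc:.3f}")
```

Running the script prints one test accuracy per feature set. On real recordings, the feature extraction step, a multi-class scheme, and the kernel choice would all have to be decided separately; none of these is specified in the abstract.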

Keywords: Emotion recognition, facial recognition, signal processing, machine learning.


