Multimodal Database of Emotional Speech, Video and Gestures

Authors: Tomasz Sapiński, Dorota Kamińska, Adam Pelikant, Egils Avots, Cagri Ozcinar, Gholamreza Anbarjafari

Abstract:

People express emotions through different modalities. Integrating verbal and non-verbal communication channels creates a system in which the message is easier to understand. Expanding the focus to several forms of expression can facilitate research on emotion recognition as well as on human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings registered in studio conditions and acted out by 16 professional actors (8 male and 8 female). The data is labeled with the six basic emotion categories defined by Ekman. To verify the quality of the performances, all recordings were evaluated by experts and volunteers. The database is available to the academic community and may be useful in studies on audio-visual emotion recognition.
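The corpus organization described above (16 actors, six Ekman emotion labels, audio and video recordings) lends itself to a simple per-actor, per-emotion index. The following Python sketch shows one possible way to enumerate such recordings; the directory layout, file extensions, and label spellings are assumptions made for illustration only and are not taken from the database's documentation.

```python
# A minimal sketch for indexing a multimodal emotional corpus by actor,
# emotion, and modality. The <root>/<actor_id>/<emotion>/<file> layout and
# the extension-to-modality mapping are hypothetical.
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Ekman's six basic emotions used for labelling (spellings assumed).
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")

# Hypothetical mapping from file extension to modality.
MODALITY_BY_EXT = {".wav": "speech", ".mp4": "video"}


@dataclass(frozen=True)
class Recording:
    actor_id: str   # e.g. "actor_01" .. "actor_16"
    emotion: str    # one of EMOTIONS
    modality: str   # "speech" or "video"
    path: Path


def index_corpus(root: Path) -> Iterator[Recording]:
    """Walk an assumed <root>/<actor_id>/<emotion>/<file> layout."""
    for actor_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for emotion_dir in sorted(p for p in actor_dir.iterdir() if p.is_dir()):
            if emotion_dir.name not in EMOTIONS:
                continue
            for file in sorted(emotion_dir.iterdir()):
                modality = MODALITY_BY_EXT.get(file.suffix.lower())
                if modality:
                    yield Recording(actor_dir.name, emotion_dir.name, modality, file)


if __name__ == "__main__":
    # Usage example: count recordings per emotion in a local copy of the data.
    from collections import Counter
    counts = Counter(r.emotion for r in index_corpus(Path("emotional_corpus")))
    print(counts)
```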

Keywords: Body movement, emotion recognition, emotional corpus, facial expressions, gestures, multimodal database, speech.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1474737

