Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language

Authors: Marie Alaghband, Niloofar Yousefi, Ivan Garibay

Abstract:

Facial expressions are an important part of both gesture and sign language recognition systems. Despite recent advances in both fields, annotated facial expression datasets in the context of sign language remain scarce. In this manuscript, we introduce an annotated, sequenced facial expression dataset in the context of sign language, comprising over 3000 facial images extracted from the daily news and weather forecasts of the public TV station PHOENIX. Unlike the majority of existing facial expression datasets, FePh provides sequenced, semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images the signers are mouthing the words, which makes the data more challenging. To annotate the dataset, we consider the primary, secondary, and tertiary dyads of seven basic emotions: "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also include a "None" class for images whose facial expression cannot be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has wider application in gesture recognition and Human-Computer Interaction (HCI) systems.
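
The annotation scheme described above can be illustrated with a minimal sketch of a possible label space. This is not the authors' annotation code. It assumes that a dyad can be modeled as an unordered pair of two distinct basic emotions, and the function names and the underscore-joined label format are hypothetical choices made for illustration only.

```python
from itertools import combinations

# The seven basic emotions named in the abstract.
BASIC_EMOTIONS = ["sad", "surprise", "fear", "angry", "neutral", "disgust", "happy"]

# Class for expressions that none of the emotions describe.
NONE_LABEL = "None"


def build_label_space(emotions=BASIC_EMOTIONS):
    """Return single-emotion labels, dyad labels, and the None class.

    Assumption: a dyad is an unordered pair of two distinct basic
    emotions, written as "first_second" in the order of `emotions`.
    """
    singles = list(emotions)
    dyads = ["_".join(pair) for pair in combinations(emotions, 2)]
    return singles + dyads + [NONE_LABEL]


def is_valid_label(label, label_space=None):
    """Check whether a label belongs to the (hypothetical) label space."""
    if label_space is None:
        label_space = build_label_space()
    return label in label_space


if __name__ == "__main__":
    labels = build_label_space()
    print(len(labels))                     # 7 singles + 21 pairs + 1 None class
    print(is_valid_label("sad_surprise"))  # a pair in canonical order
    print(is_valid_label("confused"))      # not one of the listed emotions
```

Under this assumption the label space has 29 entries. The real dataset may restrict which pairs occur or encode them differently, so the sketch only shows how single emotions, dyads, and the "None" class could coexist in one label set.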

Keywords: Annotated Facial Expression Dataset, Sign Language Recognition, Gesture Recognition, Sequenced Facial Expression Dataset.

