Automatic Recognition of Emotionally Coloured Speech

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou

Abstract:

Emotion in speech has attracted the interest of the speech community for many years, both in speech synthesis and in automatic speech recognition (ASR). Despite remarkable recent progress in Large Vocabulary Recognition (LVR), the technology still falls far short of the ultimate goal of recognising free conversational speech uttered by any speaker in any environment. Experimental tests show that the error rate of state-of-the-art large vocabulary recognition systems increases substantially when they are applied to spontaneous or emotional speech. This paper shows that the recognition rate for emotionally coloured speech can be improved by using a language model based on an increased representation of emotional utterances.
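The abstract does not spell out how the language model with "increased representation of emotional utterances" is built. One way to picture the idea is a minimal sketch, assuming an add-one smoothed bigram model in which every emotional training utterance is counted `emotion_weight` times. The names `tokenize`, `BigramLM`, `build_lm` and `emotion_weight`, the Laplace smoothing and the weighting scheme are all hypothetical illustration choices, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Lower-case, split on whitespace and add sentence-boundary markers.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramLM:
    """Add-one (Laplace) smoothed bigram model over weighted utterances."""

    def __init__(self, weighted_corpus):
        # weighted_corpus: iterable of (sentence, weight) pairs. A weight of k
        # has the same effect on the counts as repeating the sentence k times.
        self.history = Counter()  # c(prev): times prev is followed by a token
        self.pairs = Counter()    # c(prev, word)
        self.vocab = set()        # tokens that can be predicted (incl. </s>)
        for sentence, weight in weighted_corpus:
            tokens = tokenize(sentence)
            self.vocab.update(tokens[1:])
            for prev, word in zip(tokens, tokens[1:]):
                self.history[prev] += weight
                self.pairs[(prev, word)] += weight

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (self.pairs[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def perplexity(self, sentence):
        # Per-bigram perplexity of one sentence, including the </s> transition.
        tokens = tokenize(sentence)
        log_p = sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))
        return math.exp(-log_p / (len(tokens) - 1))


def build_lm(neutral, emotional, emotion_weight=1):
    # Hypothetical scheme: emotion_weight > 1 increases the representation of
    # emotional utterances relative to neutral ones in the training counts.
    corpus = [(s, 1) for s in neutral] + [(s, emotion_weight) for s in emotional]
    return BigramLM(corpus)
```

In a toy run with two neutral and two emotional training sentences, raising `emotion_weight` from 1 to 3 lowers the perplexity of an emotional test sentence such as "i am so happy today" and slightly raises it for a neutral one, which is the trade-off such weighting implies. A practical recogniser would train on a far larger corpus and use stronger smoothing (for example Kneser-Ney) than this sketch.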

Keywords: Statistical language model, N-grams, emotionally coloured speech

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1056046

