Emotion Recognition Using Neural Network: A Comparative Study

Authors: Nermine Ahmed Hendy, Hania Farag


Emotion recognition is an important research field with many current applications. This work focuses on recognizing different emotions from the speech signal. The extracted features are related to statistics of the pitch, formant, and energy contours, as well as spectral, perceptual, and temporal features, jitter, and shimmer. An Artificial Neural Network (ANN) was chosen as the classifier. Our concern is finding a robust and fast ANN classifier suitable for different real-life applications. Several experiments were carried out on different ANNs to investigate the factors that affect the classification success rate. Using a database containing 7 different emotions, it is shown that with proper and careful adjustment of the feature format, the training data sorting, the number of features selected, and even the ANN type and architecture used, a success rate of 85% or more can be achieved without increasing the system complexity or the computation time.

Keywords: Neural network, classification, emotion recognition, feature selection, feature extraction
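
The abstract lists jitter, shimmer, and contour statistics among the extracted features but does not say how they are computed. The sketch below illustrates one common convention, the "local" perturbation measure: the mean absolute difference between consecutive cycles divided by the mean value. It is an assumption that this matches the authors' definition, and the function names (`local_perturbation`, `jitter_local`, `shimmer_local`, `contour_stats`) are hypothetical. The inputs are assumed to be per-cycle pitch periods and peak amplitudes already produced by a pitch tracker.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods):
    """Cycle-to-cycle variation of the pitch period (assumed definition)."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (assumed definition)."""
    return local_perturbation(amplitudes)


def contour_stats(contour):
    """Simple statistics of a pitch or energy contour (illustrative subset)."""
    return {
        "mean": mean(contour),
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
    }
```

Values such as these could be concatenated into one feature vector per utterance before feature selection and classification; the particular statistics used in the paper may differ from this subset.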


