Efficient DTW-Based Speech Recognition System for Isolated Words of Arabic Language

Authors: Khalid A. Darabkh, Ala F. Khalifeh, Baraa A. Bathech, Saed W. Sabah

Abstract:

Although Arabic is one of the most widely spoken languages in the world, Arabic speech recognition has received little research attention relative to languages such as English and Japanese. Digital speech processing and voice recognition algorithms are central to designing efficient, accurate, and fast automatic speech recognition systems. The speech recognition process carried out in this paper is divided into three stages. First, the signal is preprocessed to reduce noise effects; it is then digitized and hearingized, and the voice activity regions are segmented using a voice activity detection (VAD) algorithm. Second, features are extracted from the speech signal using the Mel-frequency cepstral coefficients (MFCC) algorithm, and delta and acceleration (delta-delta) coefficients are added to improve recognition accuracy. Finally, each test word's features are compared to the training database using the dynamic time warping (DTW) algorithm. With the best parameter settings found for these techniques, the proposed system achieved a recognition rate of about 98.5%, outperforming other HMM- and ANN-based approaches available in the literature.
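
The matching stage described in the abstract can be illustrated with a minimal sketch. It assumes each word is already a sequence of per-frame feature vectors; the toy one-dimensional frames stand in for real MFCC vectors. The function names (add_deltas, dtw_distance, recognize), the regression window n=2, and the Euclidean local distance are illustrative assumptions, not the authors' implementation.

```python
import math

def add_deltas(frames, n=2):
    """Append delta coefficients to every frame.

    Assumption: the common regression formula over a window of n frames;
    the paper's exact delta computation is not given in this section.
    """
    denom = 2 * sum(i * i for i in range(1, n + 1))
    last = len(frames) - 1
    out = []
    for t, frame in enumerate(frames):
        delta = []
        for k in range(len(frame)):
            num = 0.0
            for i in range(1, n + 1):
                ahead = frames[min(t + i, last)][k]   # clamp at sequence edges
                behind = frames[max(t - i, 0)][k]
                num += i * (ahead - behind)
            delta.append(num / denom)
        out.append(list(frame) + delta)
    return out

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[rows][cols]

def recognize(test, templates):
    """Return the label of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda item: dtw_distance(test, item[1]))[0]

# Toy data: hypothetical one-dimensional "features", not real MFCCs.
templates = [
    ("WAHID", add_deltas([[0.0], [1.0], [2.0], [1.0]])),
    ("ITHNAN", add_deltas([[5.0], [4.0], [5.0]])),
]
test = add_deltas([[0.0], [1.0], [1.0], [2.0], [1.0]])
print(recognize(test, templates))
```

The test utterance is a time-stretched copy of the first template, so warping lets it align with that template even though the two sequences differ in length. Acceleration (delta-delta) coefficients could be obtained in the same spirit by applying the delta computation to the delta columns.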

Keywords: Arabic speech recognition, MFCC, DTW, VAD.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1074387
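
The recognition rate of about 98.5% quoted in the abstract can be cross-checked against the per-word rates reported in Table I. The sketch below assumes the overall rate is the unweighted mean over the 19 tested words, which this section does not state explicitly; the dictionary keys simply mirror the table's column names.

```python
from statistics import mean

# Per-word recognition rates (%) copied from Table I, in table order
# (WAHID ... ASHRA, then ASSALAAMU ... MEHNATOK).
rates = {
    "VAD+MFCC": [
        85.7, 100.0, 100.0, 100.0, 100.0, 85.7, 100.0, 100.0, 100.0, 85.7,
        100.0, 100.0, 100.0, 85.7, 100.0, 100.0, 85.7, 100.0, 100.0,
    ],
    "VAD+MFCC+delta": [
        100.0, 100.0, 100.0, 100.0, 100.0, 85.7, 100.0, 100.0, 100.0, 100.0,
        100.0, 100.0, 100.0, 85.7, 100.0, 100.0, 85.7, 100.0, 100.0,
    ],
    "VAD+MFCC+delta+delta-delta": [
        100.0, 100.0, 100.0, 100.0, 100.0, 85.7, 100.0, 100.0, 100.0, 100.0,
        100.0, 100.0, 100.0, 85.7, 100.0, 100.0, 100.0, 100.0, 100.0,
    ],
}

# Assumption: overall rate = unweighted mean of the per-word rates.
averages = {name: round(mean(values), 1) for name, values in rates.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}%")
```

Under that assumption, the full feature set (MFCC plus delta and delta-delta) averages to roughly 98.5%, matching the abstract, and each added feature set raises the mean slightly.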


TABLE I
RECOGNITION RATES FOR DIFFERENT FEATURE SETS

Tested Word (Arabic Writing) | Transcription | English Writing | Approach #1: VAD+MFCC | Approach #2: VAD+MFCC+Δ | Approach #3: VAD+MFCC+Δ+ΔΔ
واحد | WAHID | ONE | 85.7% | 100% | 100%
اثنان | ITHNAN | TWO | 100% | 100% | 100%
ثلاثة | THALATHA | THREE | 100% | 100% | 100%
أربعة | ARBAA | FOUR | 100% | 100% | 100%
خمسة | KHAMSA | FIVE | 100% | 100% | 100%
ستة | SITTA | SIX | 85.7% | 85.7% | 85.7%
سبعة | SABAA | SEVEN | 100% | 100% | 100%
ثمانية | THAMANIYA | EIGHT | 100% | 100% | 100%
تسعة | TISAA | NINE | 100% | 100% | 100%
عشرة | ASHRA | TEN | 85.7% | 100% | 100%
السلام | ASSALAAMU | PEACE | 100% | 100% | 100%
عليكم | ALAIKUM | UPON YOU | 100% | 100% | 100%
كيف | KEEF | HOW | 100% | 100% | 100%
حالك | HALAK | ARE YOU | 85.7% | 85.7% | 85.7%
ما | MA | WHAT | 100% | 100% | 100%
اسمك | ESMOK | YOUR NAME | 100% | 100% | 100%
كم | KAM | HOW | 85.7% | 85.7% | 100%
عمرك | OMROK | YOUR AGE | 100% | 100% | 100%
مهنتك | MEHNATOK | YOUR OCCUPATION | 100% | 100% | 100%

References:

[1] M. Al-Zabibi, "An Acoustic-Phonetic Approach in Automatic Arabic Speech Recognition," The British Library in Association with UMI, UK, 1990, http://hdl.handle.net/2134/6949.
[2] M. Alkhouli, "Alaswaat Alaghawaiyah," Daar Alfalah, Jordan, 1990 (in Arabic).
[3] M. Elshafei, "Toward an Arabic Text-to-Speech System," The Arabian Journal for Science and Engineering, vol. 16, no. 4B, pp. 565-83, October 1991.
[4] S.B. Davis, P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, August 1980.
[5] Z. Hachkar, A. Farchi, B. Mounir, J. El Abbadi, "A Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language," International Journal on Computer Science and Engineering, vol. 3, no. 3, pp. 1002-1008, March 2011.
[6] Lindasalwa Muda, Mumtaj Begam, I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW)," Journal of Computing, vol. 2, no. 3, pp. 138-143, March 2010.
[7] Stan Salvador, Philip Chan, "Toward Accurate Dynamic Time Warping in Linear Time and Space," Intelligent Data Analysis Journal, vol. 11, no. 5, pp. 561-580, October 2007.
[8] D. Vergyri, K. Kirchhoff, K. Duh, A. Stolcke, "Morphology-based language modeling for Arabic speech recognition," in INTERSPEECH-2004, pp. 2245-2248, 2004.
[9] K. Kirchhoff, J. Bilmes, J. Henderson, R. Schwartz, M. Noamany, P. Schone, G. Ji, S. Das, M. Egan, F. He, D. Vergyri, D. Liu, and N. Duta, "Novel Approaches to Arabic Speech Recognition," Technical Report, Johns Hopkins University, 2002.
[10] D. Vergyri, K. Kirchhoff, "Automatic diacritization of Arabic for acoustic modeling in speech recognition," in Ali Farghaly and Karine Megerdoomian, editors, COLING 2004, Computational Approaches to Arabic Script-based Languages, pp. 66-73, Geneva, Switzerland, 2004.
[11] H. Satori, M. Harti, N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System," Proceedings of Information and Communication Technologies International Symposium (ICTIS'07), Fes, Morocco, pp. 139-115, July 2007.
[12] Lawrence Rabiner, Biing-Hwang Juang, Fundamentals of Speech Recognition, Upper Saddle River, New Jersey: Prentice Hall, USA, 1993.
[13] X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing, Upper Saddle River, New Jersey: Prentice Hall, USA, 2001.
[14] B. Gold and N. Morgan, Speech and Audio Signal Processing, New York, New York: John Wiley and Sons, USA, 2000.
[15] Mikael Nilsson and Marcus Ejnarsson, "Speech Recognition using Hidden Markov Model (performance evaluation in noisy environment)," Master's Thesis, Department of Telecommunications and Signal Processing, Blekinge Institute of Technology, Ronneby, Sweden, March 2002.
[16] B.S. Jinjin Ye, "Speech Recognition Using Time Domain Features From Phase Space Reconstructions," Master's Thesis, Department of Electrical and Computer Engineering, Marquette University, Milwaukee, Wisconsin, May 2004.
[17] Khalid Saeed and Mohammad Nammous, Heuristic Method of Arabic Speech Recognition, Bialystok University of Technology, Poland, http://aragorn.pb.bialystok.pl/~zspinfo/
[18] Mohamed Mostafa Azmi, Hesham Tolba, Sherif Mahdy, Mervat Fashal, "Syllable-Based Automatic Arabic Speech Recognition," Proceedings of WSEAS International Conference of Signal Processing, Robotics and Automation (ISPRA-08), University of Cambridge, UK, pp. 246-250, February 2008.
[19] H. Bahi and M. Sellami, "Combination of Vector Quantization and Hidden Markov Models for Arabic Speech Recognition," Proceedings of the ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2001), Beirut, Lebanon, pp. 96-100, June 2001.
[20] W. Alkhaldi, W. Fakhr, N. Hamdy, "Multi-Band Based Recognition of Spoken Arabic Numerals Using Wavelet Transform," Proceedings of the 19th National Radio Science Conference (NRSC-01), Alexandria University, Alexandria, Egypt, March 19-21, 2002.
[21] F.A. Elmisery, A.H. Khalil, A.E. Salama, H.F. Hammed, "A FPGA Based HMM for a Discrete Arabic Speech Recognition System," Proceedings of the 15th International Conference on Microelectronics (ICM 2003), Cairo, Egypt, December 9-11, 2003.