The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Fawaz S. Al-Anzi; Dia AbuZeina

Abstract:

Speech recognition makes an important contribution to new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires several processing stages before the desired output is obtained. One component of automatic speech recognition (ASR) is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. Feature extraction aims to approximate the linguistic content conveyed by the input speech signal. In the speech processing field, there are several methods for extracting speech features; however, Mel Frequency Cepstral Coefficients (MFCC) is the most popular technique. It has long been observed that MFCC is the dominant choice in well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Hidden Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice for identifying the different speech segments in order to obtain the language phonemes for further training and decoding steps. Owing to its good performance, previous studies show that MFCC dominates Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to obtain these coefficients using the HTK toolkit.
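
The intermediate steps mentioned in the abstract can be illustrated with a minimal sketch of a textbook MFCC pipeline: pre-emphasis, framing, Hamming windowing, power spectrum, mel filterbank, logarithm, and a discrete cosine transform. This is not the HTK implementation and not necessarily the authors' configuration. The function names (`hz_to_mel`, `mel_filterbank`, `power_spectrum`, `mfcc`) and all default parameter values (frame length, frame step, FFT size, number of filters, number of coefficients, pre-emphasis factor) are assumptions chosen for illustration, and the naive DFT is used only to keep the example free of external libraries.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common mel-scale formula: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    num_bins = fft_size // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, mapped to DFT bin indices
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((fft_size + 1) * f / sample_rate)) for f in edges]
    bank = [[0.0] * num_bins for _ in range(num_filters)]
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):        # rising slope
            bank[m - 1][k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling slope, peak of 1 at centre
            bank[m - 1][k] = (right - k) / (right - centre)
    return bank


def power_spectrum(frame, fft_size):
    """Naive DFT power spectrum of a zero-padded frame (slow, illustration only)."""
    padded = list(frame) + [0.0] * (fft_size - len(frame))
    spectrum = []
    for k in range(fft_size // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / fft_size)
                  for n, x in enumerate(padded))
        spectrum.append(abs(acc) ** 2 / fft_size)
    return spectrum


def mfcc(signal, sample_rate, frame_len=200, frame_step=80, fft_size=256,
         num_filters=20, num_ceps=12, preemph=0.97):
    """Return one list of num_ceps cepstral coefficients per frame."""
    # 1. Pre-emphasis boosts the high-frequency part of the signal.
    emphasized = [signal[0]] + [signal[i] - preemph * signal[i - 1]
                                for i in range(1, len(signal))]
    bank = mel_filterbank(num_filters, fft_size, sample_rate)
    # 2. Hamming window applied to each overlapping frame.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, frame_step):
        frame = [emphasized[start + n] * window[n] for n in range(frame_len)]
        # 3. Power spectrum, 4. mel filterbank energies, 5. logarithm.
        power = power_spectrum(frame, fft_size)
        log_energies = [math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-10))
                        for filt in bank]
        # 6. DCT-II of the log energies; coefficient 0 is skipped here.
        ceps = [math.sqrt(2.0 / num_filters) *
                sum(e * math.cos(math.pi * i / num_filters * (j + 0.5))
                    for j, e in enumerate(log_energies))
                for i in range(1, num_ceps + 1)]
        features.append(ceps)
    return features
```

With these assumed defaults and a signal sampled at 8 kHz, each frame covers 25 ms and consecutive frames start 10 ms apart, so calling `mfcc(signal, 8000)` on 800 samples yields eight 12-dimensional feature vectors. A production system would replace the naive DFT with an FFT and typically append energy, delta, and acceleration terms.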

Keywords: Speech recognition, acoustic features, Mel Frequency Cepstral Coefficients.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1132455
