Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems

K. R. Aida-Zade; C. Ardil; S. S. Rustamov

Abstract:

This paper states the automatic speech recognition problem, describes the purpose of speech recognition, and outlines its fields of application. Using Azerbaijani speech as the working case, it investigates the principles of building a speech recognition system and the problems that arise in such a system. The algorithms for computing speech features, which form the main part of a speech recognition system, are analyzed. To this end, algorithms are developed for determining the Mel Frequency Cepstral Coefficients (MFCC) and the Linear Predictive Coding (LPC) coefficients, which express the basic speech features. Combined use of the MFCC and LPC cepstra is proposed to improve the reliability of the recognition system. The system is divided into an MFCC-based and an LPC-based recognition subsystem. Training and recognition are carried out separately in each subsystem, and the system accepts a decision only when both subsystems return the same result. This reduces the error rate during recognition. Training and recognition are performed by artificial neural networks, which are trained by the conjugate gradient method. The paper investigates the problems that the number of speech features causes when training the neural networks of the MFCC-based and LPC-based subsystems. The variation in the results of neural networks trained from different initial points is analyzed. A methodology for the combined use of neural networks trained from different initial points is proposed to improve the reliability and recognition quality of the system, and the practical results obtained are presented.
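
The agreement rule described in the abstract can be illustrated with a minimal sketch. It assumes that each subsystem's neural network outputs one score per vocabulary word and that the system rejects the utterance when the two subsystems disagree; the abstract does not say how disagreement is handled, so the rejection step is an assumption. The function names `best_class` and `combined_decision` are hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence


def best_class(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


def combined_decision(
    mfcc_scores: Sequence[float],
    lpc_scores: Sequence[float],
) -> Optional[int]:
    """Accept a class only if both subsystems pick the same one.

    Returns the agreed class index, or None when the subsystems
    disagree (assumed here to mean the utterance is rejected).
    """
    if len(mfcc_scores) != len(lpc_scores):
        raise ValueError("both subsystems must score the same vocabulary")
    mfcc_choice = best_class(mfcc_scores)
    lpc_choice = best_class(lpc_scores)
    return mfcc_choice if mfcc_choice == lpc_choice else None
```

For example, `combined_decision([0.1, 0.8, 0.1], [0.2, 0.7, 0.1])` yields class 1, while scores whose maxima fall on different classes yield `None`. Requiring agreement trades some rejected utterances for fewer wrong acceptances, which matches the reduction in error rate the abstract reports.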

Keywords: Speech recognition, cepstral analysis, Voice activation detection algorithm, Mel Frequency Cepstral Coefficients, features of speech, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1314791
