Evaluation of Features Extraction Algorithms for a Real-Time Isolated Word Recognition System
Authors: Tomyslav Sledevič, Artūras Serackis, Gintautas Tamulevičius, Dalius Navakauskas
Abstract:
This paper presents a comparative evaluation of feature extraction algorithms for a real-time, FPGA-based isolated word recognition system. Mel-frequency cepstral coefficients (MFCC), linear frequency cepstral coefficients (LFCC), linear predictive coefficients (LPC), and linear predictive cepstral coefficients (LPCC) were implemented in a hardware/software design. The proposed system was investigated in speaker-dependent mode on 100 different Lithuanian words. The robustness of the feature extraction algorithms was tested by recognizing speech records at different signal-to-noise ratios. Experiments on clean records show the highest accuracy for the Mel-frequency cepstral and linear frequency cepstral coefficients. For records with a 15 dB signal-to-noise ratio, the linear predictive cepstral coefficients give the best result. The hardware and software parts of the system are clocked at 50 MHz and 100 MHz, respectively. For classification, a pipelined dynamic time warping core was implemented. The proposed word recognition system satisfies the real-time requirements and is suitable for applications in embedded systems.
Keywords: Isolated word recognition, feature extraction, MFCC, LFCC, LPCC, LPC, FPGA, DTW.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1089188
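The abstract names dynamic time warping (DTW) as the classifier that compares an unknown word with stored reference words. As a rough software illustration only, the sketch below computes the textbook DTW distance between two sequences of feature vectors and picks the closest template. It is not the pipelined FPGA core described in the paper: the function names `dtw_distance` and `recognize`, the Euclidean local distance, the three-way step pattern, the absence of path-length normalization, and the labels `word_a` and `word_b` are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors.

    `a` and `b` are lists of equal-dimension float vectors (one per frame).
    The local distance (Euclidean) and the step pattern are assumptions,
    not details taken from the paper.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated distance for aligning
    # the first i frames of `a` with the first j frames of `b`.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch `b` (repeat its frame)
                cost[i][j - 1],      # stretch `a` (repeat its frame)
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the template closest to `features` under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


# Hypothetical one-dimensional "feature" sequences, for illustration only.
templates = {
    "word_a": [[0.0], [1.0], [2.0]],
    "word_b": [[5.0], [5.0], [4.0]],
}
query = [[0.0], [0.0], [1.0], [2.0]]  # a time-stretched variant of word_a
best_label = recognize(query, templates)
```

Because DTW allows frames to be repeated, the time-stretched query aligns with `word_a` at zero accumulated cost even though the two sequences differ in length. In a real system each frame would carry a vector of MFCC, LFCC, LPC, or LPCC values instead of a single number.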