Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition: noise reduction. Traditionally, detected transient noise pulses are removed together with the corrupted speech using pulse models. In this paper we propose to address the problem directly in the Dynamic Time Warping domain. A bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses a simple transient noise pulse detector, employs bidirectional computation of dynamic time warping, and directly manipulates the warping results. Experimental investigation with several alternative solutions confirms the effectiveness of the proposed algorithm in reducing the impact of noise on the recognition process: a 3.9% increase in noisy speech recognition is achieved.
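
The abstract does not give the details of the algorithm, so the following is only a minimal illustrative sketch of the general idea of computing dynamic time warping in both directions and combining the results around a corrupted segment. It is not the authors' implementation. All names (local_cost, forward_dtw, backward_dtw, bidirectional_dtw), the scalar features, the step pattern, and the rule for bridging the pulse are assumptions made for illustration, and the pulse position is taken as given rather than produced by a detector.

```python
from math import inf


def local_cost(a, b):
    """Distance between two scalar feature frames (stand-in for a real feature distance)."""
    return abs(a - b)


def forward_dtw(test, ref):
    """F[i][j] = cost of the cheapest warping path from (0, 0) to (i, j), inclusive."""
    n, m = len(test), len(ref)
    F = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local_cost(test[i], ref[j])
            if i == 0 and j == 0:
                F[i][j] = d
                continue
            # Symmetric step pattern: vertical, horizontal, diagonal.
            best = min(
                F[i - 1][j] if i > 0 else inf,
                F[i][j - 1] if j > 0 else inf,
                F[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            F[i][j] = d + best
    return F


def backward_dtw(test, ref):
    """B[i][j] = cost of the cheapest warping path from (i, j) to the end, inclusive."""
    n, m = len(test), len(ref)
    # Running the forward pass on reversed sequences gives the backward costs.
    R = forward_dtw(test[::-1], ref[::-1])
    return [[R[n - 1 - i][m - 1 - j] for j in range(m)] for i in range(n)]


def bidirectional_dtw(test, ref, pulse):
    """Match test to ref while ignoring test frames pulse = (start, end), inclusive.

    Assumption for this sketch: the forward cost up to the last clean frame
    before the pulse is joined with the backward cost from the first clean
    frame after it, letting the reference advance freely during the pulse.
    """
    start, end = pulse
    n, m = len(test), len(ref)
    if not (0 < start <= end < n - 1):
        raise ValueError("pulse must lie strictly inside the test word")
    F = forward_dtw(test, ref)
    B = backward_dtw(test, ref)
    best_total = inf
    best_prefix = inf  # min of F[start - 1][j] over j <= k
    for k in range(m):
        best_prefix = min(best_prefix, F[start - 1][k])
        best_total = min(best_total, best_prefix + B[end + 1][k])
    return best_total


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    noisy = [1, 2, 50, 4, 5]  # frame 2 is hit by a hypothetical pulse
    print("plain DTW cost:", forward_dtw(noisy, reference)[-1][-1])
    print("bidirectional cost:", bidirectional_dtw(noisy, reference, (2, 2)))
```

In this toy setting the corrupted frame dominates the plain DTW cost, while the bidirectional variant skips it. A real system would use multidimensional speech features and would normally normalise the cost by path length; both are omitted here for brevity.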

Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1092120

