Voice Driven Applications in Non-stationary and Chaotic Environment
Authors: C. Kwan, X. Li, D. Lao, Y. Deng, Z. Ren, B. Raj, R. Singh, R. Stern
Abstract:
Automated operations based on voice commands will become increasingly important in many applications, including robotics and maintenance operations. However, voice command recognition rates drop substantially in non-stationary and chaotic noise environments. This paper aims to significantly improve speech recognition rates under non-stationary noise. First, 298 Navy acronyms were selected for automatic speech recognition. Data sets were collected under four types of noisy environment: factory, Buccaneer jet, babble noise in a canteen, and destroyer. Within each noisy environment, four Signal-to-Noise Ratio (SNR) conditions were used: 5 dB, 15 dB, 25 dB, and clean. Second, a new algorithm for estimating speech and non-speech regions was developed, implemented, and evaluated. Third, extensive simulations were carried out. The combination of the new algorithm, a proper selection of language model, and customized training of the speech recognizer on clean speech yielded high recognition rates, between 80% and 90%, for the four noisy conditions. Fourth, extensive comparative studies were carried out.
Keywords: Non-stationary, speech recognition, voice commands.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1072273
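The abstract states that speech was corrupted at SNRs of 5 dB, 15 dB, and 25 dB but does not say how the mixing was done. The following is a minimal sketch of one common approach, assuming additive noise scaled by average power so that 10 log10(P_speech / P_noise) equals the target SNR. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the 16 kHz sampling rate, and the synthetic tone and Gaussian noise standing in for real recordings are all illustrative assumptions, not details from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling the noise to the requested SNR (dB).

    Returns the noisy mixture and the scaled noise (hypothetical helper,
    not the procedure used in the paper).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # SNR_dB = 10 * log10(P_speech / P_noise), so the target noise power is
    # P_speech / 10^(SNR_dB / 10).
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, scaled_noise):
    """SNR in dB computed from the clean speech and the noise actually added."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


# Stand-in signals (assumptions): a 440 Hz tone in place of speech and white
# Gaussian noise in place of a recorded factory/jet/babble/destroyer track.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

results = {}
for snr_db in (5, 15, 25):
    noisy, scaled = mix_at_snr(speech, noise, snr_db)
    results[snr_db] = measured_snr_db(speech, scaled)
    print(snr_db, len(noisy), round(results[snr_db], 6))
```

Scaling by average power over the whole utterance is the simplest choice; with non-stationary noise the instantaneous SNR still varies from frame to frame, which is part of what makes such conditions hard for a recognizer.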