Using HMM-based Classifier Adapted to Background Noises with Improved Sounds Features for Audio Surveillance Application

Authors: Asma Rabaoui, Zied Lachiri, Noureddine Ellouze


The goal of this work is to discriminate between different classes of environmental sounds, a capability with concrete potential for surveillance and security applications. The first contribution of the paper is a thorough investigation of how well state-of-the-art audio features apply to environmental sound recognition. In addition, a set of novel features obtained by combining the basic parameters is introduced. The quality of the investigated features is evaluated with an HMM-based classifier, to which particular attention is given. Specifically, we propose a multi-style training system based on HMMs: a single recognizer is trained on a database that includes different levels of background noise and is then used as a universal recognizer for every environment. To make the system more robust by reducing environmental variability, we explore several adaptation algorithms: Maximum Likelihood Linear Regression (MLLR), Maximum A Posteriori (MAP), and the MAP/MLLR algorithm, which combines the two. Experimental evaluation shows that a fairly good recognition rate can be reached, even under severe noise degradation, when the system is fed with an appropriate set of features.

Keywords: Sound recognition, HMM classifier, Multi-style training, Environmental adaptation, Feature combinations.
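
As a rough illustration of two ideas named in the abstract, the sketch below mixes a clean signal with noise at several signal-to-noise ratios (the kind of data a multi-style training set contains) and applies the standard MAP update to a single Gaussian mean. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mix_at_snr` and `map_adapt_mean`, the SNR values, the prior weight `tau`, and the synthetic signals are all hypothetical, and MLLR is not shown.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Gain g such that p_clean / (g^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def map_adapt_mean(prior_mean, frames, tau=10.0, weights=None):
    """MAP update of one Gaussian mean from adaptation frames.

    `tau` (assumed value) weights the prior mean; `weights` are per-frame
    occupation probabilities, taken as 1.0 for every frame if omitted.
    """
    if weights is None:
        weights = [1.0] * len(frames)
    occupancy = sum(weights)
    dim = len(prior_mean)
    weighted_sum = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy) for d in range(dim)]


# Multi-style data: the same clean signal mixed with noise at several SNRs
# (hypothetical levels; the abstract does not state the ones actually used).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
multi_style = {snr: mix_at_snr(clean, noise, snr) for snr in (20, 10, 0)}

# MAP adaptation: the result interpolates between the prior mean and the data mean.
adapted = map_adapt_mean([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], tau=2.0)
print(adapted)  # [1.0, 1.5]
```

With little adaptation data the MAP estimate stays close to the prior mean, and it moves toward the data mean as the occupancy grows relative to `tau`.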


