Using HMM-based Classifier Adapted to Background Noises with Improved Sounds Features for Audio Surveillance Application
Abstract: The goal of this work is to discriminate between different classes of environmental sounds. A sound recognition system offers concrete potential for surveillance and security applications. The first contribution of this paper is a thorough investigation of the applicability of state-of-the-art audio features to environmental sound recognition. Additionally, a set of novel features obtained by combining the basic parameters is introduced. The quality of the investigated features is evaluated with an HMM-based classifier, which receives particular attention. Specifically, we propose a multi-style training system based on HMMs: one recognizer is trained on a database that includes different levels of background noise and is used as a universal recognizer for every environment. To enhance system robustness by reducing environmental variability, we explore different adaptation algorithms, including Maximum Likelihood Linear Regression (MLLR), Maximum A Posteriori (MAP), and the MAP/MLLR algorithm that combines MAP and MLLR. Experimental evaluation shows that a rather good recognition rate can be reached, even under severe noise degradation, when the system is fed with a suitable set of features.
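As an illustration of the MAP adaptation named in the abstract, the following is a minimal sketch of the standard MAP update for a single Gaussian mean, which interpolates between the prior mean and the adaptation data. It is not taken from the paper: the function name `map_adapt_mean`, the prior weight `tau`, and the toy data are assumptions made for this example, and a full system would apply the update to every Gaussian of every HMM state using occupation probabilities from a forward-backward pass.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean (illustrative sketch, not the paper's code).

    prior_mean: mean vector of the Gaussian in the noise-independent model
    frames:     feature vectors observed in the new environment
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight (hypothetical value); larger values keep the
                adapted mean closer to the prior mean
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # total soft count assigned to this Gaussian
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


# Toy usage: two adaptation frames, each fully assigned to the Gaussian.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
print(adapted)  # [1.0, 2.0]
```

With little adaptation data the occupancy is small and the result stays near the prior mean; as more data arrives, the estimate moves toward the sample mean of the new environment.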
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1078749