Automatic Recognition of an Unknown and Time-Varying Number of Simultaneous Environmental Sound Sources

Authors: S. Ntalampiras, I. Potamitis, N. Fakotakis, S. Kouzoupis


The present work addresses the automatic enumeration and recognition of an unknown and time-varying number of environmental sound sources using a single microphone. We assume that the recorded sound is a realization of sound sources belonging to a group of audio classes that is known a priori. We describe two variations of the same principle: computing the distance between the current unknown audio frame and all possible combinations of the classes that are assumed to span the sound scene. We concentrate on categorizing environmental sound sources, such as birds and insects, for the task of monitoring the biodiversity of a specific habitat.
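As a rough illustration of the principle of comparing a frame against every combination of known classes, the following Python sketch may help. It is not the authors' implementation: the abstract does not specify the features, the class models, or the distance measure, so the sketch assumes that each class is represented by a single average power spectrum, that a mixture is approximated by summing those spectra, and that a squared Euclidean distance is used. The names `CLASS_TEMPLATES`, `mix`, `squared_distance`, and `recognise_frame`, as well as all numeric values, are hypothetical.

```python
from itertools import combinations

# Hypothetical class templates: one average power spectrum per known class.
# The values are illustrative only and are not taken from the paper.
CLASS_TEMPLATES = {
    "bird":   [0.0, 0.1, 0.9, 0.2],
    "insect": [0.0, 0.0, 0.1, 0.8],
    "frog":   [0.7, 0.3, 0.0, 0.0],
}


def mix(templates):
    """Approximate a mixture by summing power spectra bin by bin (assumption)."""
    return [sum(values) for values in zip(*templates)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognise_frame(frame, class_templates):
    """Return (active_classes, distance) for the closest class combination.

    The empty combination (silence) is the starting hypothesis; every
    non-empty subset of the known classes is then tried exhaustively.
    """
    names = sorted(class_templates)
    best_combo = ()
    best_dist = squared_distance(frame, [0.0] * len(frame))
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            candidate = mix([class_templates[n] for n in combo])
            dist = squared_distance(frame, candidate)
            if dist < best_dist:
                best_combo, best_dist = combo, dist
    return best_combo, best_dist


# A synthetic frame containing a bird and an insect at the same time.
frame = mix([CLASS_TEMPLATES["bird"], CLASS_TEMPLATES["insect"]])
active, dist = recognise_frame(frame, CLASS_TEMPLATES)
print(len(active), active)  # number of sources and their labels
```

Running the same function on each successive frame yields a per-frame estimate of both the number of active sources and their classes, which is how a time-varying count could be tracked under these assumptions. Note that an exhaustive search over N classes visits 2^N combinations, so this naive form is practical only for a small class set.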

Keywords: automatic recognition of multiple sound sources, enumeration of sound sources, computational ecology.



