Slice Bispectrogram Analysis-Based Classification of Environmental Sounds Using Convolutional Neural Network
Authors: Katsumi Hirata
Certain systems can function well only if they recognize the sound environment as humans do. In this research, we focus on sound classification with a convolutional neural network and aim to develop a method that automatically classifies various environmental sounds. Although a neural network is a powerful technique, its performance depends on the type of input data. We therefore propose an approach based on the slice bispectrogram, a third-order spectrogram obtained by taking a slice of the amplitude of the short-time bispectrum. This paper explains the slice bispectrogram and discusses the effectiveness of the derived method by evaluating experimental results on the ESC-50 sound dataset. The proposed scheme gives high accuracy and stability. Furthermore, a relationship between the classification accuracy and the non-Gaussianity of the sound signals was confirmed.
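The abstract describes the slice bispectrogram only in words, so the following is a minimal sketch of one possible reading rather than the paper's actual procedure. It assumes that the slice is the diagonal f1 = f2 of the single-frame bispectrum estimate B(f1, f2) = X(f1) X(f2) X*(f1 + f2), that frames are Hann-windowed, and that the frame length and hop are 256 and 128 samples. The function name `slice_bispectrogram` and its parameters `frame_len` and `hop` are hypothetical and do not come from the paper.

```python
import numpy as np


def slice_bispectrogram(x, frame_len=256, hop=128):
    """Amplitude of the diagonal slice B(f, f) of a short-time bispectrum.

    Assumptions (not taken from the paper): diagonal slice f1 = f2,
    Hann window, one bispectrum estimate per frame without averaging.
    Returns an array of shape (n_bins, n_frames).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop

    # Only bins k whose double 2k still lies in the one-sided spectrum
    # (length frame_len // 2 + 1) can be evaluated on the diagonal.
    n_bins = frame_len // 4 + 1
    k = np.arange(n_bins)

    out = np.empty((n_bins, n_frames))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len] * window
        spectrum = np.fft.rfft(frame)
        # Diagonal slice: B(k, k) = X(k) * X(k) * conj(X(2k))
        out[:, t] = np.abs(spectrum[k] * spectrum[k] * np.conj(spectrum[2 * k]))
    return out


# Usage: a tone at bin 16 plus its second harmonic at bin 32.
n = np.arange(1024)
x = np.cos(2 * np.pi * 16 * n / 256) + np.cos(2 * np.pi * 32 * n / 256)
sb = slice_bispectrogram(x)
print(sb.shape)
```

In this sketch the time axis runs along the columns, so the result can be treated as a single-channel image for a convolutional network in the same way as an ordinary spectrogram. A pair of components at f and 2f produces a strong response at f on the diagonal, which an ordinary power spectrogram would not single out.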
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.3593204