Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns

Authors: Christian Arcos, Marley Vellasco, Abraham Alcaim

Abstract:

This paper presents a speech enhancement approach, wavelet coefficient masking based on Local Binary Patterns (WLBP), that enhances the temporal spectra of the wavelet coefficients. The technique builds on the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. In each high-frequency subband, enhancement is performed with binary labels obtained through local binary pattern masking, which encodes the ratio between the original value of each coefficient and the values of its neighbouring coefficients. The approach therefore enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis with conventional speech enhancement algorithms shows that the proposed technique achieves significant improvements in PESQ, the ITU-recommended objective measure for estimating subjective speech quality. Informal listening tests also indicate that the proposed method improves the perceived quality of speech while avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN-based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.
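
The abstract does not give the exact mask, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Haar wavelet, a neighbourhood of two coefficients on each side, comparison of coefficient magnitudes, and a hypothetical gain rule (`lbp_gain`) that attenuates a detail coefficient according to how many of its neighbours are at least as large. All function names and parameters (`half_width`, `floor`, `levels`) are illustrative choices.

```python
import math


def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def lbp_labels(coeffs, half_width=2):
    """Binary labels per coefficient: one bit per neighbour, set to 1 when
    |neighbour| >= |centre|. Indices are clamped at the subband edges."""
    n = len(coeffs)
    mag = [abs(c) for c in coeffs]
    labels = []
    for i in range(n):
        bits = []
        for offset in range(-half_width, half_width + 1):
            if offset == 0:
                continue
            j = min(max(i + offset, 0), n - 1)
            bits.append(1 if mag[j] >= mag[i] else 0)
        labels.append(bits)
    return labels


def lbp_gain(bits, floor=0.1):
    """Hypothetical soft mask (an assumption, not the paper's rule): a
    coefficient that dominates its neighbours keeps gain 1, one that is
    dominated by all of them is scaled down to `floor`, never to zero."""
    frac_dominated = sum(bits) / len(bits)
    return floor + (1.0 - floor) * (1.0 - frac_dominated)


def wlbp_sketch(x, levels=3, half_width=2, floor=0.1):
    """Pyramidal decomposition, LBP-style masking of each detail subband,
    then reconstruction. The approximation subband is left untouched."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, masked_details = list(x), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        labels = lbp_labels(detail, half_width)
        masked_details.append(
            [lbp_gain(b, floor) * d for b, d in zip(labels, detail)]
        )
    for detail in reversed(masked_details):
        approx = haar_idwt(approx, detail)
    return approx
```

Because the gain never drops below `floor`, the detail coefficients are scaled, not zeroed, which mirrors the abstract's point that the high-frequency content is enhanced instead of being eliminated by a threshold.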

Keywords: Binary labels, local binary patterns, mask, wavelet coefficients, speech enhancement, speech recognition.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1314546

