Commenced in January 2007
Frequency: Monthly
Edition: International
Speech Intelligibility Improvement Using Variable Level Decomposition DWT
Authors: Samba Raju Chiluveru, Manoj Tripathy
Abstract:
Intelligibility is an essential characteristic of a speech signal: it determines how well a listener can understand the information the signal carries. Background noise in the environment can degrade the intelligibility of recorded speech. In this paper, we present a simple variance-subtracted, variable-level discrete wavelet transform that improves the intelligibility of speech. The proposed algorithm does not require an explicit estimate of the noise, i.e., prior knowledge of the noise; hence, it is easy to implement and reduces the computational burden. The algorithm selects a separate decomposition level for each frame based on signal-dominant and noise-dominant criteria. Its performance is evaluated with the short-time objective intelligibility (STOI) measure, and the results are compared with universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results show that the proposed scheme outperforms the competing methods.
Keywords: Discrete Wavelet Transform, speech intelligibility, STOI, standard deviation.
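For orientation, the universal DWT thresholding baseline named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a Haar wavelet, soft thresholding, and the standard universal threshold sigma * sqrt(2 ln N), with sigma estimated from the finest detail band. The helper names (`haar_dwt`, `haar_idwt`, `soft`, `universal_denoise`) are hypothetical. The abstract does not state the rule the proposed method uses to pick a level per frame, so `level` is left as a caller-supplied parameter.

```python
import math
import statistics


def haar_dwt(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def soft(c, t):
    """Soft thresholding: shrink the magnitude of c by t, flooring at zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def universal_denoise(frame, level):
    """Denoise one frame; len(frame) must be divisible by 2**level.

    `level` is chosen by the caller (assumption: the paper's per-frame
    selection criterion is not reproduced here).
    """
    n = len(frame)
    approx, details = list(frame), []
    for _ in range(level):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Noise scale from the finest detail band: median(|d|) / 0.6745.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))  # universal threshold
    # Rebuild from the coarsest level up, shrinking each detail band.
    for d in reversed(details):
        approx = haar_idwt(approx, [soft(c, t) for c in d])
    return approx


frame = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]
denoised = universal_denoise(frame, level=2)
```

A variable-level scheme would call `universal_denoise` with a different `level` for each frame. The baseline described in the abstract uses a single thresholding rule throughout.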
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.3669228
References:
[1] P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, FL, USA: CRC Press, 2007.
[2] Y. Ephraim and D. Malah, “Speech enhancement using a Minimum-Mean Square Error Short-Time Spectral Amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[3] S. G. Mallat, “A Theory for Multiresolution Signal Decomposition: The Wavelet Representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, 1989.
[4] G. Kim and P. C. Loizou, “Improving Speech Intelligibility in Noise using Environment-Optimized Algorithms,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 8, pp. 2080–2090, 2010.
[5] P. C. Loizou and G. Kim, “Reasons Why Current Speech-Enhancement Algorithms do not Improve Speech Intelligibility and Suggested Solutions,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 47–56, 2010.
[6] D. Wang and J. Chen, “Supervised Speech Separation Based on Deep Learning: An Overview,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 10, pp. 1702–1726, 2018.
[7] M. Kolbæk, Z.-H. Tan, and J. Jensen, “Speech Intelligibility Potential of General and Specialized Deep Neural Network based Speech Enhancement Systems,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 1, pp. 153–167, 2017.
[8] S. Y. Low, D. S. Pham, and S. Venkatesh, “Compressive Speech Enhancement,” Speech Communication, vol. 55, no. 6, pp. 757–768, 2013.
[9] M. Srivastava, C. L. Anderson, and J. H. Freed, “A New Wavelet Denoising Method for Selecting Decomposition Levels and Noise Thresholds,” IEEE Access, vol. 4, pp. 3862–3877, 2016.
[10] J. S. Garofolo et al., “Getting Started with the DARPA TIMIT CD-ROM: An Acoustic Phonetic Continuous Speech Database,” National Institute of Standards and Technology (NIST), Gaithersburg, MD, vol. 107, pp. 1–6, 1988.
[11] A. Varga and H. J. Steeneken, “Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems,” Speech Communication, vol. 12, no. 3, pp. 247–251, 1993.
[12] D. L. Donoho and I. M. Johnstone, “Ideal Spatial Adaptation by Wavelet Shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.
[13] D. L. Donoho, “De-noising by Soft-Thresholding,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613–627, 1995.
[14] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125–2136, 2011.