On-line Speech Enhancement by Time-Frequency Masking under Prior Knowledge of Source Location

Authors: Min Ah Kang, Sangbae Jeong, Minsoo Hahn

Abstract:

This paper presents a source extraction system that, given prior constraints on the source location, extracts only the target signal in on-line operation. The proposed system belongs to the family of methods that enhance a target signal and suppress interfering signals, but it outperforms the other methods considered and extracts the target source comparatively completely. The method builds on the beamforming concept and uses an improved time-frequency (TF) mask-based blind source separation (BSS) algorithm to separate a target signal from multiple noise sources. The target source is assumed to be located in front, and the test data were recorded in a reverberant room. The proposed method was evaluated by the PESQ scores of real recorded sentences and showed a noticeable speech enhancement.
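
As a rough illustration of how a location prior can drive a TF mask, the sketch below keeps only those time-frequency bins whose inter-microphone phase difference is near zero, which is what a source equidistant from two microphones (that is, directly in front of the pair) would produce. It is not the algorithm of this paper: the two-microphone setup, the binary mask, the function names `frontal_mask` and `apply_mask`, and the threshold `max_phase_diff` are all assumptions made for the example.

```python
import cmath
import math


def frontal_mask(left, right, max_phase_diff=0.3):
    """Binary TF mask for a two-microphone pair (illustrative only).

    left, right: lists of frames, each frame a list of complex STFT bins.
    A source directly in front reaches both microphones at the same time,
    so its inter-channel phase difference is close to zero in every bin.
    max_phase_diff (radians) is a hypothetical tolerance, not a tuned value.
    """
    mask = []
    for frame_l, frame_r in zip(left, right):
        row = []
        for x_l, x_r in zip(frame_l, frame_r):
            # The phase of x_l * conj(x_r) is the inter-channel phase difference.
            phase_diff = cmath.phase(x_l * x_r.conjugate())
            row.append(1.0 if abs(phase_diff) <= max_phase_diff else 0.0)
        mask.append(row)
    return mask


def apply_mask(spec, mask):
    """Scale each TF bin of one channel by its mask value."""
    return [[m * x for x, m in zip(frame, mrow)]
            for frame, mrow in zip(spec, mask)]


# Toy input: one frame with two frequency bins.
# Bin 0: identical at both microphones (frontal target).
# Bin 1: 90-degree phase lag at the right microphone (off-axis interferer).
left = [[1 + 0j, 1 + 0j]]
right = [[1 + 0j, cmath.rect(1.0, -math.pi / 2)]]

mask = frontal_mask(left, right)
enhanced = apply_mask(left, mask)
```

In a real system the bins would come from the STFT of recorded signals, and a soft mask or smoothing across frames is often preferred over a hard binary decision to limit musical noise.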

Keywords: Beamforming, Non-stationary noise reduction, Source separation, TF mask.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1327682

