Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system to address real environmental noise and channel mismatch in forensic speaker verification. The method suppresses various types of real environmental noise using an independent component analysis (ICA) algorithm. Mel frequency cepstral coefficients (MFCC), or MFCC with feature warping, are then extracted from the enhanced speech signal to capture its essential characteristics. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated on an Australian forensic voice comparison database combined with car, street, and home noises from QUT-NOISE at signal-to-noise ratios (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that MFCC feature warping with ICA reduces the equal error rate by approximately 48.22%, 44.66%, and 50.07% compared with MFCC feature warping alone when the test speech signals are corrupted with random sessions of street, car, and home noises, respectively, at -10 dB SNR.
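The evaluation setup described in the abstract (adding noise to clean speech at a chosen SNR, then comparing equal error rates) can be illustrated with a minimal sketch. This is not the authors' code: the function names `mix_at_snr`, `measured_snr_db`, and `relative_reduction_percent` are hypothetical, the signals are synthetic stand-ins for the speech and QUT-NOISE recordings, and the sketch assumes the reported percentages are relative reductions in equal error rate.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Assumption: `noise` is at least as long as `speech`; only its first
    len(speech) samples are used.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # From SNR_dB = 10 * log10(P_speech / P_noise), solve for the noise power
    # that yields the requested SNR.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


def relative_reduction_percent(baseline_eer, improved_eer):
    """Relative reduction of an error rate with respect to a baseline, in percent."""
    return 100.0 * (baseline_eer - improved_eer) / baseline_eer


# Synthetic stand-ins: a sine wave for "speech", white noise for "environmental noise".
rng = np.random.default_rng(0)
speech = np.sin(2.0 * np.pi * 220.0 * np.arange(8000) / 8000.0)
noise = rng.standard_normal(16000)

for snr_db in (-10.0, 0.0, 10.0):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(f"target {snr_db:+.0f} dB, measured {measured_snr_db(speech, noisy):+.2f} dB")

# Made-up error rates, used only to show how a relative reduction is computed.
print(relative_reduction_percent(baseline_eer=20.0, improved_eer=10.0))
```

At -10 dB the noise power is ten times the speech power, which is why this end of the tested SNR range is the most demanding condition for a verification system.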

Keywords: Noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1130185

