An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Authors: Cheima Ben Soltane, Ittansa Yonas Kelbesa

Abstract:

Speaker Identification (SI) is the task of establishing the identity of an individual based on his or her voice characteristics. The SI task is typically carried out in two stages: training and testing. The training stage extracts speaker-specific feature parameters from the speech and generates speaker models accordingly. In the testing stage, speech samples from unknown speakers are compared with the models and classified. Even though the performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still a need for improvement. In this paper, a Closed-Set Text-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficients (MFCC) for feature extraction and a suitable combination of Vector Quantization (VQ) and Gaussian Mixture Model (GMM), together with the Expectation Maximization (EM) algorithm, for speaker modeling. The use of a Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and statistical modeling of background noise in the pre-processing step of feature extraction yields a better and more robust automatic speaker identification system. In addition, using the Linde-Buzo-Gray (LBG) clustering algorithm to initialize the GMM parameters estimated in the EM step improved the convergence rate and the system's performance. The system also uses a relative index as a confidence measure when the GMM and VQ classifiers give contradictory identification results. Simulation results obtained on the voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.
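
The LBG-based initialization of the GMM mentioned above can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: the function names (`lbg`, `init_gmm`), the parameters (`eps`, `tol`, `var_floor`), the additive splitting rule, and the diagonal-covariance assumption are all choices made here for illustration. Feature vectors stand in for MFCC frames.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(v, codebook):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))

def lbg(vectors, size, eps=0.01, tol=1e-4, max_iter=100):
    """LBG codebook design by binary splitting (size assumed a power of two)."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies
        # (additive perturbation is an assumption of this sketch).
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                i = nearest(v, codebook)
                cells[i].append(v)
                distortion += sq_dist(v, codebook[i])
            distortion /= len(vectors)
            # Move each codeword to the centroid of its cell; keep it if the cell is empty.
            codebook = [centroid(c) if c else cw for c, cw in zip(cells, codebook)]
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

def init_gmm(vectors, codebook, var_floor=1e-3):
    """Derive initial GMM weights, means and diagonal variances from a codebook."""
    cells = [[] for _ in codebook]
    for v in vectors:
        cells[nearest(v, codebook)].append(v)
    weights = [len(c) / len(vectors) for c in cells]
    variances = []
    for cell, mu in zip(cells, codebook):
        if cell:
            var = [max(sum((x - m) ** 2 for x in col) / len(cell), var_floor)
                   for col, m in zip(zip(*cell), mu)]
        else:
            var = [var_floor] * len(mu)
        variances.append(var)
    return weights, codebook, variances

if __name__ == "__main__":
    # Toy 2-D "features" forming two clusters; real input would be MFCC frames.
    frames = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
              [10.0, 10.1], [9.9, 10.0], [10.2, 9.8]]
    cb = lbg(frames, 2)
    print(init_gmm(frames, cb))
```

The weights, means and variances returned by `init_gmm` would then serve as the starting point for EM iterations, in place of a random initialization.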

Keywords: Feature Extraction, Speaker Modeling, Feature Matching, Mel Frequency Cepstrum Coefficient (MFCC), Gaussian Mixture Model (GMM), Vector Quantization (VQ), Linde-Buzo-Gray (LBG), Expectation Maximization (EM), Pre-processing, Voice Activity Detection (VAD), Short Time Energy (STE), Background Noise Statistical Modeling, Closed-Set Text-Independent Speaker Identification System (CISI).

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1108442

