Text-independent Speaker Identification Based on MAP Channel Compensation and Pitch-dependent Features

Jiqing Han; Rongchun Gao

Abstract:

A major source of performance degradation in speaker recognition systems is channel mismatch between training and testing. This paper improves the channel robustness of a speaker recognition system in two respects: the channel compensation technique and channel-robust features. The system is a text-independent speaker identification system based on two-stage recognition. For channel compensation, the paper applies the MAP (maximum a posteriori) channel compensation technique, previously used in speech recognition, to speaker recognition. For channel-robust features, it introduces pitch-dependent features and a pitch-dependent speaker model for the second recognition stage. After the first stage recognizes the test speech using a GMM (Gaussian mixture model), the system uses the GMM scores to decide whether the speech needs to be recognized again. If so, the system selects a few of the speakers who took part in the first stage as candidates for the second stage. For each selected speaker, the system obtains three pitch-dependent results from that speaker's pitch-dependent speaker model and then uses an ANN (artificial neural network) to combine the three pitch-dependent results and one GMM score into a fused result. The second-stage decision is based on these fused results. Experiments show that the correct rate of the two-stage recognition system based on MAP channel compensation and pitch-dependent features is 41.7% better than that of the baseline system in a closed-set test.
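
The two-stage decision flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the GMM scores trigger the second stage, how many candidates are kept, or how the ANN is built, so everything specific here is an assumption made for illustration: the score-margin rule and `margin_threshold`, the `top_n` candidate count, the function names, and `toy_fuse`, a fixed weighted sum that merely stands in for a trained ANN.

```python
from typing import Callable, Dict, Sequence, Tuple


def two_stage_identify(
    gmm_scores: Dict[str, float],
    pitch_results: Callable[[str], Tuple[float, float, float]],
    fuse: Callable[[Sequence[float]], float],
    margin_threshold: float = 1.0,
    top_n: int = 3,
) -> Tuple[str, int]:
    """Return (speaker, stage), where stage is 1 or 2.

    gmm_scores maps each enrolled speaker to a first-stage GMM score
    (higher is better). The margin rule below is an assumed criterion.
    """
    if not gmm_scores:
        raise ValueError("gmm_scores must not be empty")

    # First stage: rank all speakers by GMM score.
    ranked = sorted(gmm_scores, key=lambda s: gmm_scores[s], reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best, 1

    # Assumed rule: accept the first-stage winner if it clearly leads.
    margin = gmm_scores[best] - gmm_scores[ranked[1]]
    if margin >= margin_threshold:
        return best, 1

    # Second stage: keep a few top candidates and fuse, per candidate,
    # three pitch-dependent results with that candidate's GMM score.
    candidates = ranked[:top_n]
    fused = {
        s: fuse([*pitch_results(s), gmm_scores[s]]) for s in candidates
    }
    return max(fused, key=lambda s: fused[s]), 2


def toy_fuse(features: Sequence[float]) -> float:
    """Hypothetical stand-in for the ANN: a fixed weighted sum."""
    weights = (0.3, 0.3, 0.3, 0.1)
    return sum(w * x for w, x in zip(weights, features))
```

In a real system, `pitch_results` would query each candidate's pitch-dependent speaker model and `fuse` would be a trained network; the sketch only shows how the first-stage scores gate the second stage and how the four inputs per candidate are combined.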

Keywords: Channel Compensation, Channel Robustness, MAP, Speaker Identification

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1056178
