Improvement of MLLR Speaker Adaptation Using a Novel Method

Ing-Jr Ding

Abstract:

This paper presents a speaker adaptation method called WMLLR, which is based on maximum likelihood linear regression (MLLR). In MLLR, a linear regression-based transform that adapts the HMM mean vectors is calculated to maximize the likelihood of the adaptation data. In the proposed method, prior knowledge of the initial model is incorporated into the adaptation. A series of speaker adaptation experiments is carried out on a database of 30 famous city names to investigate the efficiency of the proposed method. Experimental results show that the WMLLR method outperforms the conventional MLLR method, especially when only a few utterances from a new speaker are available for adaptation.
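
As an illustration of the conventional MLLR step described above (not of WMLLR, whose weighting scheme is not specified in this abstract), the following minimal sketch estimates a single global mean transform W so that each adapted mean is W applied to the extended vector [1, mu]. It assumes diagonal covariances and that the Gaussian occupation probabilities have already been computed; the function names `estimate_mllr_transform` and `adapt_means` and the argument layout are hypothetical choices made for this example.

```python
import numpy as np


def estimate_mllr_transform(means, variances, frames, gamma):
    """Estimate one global MLLR mean transform (illustrative sketch).

    means:     (M, D) Gaussian mean vectors of the initial model
    variances: (M, D) diagonal covariance entries (assumed diagonal)
    frames:    (T, D) adaptation feature vectors
    gamma:     (T, M) occupation probabilities, assumed precomputed
    Returns W with shape (D, D + 1); adapted mean = W @ [1, mu].
    """
    num_gauss, dim = means.shape
    # Extended mean vectors xi_m = [1, mu_m].
    xi = np.hstack([np.ones((num_gauss, 1)), means])
    occ = gamma.sum(axis=0)        # total occupancy per Gaussian, (M,)
    obs_sum = gamma.T @ frames     # occupancy-weighted frame sums, (M, D)

    W = np.zeros((dim, dim + 1))
    for i in range(dim):
        inv_var = 1.0 / variances[:, i]
        # G_i = sum_m (occ_m / var_mi) * xi_m xi_m^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # z_i = sum_m (obs_sum_mi / var_mi) * xi_m
        z = xi.T @ (obs_sum[:, i] * inv_var)
        # Row i of W solves G_i w_i = z_i; G_i can be singular when
        # too few Gaussians are observed in the adaptation data.
        W[i] = np.linalg.solve(G, z)
    return W


def adapt_means(means, W):
    """Apply the transform to every mean: mu_hat = W @ [1, mu]."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

With diagonal covariances the maximum likelihood problem separates into one linear system per feature dimension, which is why the sketch solves for W row by row.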

Keywords: hidden Markov model, maximum likelihood linear regression, speech recognition, speaker adaptation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328538

