A Hybrid GMM/SVM System for Text Independent Speaker Identification

Authors: Rafik Djemili, Mouldi Bedda, Hocine Bourouba

Abstract:

This paper proposes a novel approach to text-independent speaker identification that combines statistical models and support vector machines. A hybrid scheme that incorporates the advantages of both the generative and the discriminative model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is first assigned to its corresponding group and is then evaluated using Gaussian mixture models (GMMs). Experimental results show that the proposed method can significantly improve performance on the text-independent speaker identification task, with reductions of up to 50% in identification error rate compared to the baseline statistical model.

Keywords: Speaker identification, Gaussian mixture model (GMM), support vector machine (SVM), hybrid GMM/SVM.
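
The two-stage scheme summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation and rests on several simplifying assumptions: the tree has a single binary split, the SVM is linear and fitted by plain subgradient descent on the hinge loss, each speaker is modeled by one diagonal Gaussian (a one-component stand-in for a full GMM), and the features are synthetic 2-D points rather than real speech frames. All function and variable names are hypothetical.

```python
import math
import random

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Linear SVM fitted by subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if y[i] * score < 1.0:                    # margin violated: hinge update
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def fit_gaussian(frames):
    """Per-dimension mean and variance (assumed one-component stand-in for a GMM)."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    var = [max(sum((f[j] - mean[j]) ** 2 for f in frames) / n, 1e-6) for j in range(d)]
    return mean, var

def log_likelihood(frames, model):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def identify(frames, svm, groups, models):
    """Stage 1: the SVM picks a speaker group. Stage 2: best-scoring model in that group."""
    w, b = svm
    mean_score = sum(sum(wj * xj for wj, xj in zip(w, f)) + b for f in frames) / len(frames)
    group = groups[1] if mean_score >= 0 else groups[-1]
    return max(group, key=lambda spk: log_likelihood(frames, models[spk]))

# Synthetic toy data (assumption): four speakers, two groups separated along dimension 0.
rng = random.Random(42)
centers = {"spk0": (-5.0, -1.0), "spk1": (-5.0, 1.0), "spk2": (5.0, -1.0), "spk3": (5.0, 1.0)}
groups = {-1: ["spk0", "spk1"], 1: ["spk2", "spk3"]}

def sample(spk, n):
    cx, cy = centers[spk]
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)] for _ in range(n)]

train = {spk: sample(spk, 20) for spk in centers}
X = [f for spk in centers for f in train[spk]]
y = [(-1 if spk in groups[-1] else 1) for spk in centers for _ in train[spk]]

svm = train_linear_svm(X, y)
models = {spk: fit_gaussian(train[spk]) for spk in centers}
predictions = {spk: identify(sample(spk, 20), svm, groups, models) for spk in centers}
```

In this sketch the discriminative stage only narrows the candidate set, so the generative stage scores two speaker models instead of four. A deeper tree would repeat the SVM split at each internal node before the likelihood comparison at the leaf.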

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079432

