Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition
Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi
Abstract:
In this paper, a hybrid modeling algorithm based on Fuzzy C-Means clustering with an Expectation Maximization-Gaussian Mixture Model (EM-GMM) is proposed for Continuous Tamil Speech Recognition. Speech sentences from various speakers are used in the training and testing phases, and objective measures are compared between the proposed and existing Continuous Speech Recognition algorithms. The simulation results show that the proposed algorithm improves recognition accuracy and F-measure by up to 3% compared with the existing algorithms for speech signals from various speakers. In addition, it reduces the Word Error Rate, Error Rate and Error by up to 4% compared with the existing algorithms. In all these respects, the proposed hybrid modeling for Tamil speech recognition provides significant improvements for speech-to-text conversion in various applications.
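The abstract names the hybrid scheme but does not spell out how the two stages are coupled. The Python sketch that follows assumes one common arrangement: Fuzzy C-Means (Bezdek's formulation) produces soft memberships, and those memberships serve as the initial responsibilities for an EM-trained Gaussian Mixture Model. The diagonal covariances, the random FCM initialization, the synthetic 2-D points standing in for speech feature vectors, and the helper names `fuzzy_c_means` and `em_gmm_from_fcm` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers: a sketch of FCM-seeded EM-GMM, not the paper's implementation.


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means. Returns (centres, U); each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centres are membership-weighted means of the data.
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance from every point to every centre, floored to avoid /0.
        dist = np.fmax(np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), 1e-12)
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centres, U


def em_gmm_from_fcm(X, U, n_iter=200, tol=1e-6, reg=1e-6):
    """EM for a diagonal-covariance GMM whose first M-step uses FCM memberships.

    Returns (weights, means, variances, log_likelihood_history).
    """
    n = X.shape[0]
    R = U.copy()  # Assumption: FCM memberships act as the initial responsibilities.
    history = []
    for _ in range(n_iter):
        # M-step: mixture weights, means and diagonal variances from responsibilities.
        Nk = R.sum(axis=0)
        weights = Nk / n
        means = (R.T @ X) / Nk[:, None]
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2  # shape (n, c, d)
        variances = np.einsum("nk,nkd->kd", R, diff2) / Nk[:, None] + reg
        # E-step: log w_k + log N(x_i | mu_k, diag(var_k)), normalized per point.
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None] + diff2 / variances[None]).sum(axis=2)
        log_joint = np.log(weights)[None, :] + log_pdf
        log_px = np.logaddexp.reduce(log_joint, axis=1)
        R = np.exp(log_joint - log_px[:, None])
        history.append(log_px.sum())
        # Stop once the total log-likelihood no longer improves by more than tol.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return weights, means, variances, history


if __name__ == "__main__":
    # Synthetic 2-D "feature frames" from two well-separated sources (illustrative only).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal([0.0, 0.0], 0.5, (200, 2)),
                   rng.normal([5.0, 5.0], 0.5, (200, 2))])
    _, U = fuzzy_c_means(X, c=2)
    weights, means, variances, history = em_gmm_from_fcm(X, U)
    print("weights:", np.round(weights, 3))
    print("means:\n", np.round(means, 2))
```

Seeding EM from graded memberships lets every frame contribute to each component's first estimate instead of committing it to a single hard cluster; whether the paper couples FCM and EM-GMM in exactly this way is an assumption.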
Keywords: Speech Segmentation, Feature Extraction, Clustering, HMM, EM-GMM, CSR.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1337827